public class TokenStreamToAutomaton extends Object
Consumes a TokenStream and creates an Automaton where the transition labels are UTF-8 bytes (or Unicode code points if unicodeArcs is true) from the TermToBytesRefAttribute. Between tokens we insert POS_SEP, and for holes we insert HOLE.
Modifier and Type | Field and Description |
---|---|
static int | HOLE: Transition label added to represent a hole (a missing position). |
static int | POS_SEP: Transition label inserted between two adjacent tokens. |
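To make the class description concrete, here is a minimal sketch of the label sequence along the single path produced for a linear token stream. It assumes every token spans exactly one position and has a position increment of at least 1. The class LinearLabels, its Token record, and the numeric values given to POS_SEP and HOLE are hypothetical stand-ins chosen for illustration, not Lucene's API; the real class builds a full Automaton, and its exact wiring of separators around holes may differ.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

/**
 * Hypothetical, simplified model of the labels along the single path produced for a
 * linear token stream (every token spans one position). Not Lucene's implementation.
 */
final class LinearLabels {
  // Stand-in values for the separator and hole labels (assumed for illustration).
  static final int POS_SEP = 0x1f;
  static final int HOLE = 0x1e;

  /** A token: its term text and its position increment from the previous token. */
  record Token(String term, int posInc) {}

  static List<Integer> labels(List<Token> tokens) {
    List<Integer> out = new ArrayList<>();
    for (Token t : tokens) {
      // Each skipped position (posInc > 1) contributes a HOLE arc, separated from
      // whatever precedes it by POS_SEP.
      for (int i = 1; i < t.posInc(); i++) {
        if (!out.isEmpty()) {
          out.add(POS_SEP);
        }
        out.add(HOLE);
      }
      // Adjacent positions are joined by a POS_SEP arc.
      if (!out.isEmpty()) {
        out.add(POS_SEP);
      }
      // One arc per UTF-8 byte, labeled with the unsigned byte value.
      for (byte b : t.term().getBytes(StandardCharsets.UTF_8)) {
        out.add(b & 0xff);
      }
    }
    return out;
  }
}
```

Under this model, the terms a and b with one removed stop word between them (b has a position increment of 2) yield the labels a, POS_SEP, HOLE, POS_SEP, b.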
Constructor and Description |
---|
TokenStreamToAutomaton(): Sole constructor. |
Modifier and Type | Method and Description |
---|---|
protected BytesRef | changeToken(BytesRef in): Subclass and implement this if you need to change the token (such as escaping certain bytes) before it is turned into a graph. |
void | setFinalOffsetGapAsHole(boolean finalOffsetGapAsHole): If true, any final offset gaps will result in adding a position hole. |
void | setPreservePositionIncrements(boolean enablePositionIncrements): Whether to generate holes in the automaton for missing positions; true by default. |
void | setUnicodeArcs(boolean unicodeArcs): Whether to make transition labels Unicode code points instead of UTF-8 bytes; false by default. |
Automaton | toAutomaton(TokenStream in): Pulls the graph (including PositionLengthAttribute) from the provided TokenStream and creates the corresponding automaton, where arcs are the bytes (or Unicode code points if unicodeArcs = true) of each term. |
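The effect of setUnicodeArcs on each term's arcs can be sketched without Lucene. The hypothetical ArcLabels helper below (not part of Lucene's API) computes the labels a term would contribute in each mode, assuming the term text is available as a Java String; the real class reads a UTF-8 BytesRef from TermToBytesRefAttribute instead.

```java
import java.nio.charset.StandardCharsets;

/** Hypothetical helper (not Lucene API): per-term arc labels in each setUnicodeArcs mode. */
final class ArcLabels {
  /** unicodeArcs == false: one arc per UTF-8 byte, as an unsigned value 0..255. */
  static int[] utf8Labels(String term) {
    byte[] bytes = term.getBytes(StandardCharsets.UTF_8);
    int[] labels = new int[bytes.length];
    for (int i = 0; i < bytes.length; i++) {
      labels[i] = bytes[i] & 0xff; // mask so bytes >= 0x80 do not become negative labels
    }
    return labels;
  }

  /** unicodeArcs == true: one arc per Unicode code point (surrogate pairs combined). */
  static int[] codePointLabels(String term) {
    return term.codePoints().toArray();
  }
}
```

For "é" (U+00E9), byte mode yields two arcs, 0xC3 and 0xA9, while code point mode yields one arc labeled 0xE9; a character outside the Basic Multilingual Plane takes four byte arcs but only one code point arc.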
public static final int POS_SEP
public static final int HOLE
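public void setPreservePositionIncrements(boolean enablePositionIncrements)
Whether to generate holes in the automaton for missing positions, true by default.
public void setFinalOffsetGapAsHole(boolean finalOffsetGapAsHole)
If true, any final offset gap (input remaining after the end offset of the last token) results in adding a position hole.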
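The two hole-related setters can be illustrated with a hypothetical, self-contained model; HoleSketch is not a Lucene class. It assumes that disabling position increments collapses every increment greater than 1 to 1, and that with finalOffsetGapAsHole enabled a final offset beyond the last token's end offset adds one trailing hole. The label values are stand-ins, and the default of false for finalOffsetGapAsHole is an assumption.

```java
import java.util.ArrayList;
import java.util.List;

/**
 * Hypothetical model (not a Lucene class) of how the two hole-related settings could
 * change the labels for a linear stream of single-character terms.
 */
final class HoleSketch {
  static final int POS_SEP = 0x1f; // stand-in value, assumed
  static final int HOLE = 0x1e;    // stand-in value, assumed

  boolean preservePositionIncrements = true; // documented default
  boolean finalOffsetGapAsHole = false;      // assumed default

  /** A single-character term, its position increment, and its end offset. */
  record Token(char term, int posInc, int endOffset) {}

  List<Integer> labels(List<Token> tokens, int finalPosInc, int finalOffset) {
    List<Integer> out = new ArrayList<>();
    int maxEndOffset = 0;
    for (Token t : tokens) {
      int posInc = t.posInc();
      // With increments not preserved, a gap collapses and no HOLE is emitted.
      if (!preservePositionIncrements && posInc > 1) {
        posInc = 1;
      }
      for (int i = 1; i < posInc; i++) {
        if (!out.isEmpty()) {
          out.add(POS_SEP);
        }
        out.add(HOLE);
      }
      if (!out.isEmpty()) {
        out.add(POS_SEP);
      }
      out.add((int) t.term());
      maxEndOffset = Math.max(maxEndOffset, t.endOffset());
    }
    // Trailing holes: an explicit final position increment, or, when the flag is on,
    // input left over after the last token's end offset.
    int trailing = finalPosInc;
    if (trailing == 0 && finalOffsetGapAsHole && finalOffset > maxEndOffset) {
      trailing = 1;
    }
    for (int i = 0; i < trailing; i++) {
      out.add(POS_SEP);
      out.add(HOLE);
    }
    return out;
  }
}
```

In this model, turning preservePositionIncrements off removes the interior HOLE entirely, while finalOffsetGapAsHole only matters when the input extends past the last token and no explicit final increment was reported.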
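public void setUnicodeArcs(boolean unicodeArcs)
Whether to make transition labels Unicode code points instead of UTF-8 bytes, false by default.
protected BytesRef changeToken(BytesRef in)
Subclass and implement this if you need to change the token (such as escaping certain bytes) before it is turned into a graph.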
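One plausible reason to override changeToken is to escape term bytes that would otherwise be indistinguishable from the separator labels. The sketch below is hypothetical: it swaps Lucene's BytesRef for a plain byte[] so it runs standalone, and TokenConverter, EscapingConverter, the stand-in label values, and the 0xFF escape byte are all assumptions for illustration.

```java
import java.io.ByteArrayOutputStream;

/**
 * Hypothetical stand-in for the changeToken(BytesRef) hook, using byte[] instead of
 * Lucene's BytesRef so it runs without Lucene on the classpath.
 */
class TokenConverter {
  /** Default behaviour: pass the term bytes through unchanged. */
  protected byte[] changeToken(byte[] in) {
    return in;
  }
}

/** Escapes bytes that would collide with the stand-in separator and hole labels. */
final class EscapingConverter extends TokenConverter {
  static final int POS_SEP = 0x1f; // stand-in value, assumed
  static final int HOLE = 0x1e;    // stand-in value, assumed
  static final int ESCAPE = 0xff;  // hypothetical escape byte

  @Override
  protected byte[] changeToken(byte[] in) {
    ByteArrayOutputStream out = new ByteArrayOutputStream(in.length + 2);
    for (byte b : in) {
      int v = b & 0xff;
      if (v == POS_SEP || v == HOLE || v == ESCAPE) {
        out.write(ESCAPE); // mark the next byte as literal term data
      }
      out.write(v);
    }
    return out.toByteArray();
  }
}
```

0xFF is a convenient escape because it never occurs in well-formed UTF-8, so an escaped byte cannot be confused with ordinary term bytes, and an unescaped 0x1F can only be a separator.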
public Automaton toAutomaton(TokenStream in) throws IOException
Pulls the graph (including PositionLengthAttribute) from the provided TokenStream and creates the corresponding automaton, where arcs are the bytes (or Unicode code points if unicodeArcs = true) of each term.
Throws: IOException
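To show what pulling the graph means when a token spans several positions, here is a simplified, hypothetical builder; GraphSketch is not Lucene code. It assumes each token carries a position increment and a position length, gives every position an arriving state and a leaving state joined by a POS_SEP arc, and ignores holes and offsets. The separator value is a stand-in.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

/**
 * Hypothetical, simplified sketch (not Lucene's code) of turning a token graph into a
 * byte-labeled automaton. Each token has a position increment and a position length
 * (the role PositionLengthAttribute plays); holes and offsets are ignored.
 */
final class GraphSketch {
  static final int POS_SEP = 0x1f; // stand-in separator label, assumed value

  record Token(String term, int posInc, int posLen) {}
  record Transition(int from, int to, int label) {}

  final List<Transition> transitions = new ArrayList<>();
  final int acceptState;
  private int stateCount = 0;
  // Per position: the state that tokens ending there arrive at, and the state
  // that tokens starting there leave from.
  private final Map<Integer, Integer> arriving = new HashMap<>();
  private final Map<Integer, Integer> leaving = new HashMap<>();

  GraphSketch(List<Token> tokens) {
    int pos = -1;
    int maxEnd = -1;
    for (Token t : tokens) {
      if (t.posInc() > 0) {
        pos += t.posInc();
        int leave = stateCount++;
        leaving.put(pos, leave);
        Integer arrive = arriving.get(pos);
        if (arrive != null) {
          // An earlier token ended at this position: link it with a separator arc.
          transitions.add(new Transition(arrive, leave, POS_SEP));
        }
      }
      byte[] bytes = t.term().getBytes(StandardCharsets.UTF_8);
      if (pos < 0 || bytes.length == 0) {
        throw new IllegalArgumentException("first token needs posInc > 0; terms must be non-empty");
      }
      int endPos = pos + t.posLen();
      Integer end = arriving.get(endPos);
      if (end == null) {
        end = stateCount++;
        arriving.put(endPos, end);
      }
      // One arc per UTF-8 byte; the last byte lands on the end position's arriving state.
      int state = leaving.get(pos);
      for (int i = 0; i < bytes.length; i++) {
        int next = (i == bytes.length - 1) ? end : stateCount++;
        transitions.add(new Transition(state, next, bytes[i] & 0xff));
        state = next;
      }
      maxEnd = Math.max(maxEnd, endPos);
    }
    acceptState = arriving.getOrDefault(maxEnd, -1);
  }

  /** Runs the (possibly nondeterministic) automaton on a label sequence. */
  boolean accepts(List<Integer> labels) {
    Set<Integer> current = new HashSet<>();
    current.add(0); // state 0 is the first token's leaving state
    for (int label : labels) {
      Set<Integer> next = new HashSet<>();
      for (Transition tr : transitions) {
        if (current.contains(tr.from()) && tr.label() == label) {
          next.add(tr.to());
        }
      }
      current = next;
    }
    return current.contains(acceptState);
  }
}
```

For the token wifi (position length 2) stacked over the two tokens wi and fi, this sketch accepts both the bytes of wifi and the sequence wi, POS_SEP, fi, which is the kind of multi-path result a position-length-aware conversion is meant to preserve.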
Copyright © 2000-2017 Apache Software Foundation. All Rights Reserved.