org.apache.lucene.analysis
public class TokenStreamToAutomaton extends Object
Consumes a TokenStream and creates an Automaton
where the transition labels are UTF-8 bytes (or Unicode
code points if unicodeArcs is true) from the TermToBytesRefAttribute.
Between tokens we insert POS_SEP, and for holes we insert HOLE.

Modifier and Type | Field and Description |
---|---|
static int | HOLE: We add this arc to represent a hole. |
static int | POS_SEP: We add this transition between two adjacent tokens. |
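
The labelling scheme described above (term bytes, POS_SEP between adjacent tokens, HOLE for a missing position) can be illustrated on a linear token stream. This is a simplified, JDK-only sketch, not Lucene's implementation: the class LinearLabels, its Token type and the label values 0x1F and 0x1E are assumptions of the sketch, and it only handles streams where every position increment is at least 1 (the real class builds a branching Automaton).

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

/**
 * Hypothetical model: turns a linear token stream into one label path made of
 * term bytes, POS_SEP between adjacent tokens, and HOLE for each skipped position.
 */
final class LinearLabels {
    // Placeholder label values chosen for this sketch only.
    static final int POS_SEP = 0x1F;
    static final int HOLE = 0x1E;

    /** A token with its position increment (1 = next position, 2 = one position skipped). */
    static final class Token {
        final String term;
        final int posInc;

        Token(String term, int posInc) {
            if (posInc < 1) {
                throw new IllegalArgumentException("linear sketch needs posInc >= 1");
            }
            this.term = term;
            this.posInc = posInc;
        }
    }

    static List<Integer> labels(List<Token> tokens, boolean preservePositionIncrements) {
        List<Integer> out = new ArrayList<>();
        boolean first = true;
        for (Token t : tokens) {
            if (!first) {
                out.add(POS_SEP); // separator after the previous token
            }
            if (preservePositionIncrements) {
                // Each skipped position (e.g. a removed stopword) becomes HOLE followed by POS_SEP.
                for (int i = 1; i < t.posInc; i++) {
                    out.add(HOLE);
                    out.add(POS_SEP);
                }
            }
            for (byte b : t.term.getBytes(StandardCharsets.UTF_8)) {
                out.add(b & 0xFF); // UTF-8 byte as an unsigned label
            }
            first = false;
        }
        return out;
    }
}
```

For tokens fast (posInc 1) and car (posInc 2, as if a stopword between them had been removed), labels(tokens, true) yields the bytes of fast, then POS_SEP, HOLE, POS_SEP, then the bytes of car; with false, the HOLE and its extra separator are dropped, which mirrors what setPreservePositionIncrements controls.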
Constructor and Description |
---|
TokenStreamToAutomaton(): Sole constructor. |
Modifier and Type | Method and Description |
---|---|
protected BytesRef | changeToken(BytesRef in): Subclass and override this if you need to change the token (such as escaping certain bytes) before it is turned into a graph. |
void | setPreservePositionIncrements(boolean enablePositionIncrements): Whether to generate holes in the automaton for missing positions; true by default. |
void | setUnicodeArcs(boolean unicodeArcs): Whether to make transition labels Unicode code points instead of UTF-8 bytes; false by default. |
Automaton | toAutomaton(TokenStream in): Pulls the graph (including PositionLengthAttribute) from the provided TokenStream and creates the corresponding automaton, where arcs are bytes (or Unicode code points if unicodeArcs is true) from each term. |
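
The effect of setUnicodeArcs can be seen on a single term. The JDK-only sketch below is an approximation, not Lucene code (the class ArcLabels and method termLabels are hypothetical): with unicodeArcs false each UTF-8 byte becomes one label, and with it true each Unicode code point becomes one label, so non-ASCII terms yield fewer, larger labels.

```java
import java.nio.charset.StandardCharsets;

/** Hypothetical helper: the per-term label sequence under either setting of unicodeArcs. */
final class ArcLabels {
    static int[] termLabels(String term, boolean unicodeArcs) {
        if (unicodeArcs) {
            // One label per Unicode code point (surrogate pairs are combined).
            return term.codePoints().toArray();
        }
        byte[] utf8 = term.getBytes(StandardCharsets.UTF_8);
        int[] labels = new int[utf8.length];
        for (int i = 0; i < utf8.length; i++) {
            labels[i] = utf8[i] & 0xFF; // unsigned byte value 0..255
        }
        return labels;
    }
}
```

For "caf\u00e9" the byte labels are 0x63, 0x61, 0x66, 0xC3, 0xA9, while the code-point labels are 0x63, 0x61, 0x66, 0xE9; a supplementary character such as U+1F600 is four byte labels but a single code-point label.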
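public static final int POS_SEP
We add this transition between two adjacent tokens.

public static final int HOLE
We add this arc to represent a hole (a position with no token, such as one left when a stopword is removed).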
public void setPreservePositionIncrements(boolean enablePositionIncrements)
Whether to generate holes in the automaton for missing positions; true by default.

public void setUnicodeArcs(boolean unicodeArcs)
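Whether to make transition labels Unicode code points instead of UTF-8 bytes; false by default.

protected BytesRef changeToken(BytesRef in)
Subclass and override this if you need to change the token (such as escaping certain bytes) before it is turned into a graph.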
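As an illustration of the kind of byte escaping changeToken allows, assume a subclass wants term bytes that happen to equal a reserved label to stay distinguishable from real separator arcs; one option is to prefix such bytes with an escape byte. This sketch uses a plain byte[] instead of Lucene's BytesRef so it runs on the JDK alone, and the class TokenEscaper, the ESCAPE byte and the label values are all hypothetical. In an actual subclass, a loop like this could live inside an override of changeToken(BytesRef in).

```java
import java.io.ByteArrayOutputStream;

/** Hypothetical escaping step of the kind a changeToken override might perform. */
final class TokenEscaper {
    // Assumed label values for this sketch; not Lucene's constants.
    static final byte POS_SEP = 0x1F;
    static final byte HOLE = 0x1E;
    static final byte ESCAPE = 0x1D;

    /** Prefixes every byte that collides with a reserved label (including ESCAPE itself) with ESCAPE. */
    static byte[] changeToken(byte[] in) {
        ByteArrayOutputStream out = new ByteArrayOutputStream(in.length);
        for (byte b : in) {
            if (b == POS_SEP || b == HOLE || b == ESCAPE) {
                out.write(ESCAPE); // escaping ESCAPE too keeps the encoding unambiguous
            }
            out.write(b);
        }
        return out.toByteArray();
    }
}
```

Bytes that do not collide with a reserved label pass through unchanged, so ordinary terms produce the same arcs as before.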
public Automaton toAutomaton(TokenStream in) throws IOException
Pulls the graph (including PositionLengthAttribute) from the provided TokenStream and creates the corresponding automaton, where arcs are bytes (or Unicode code points if unicodeArcs is true) from each term.
Throws: IOException
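The graph that toAutomaton pulls is determined by each token's position increment and its PositionLengthAttribute: a token that starts at position p and has length n spans positions p to p + n. The JDK-only sketch below (hypothetical class PositionGraph, with whole terms as arc labels) builds just that position graph under those assumptions. It is a simplified model; the real class additionally expands each term into byte or code-point transitions and inserts POS_SEP and HOLE arcs.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical model of the token position graph (one arc per token). */
final class PositionGraph {
    /** A token as a TokenStream might emit it: term, position increment, position length. */
    static final class Token {
        final String term;
        final int posInc;
        final int posLen;

        Token(String term, int posInc, int posLen) {
            this.term = term;
            this.posInc = posInc;
            this.posLen = posLen;
        }
    }

    /** One arc from a start position to an end position, labelled with the whole term. */
    static final class Edge {
        final int from;
        final int to;
        final String term;

        Edge(int from, int to, String term) {
            this.from = from;
            this.to = to;
            this.term = term;
        }

        @Override
        public String toString() {
            return from + " -" + term + "-> " + to;
        }
    }

    static List<Edge> build(List<Token> tokens) {
        List<Edge> edges = new ArrayList<>();
        int pos = -1; // the first token (posInc 1) lands on position 0
        for (Token t : tokens) {
            pos += t.posInc; // posInc 0 stacks a token on the previous start position
            edges.add(new Edge(pos, pos + t.posLen, t.term)); // posLen > 1 spans several positions
        }
        return edges;
    }
}
```

For a hypothetical stream where the synonym wifi (posInc 1, posLen 2) is stacked on wi (posInc 0, posLen 1), followed by fi and network (posInc 1, posLen 1 each), build returns [0 -wifi-> 2, 0 -wi-> 1, 1 -fi-> 2, 2 -network-> 3], so both paths leaving position 0 rejoin at position 2.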
Copyright © 2000-2014 Apache Software Foundation. All Rights Reserved.