org.apache.lucene.analysis
Class TokenStreamToAutomaton

java.lang.Object
  extended by org.apache.lucene.analysis.TokenStreamToAutomaton

public class TokenStreamToAutomaton
extends Object

Consumes a TokenStream and creates an Automaton whose transition labels are the UTF-8 bytes (or Unicode code points, if unicodeArcs is true) of each term, taken from the TermToBytesRefAttribute. A POS_SEP transition is inserted between adjacent tokens, and a HOLE transition is inserted for each hole (a missing position).

WARNING: This API is experimental and might change in incompatible ways in the next release.
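
The label layout described in the class summary can be sketched for the simplest case, a linear token stream with no overlapping tokens. The following is a minimal model in plain Java, not the Lucene implementation: the class LinearLabelSketch, the method labels, and the numeric values given to POS_SEP and HOLE are assumptions made for illustration (the real values are listed under Constant Field Values). The preservePositionIncrements parameter mirrors setPreservePositionIncrements; trailing holes are not modeled.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

// Simplified model of the label sequence for a linear token stream.
// Not Lucene code: names and constant values are illustrative assumptions.
class LinearLabelSketch {
    static final int POS_SEP = 0x1f; // assumed value; see Constant Field Values
    static final int HOLE = 0x1e;    // assumed value; see Constant Field Values

    /**
     * terms[i] is the i-th token and posIncs[i] its position increment
     * (1 = adjacent to the previous token, 2 = one position skipped, and so on).
     */
    static List<Integer> labels(String[] terms, int[] posIncs,
                                boolean preservePositionIncrements) {
        List<Integer> out = new ArrayList<>();
        for (int i = 0; i < terms.length; i++) {
            // Number of skipped positions in front of this token.
            int holes = preservePositionIncrements ? posIncs[i] - 1 : 0;
            if (i > 0) {
                out.add(POS_SEP); // separator after the previous token
            }
            for (int h = 0; h < holes; h++) {
                out.add(HOLE);    // one arc per skipped position
                out.add(POS_SEP); // separator after the hole
            }
            // One label per UTF-8 byte of the term, as an unsigned value.
            for (byte b : terms[i].getBytes(StandardCharsets.UTF_8)) {
                out.add(b & 0xff);
            }
        }
        return out;
    }
}
```

In this model, the tokens "a" and "b" with one skipped position between them produce the labels a, POS_SEP, HOLE, POS_SEP, b. With preservePositionIncrements set to false they produce a, POS_SEP, b.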

Field Summary
static int HOLE
          We add this arc to represent a hole.
static int POS_SEP
          We create this transition between two adjacent tokens.
 
Constructor Summary
TokenStreamToAutomaton()
          Sole constructor.
 
Method Summary
protected  BytesRef changeToken(BytesRef in)
          Subclass and override this method if you need to change the token (for example, to escape certain bytes) before it is turned into a graph.
 void setPreservePositionIncrements(boolean enablePositionIncrements)
          Sets whether to generate holes in the automaton for missing positions; true by default.
 void setUnicodeArcs(boolean unicodeArcs)
          Sets whether transition labels are Unicode code points instead of UTF-8 bytes; false by default.
 Automaton toAutomaton(TokenStream in)
          Pulls the graph (including PositionLengthAttribute) from the provided TokenStream and creates the corresponding automaton, whose arcs are bytes (or Unicode code points if unicodeArcs is true) from each term.
 
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
 

Field Detail

POS_SEP

public static final int POS_SEP
We create this transition between two adjacent tokens.

See Also:
Constant Field Values

HOLE

public static final int HOLE
We add this arc to represent a hole.

See Also:
Constant Field Values

Constructor Detail

TokenStreamToAutomaton

public TokenStreamToAutomaton()
Sole constructor.

Method Detail

setPreservePositionIncrements

public void setPreservePositionIncrements(boolean enablePositionIncrements)
Sets whether to generate holes in the automaton for missing positions; true by default.


setUnicodeArcs

public void setUnicodeArcs(boolean unicodeArcs)
Sets whether transition labels are Unicode code points instead of UTF-8 bytes; false by default.
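
The difference between the two labelings only shows up for non-ASCII terms. The sketch below uses plain Java rather than Lucene to compute both label sequences for one term; the class ArcLabelSketch and its methods are hypothetical names introduced for illustration.

```java
import java.nio.charset.StandardCharsets;

// Illustrates the two possible labelings of a single term; not Lucene code.
class ArcLabelSketch {
    /** One label per UTF-8 byte (0..255), corresponding to unicodeArcs = false. */
    static int[] utf8Labels(String term) {
        byte[] bytes = term.getBytes(StandardCharsets.UTF_8);
        int[] labels = new int[bytes.length];
        for (int i = 0; i < bytes.length; i++) {
            labels[i] = bytes[i] & 0xff; // treat the byte as unsigned
        }
        return labels;
    }

    /** One label per Unicode code point, corresponding to unicodeArcs = true. */
    static int[] codePointLabels(String term) {
        return term.codePoints().toArray();
    }
}
```

For the term "caf\u00e9", utf8Labels returns five labels (the final character encodes as the two bytes 0xC3 0xA9), while codePointLabels returns four (the final label is 0xE9).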


changeToken

protected BytesRef changeToken(BytesRef in)
Subclass and override this method if you need to change the token (for example, to escape certain bytes) before it is turned into a graph.
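
One reason to escape bytes is to keep term content from colliding with a reserved label such as POS_SEP. The sketch below shows one possible escaping scheme on a plain byte array; it is not taken from Lucene. The class EscapeSketch, the method escape, and the choice of escape byte are hypothetical, and in a real subclass the same logic would sit inside an overridden changeToken and operate on a BytesRef.

```java
import java.io.ByteArrayOutputStream;

// Hypothetical escaping helper, shown on byte[] to stay independent of Lucene.
class EscapeSketch {
    /**
     * Returns a copy of term in which every occurrence of the reserved byte,
     * and of the escape byte itself, is prefixed with the escape byte.
     */
    static byte[] escape(byte[] term, byte reserved, byte escape) {
        ByteArrayOutputStream out = new ByteArrayOutputStream(term.length + 4);
        for (byte b : term) {
            if (b == reserved || b == escape) {
                out.write(escape); // mark the next byte as literal
            }
            out.write(b);
        }
        return out.toByteArray();
    }
}
```

Escaping the escape byte as well keeps the transformation reversible, so two different input terms never map to the same output.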


toAutomaton

public Automaton toAutomaton(TokenStream in)
                      throws IOException
Pulls the graph (including PositionLengthAttribute) from the provided TokenStream and creates the corresponding automaton, whose arcs are bytes (or Unicode code points if unicodeArcs is true) from each term.

Throws:
IOException
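
To illustrate what "pulling the graph" means when tokens carry a position length, the sketch below enumerates the paths through a small token graph in plain Java. It is a simplified model, not the Lucene algorithm: the Token class, the paths method, and the "|" character standing in for POS_SEP are assumptions, and the model ignores holes (it expects the first token to have a position increment of 1 and every position to be covered).

```java
import java.util.List;
import java.util.Set;
import java.util.TreeSet;

// Simplified model of a token graph; not Lucene code.
class TokenGraphSketch {
    static final class Token {
        final String term;
        final int posInc; // position increment relative to the previous token
        final int posLen; // number of positions this token spans
        Token(String term, int posInc, int posLen) {
            this.term = term;
            this.posInc = posInc;
            this.posLen = posLen;
        }
    }

    /** Returns every start-to-end path, with terms joined by "|". */
    static Set<String> paths(List<Token> tokens) {
        int[] start = new int[tokens.size()];
        int pos = -1;
        int end = 0;
        for (int i = 0; i < tokens.size(); i++) {
            pos += tokens.get(i).posInc;            // absolute start position
            start[i] = pos;
            end = Math.max(end, pos + tokens.get(i).posLen);
        }
        Set<String> out = new TreeSet<>();
        walk(tokens, start, 0, end, "", out);
        return out;
    }

    private static void walk(List<Token> tokens, int[] start, int at, int end,
                             String prefix, Set<String> out) {
        if (at == end) {
            out.add(prefix); // reached the final position
            return;
        }
        for (int i = 0; i < tokens.size(); i++) {
            if (start[i] == at) {
                Token t = tokens.get(i);
                String next = (at == 0) ? t.term : prefix + "|" + t.term;
                walk(tokens, start, at + t.posLen, end, next, out);
            }
        }
    }
}
```

For the tokens wifi (posInc 1, posLen 2), wi (posInc 0, posLen 1), fi (posInc 1, posLen 1), and network (posInc 1, posLen 1), this model yields the two paths wi|fi|network and wifi|network.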


Copyright © 2000-2013 Apache Software Foundation. All Rights Reserved.