Class TokenStreamToAutomaton


  • public class TokenStreamToAutomaton
    extends Object
    Consumes a TokenStream and creates an Automaton whose transition labels are the UTF-8 bytes (or Unicode code points, if unicodeArcs is true) obtained from the TermToBytesRefAttribute. A POS_SEP transition is inserted between adjacent tokens, and a HOLE transition is inserted for holes (missing positions).
    WARNING: This API is experimental and might change in incompatible ways in the next release.
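As a rough illustration of the conversion described above, the following plain-Java sketch (it does not call Lucene) handles only the simplest case: every token has a position increment of 1, so the result is a single chain of states whose transition labels are the UTF-8 bytes of each term, with POS_SEP between adjacent terms. The class `LinearTokenAutomaton`, its `Transition` type, and the POS_SEP value 0x1f are assumptions for illustration; the real class also deals with holes and with tokens stacked at the same position, which this sketch ignores.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

/**
 * Hypothetical, simplified model of turning a linear token stream into a chain automaton.
 * Not Lucene's implementation: holes and stacked tokens are ignored.
 */
class LinearTokenAutomaton {
    // Assumed value for illustration; the real constant is listed under Constant Field Values.
    static final int POS_SEP = 0x1f;

    /** One labeled transition from state {@code from} to state {@code to}. */
    static final class Transition {
        final int from, label, to;

        Transition(int from, int label, int to) {
            this.from = from;
            this.label = label;
            this.to = to;
        }

        @Override
        public String toString() {
            return from + " -" + Integer.toHexString(label) + "-> " + to;
        }
    }

    /** Builds a chain: term bytes, POS_SEP, term bytes, ... State 0 is initial; the last state is final. */
    static List<Transition> build(List<String> terms) {
        List<Transition> out = new ArrayList<>();
        int state = 0;
        for (int i = 0; i < terms.size(); i++) {
            if (i > 0) {
                // Separator transition between two adjacent tokens.
                out.add(new Transition(state, POS_SEP, state + 1));
                state++;
            }
            for (byte b : terms.get(i).getBytes(StandardCharsets.UTF_8)) {
                // Each UTF-8 byte becomes one transition, labeled with its unsigned value.
                out.add(new Transition(state, b & 0xff, state + 1));
                state++;
            }
        }
        return out;
    }

    public static void main(String[] args) {
        List<Transition> t = build(List.of("ab", "c"));
        t.forEach(System.out::println);
    }
}
```

For the terms "ab" and "c" this yields four transitions: two byte labels, one POS_SEP, and one more byte label.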
    • Field Detail

      • POS_SEP

        public static final int POS_SEP
        We create a transition with this label between two adjacent tokens.
        See Also:
        Constant Field Values
      • HOLE

        public static final int HOLE
        We add this arc to represent a hole.
        See Also:
        Constant Field Values
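The sketch below models where these two labels fall in a linear stream, assuming every position increment is at least 1: a POS_SEP follows each token, and each skipped position (for example, a word removed by a stop filter) contributes a HOLE followed by another POS_SEP. The class `HoleLabels` and the values 0x1f and 0x1e are assumptions; the real class builds a general graph and its exact layout may differ.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

/** Hypothetical helper: transition labels along a linear chain, including holes. */
class HoleLabels {
    // Assumed values for illustration only; check Constant Field Values for the real ones.
    static final int POS_SEP = 0x1f;
    static final int HOLE = 0x1e;

    /**
     * posIncs[i] is the position increment of terms[i] (assumed >= 1);
     * an increment of n > 1 means n - 1 positions were skipped before terms[i].
     */
    static List<Integer> labels(String[] terms, int[] posIncs) {
        List<Integer> out = new ArrayList<>();
        for (int i = 0; i < terms.length; i++) {
            if (i > 0) {
                out.add(POS_SEP); // separator after the previous token
            }
            for (int k = 1; k < posIncs[i]; k++) {
                out.add(HOLE);    // one hole per missing position
                out.add(POS_SEP);
            }
            for (byte b : terms[i].getBytes(StandardCharsets.UTF_8)) {
                out.add(b & 0xff); // term bytes as unsigned labels
            }
        }
        return out;
    }

    public static void main(String[] args) {
        // "the" removed by a stop filter: "fox" arrives with a position increment of 2.
        System.out.println(labels(new String[] {"quick", "fox"}, new int[] {1, 2}));
    }
}
```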
    • Constructor Detail

      • TokenStreamToAutomaton

        public TokenStreamToAutomaton()
        Sole constructor.
    • Method Detail

      • setPreservePositionIncrements

        public void setPreservePositionIncrements(boolean enablePositionIncrements)
        Whether to generate holes in the automaton for missing positions, true by default.
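A minimal sketch of what this switch is assumed to do: when position increments are not preserved, any increment greater than 1 is treated as 1, so the gap produces no HOLE. `PositionIncrementOption` and its methods are hypothetical helpers, not part of the API.

```java
/** Hypothetical helpers modeling the preservePositionIncrements switch. */
class PositionIncrementOption {
    /** Increment actually used: gaps are collapsed when increments are not preserved. */
    static int effectiveIncrement(int posInc, boolean preservePositionIncrements) {
        if (!preservePositionIncrements && posInc > 1) {
            return 1; // treat the next token as adjacent, so no hole is generated
        }
        return posInc;
    }

    /** Number of HOLE transitions a linear stream with these increments would produce. */
    static int countHoles(int[] posIncs, boolean preservePositionIncrements) {
        int holes = 0;
        for (int inc : posIncs) {
            holes += Math.max(0, effectiveIncrement(inc, preservePositionIncrements) - 1);
        }
        return holes;
    }

    public static void main(String[] args) {
        int[] incs = {1, 3, 1}; // two positions skipped before the second token
        System.out.println(countHoles(incs, true));  // 2
        System.out.println(countHoles(incs, false)); // 0
    }
}
```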
      • setFinalOffsetGapAsHole

        public void setFinalOffsetGapAsHole(boolean finalOffsetGapAsHole)
        If true, any final offset gaps will result in adding a position hole.
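The sketch below spells out one reading of a "final offset gap": the offset reported once the stream has ended is larger than the end offset of the last token, as when trailing words are dropped by a filter. Both the helper `FinalOffsetGap.addsTrailingHole` and this interpretation are assumptions, not taken from the API.

```java
/** Hypothetical model of the finalOffsetGapAsHole decision. */
class FinalOffsetGap {
    /**
     * maxTokenEndOffset: largest end offset of any emitted token.
     * finalOffset: offset reported once the stream has ended.
     */
    static boolean addsTrailingHole(int maxTokenEndOffset, int finalOffset, boolean finalOffsetGapAsHole) {
        return finalOffsetGapAsHole && finalOffset > maxTokenEndOffset;
    }

    public static void main(String[] args) {
        // Text "fast cars the" with "the" removed: "cars" ends at offset 9, the final offset is 13.
        System.out.println(addsTrailingHole(9, 13, true));  // true
        System.out.println(addsTrailingHole(9, 13, false)); // false
        System.out.println(addsTrailingHole(13, 13, true)); // false: no gap
    }
}
```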
      • setUnicodeArcs

        public void setUnicodeArcs(boolean unicodeArcs)
        Whether to make transition labels Unicode code points instead of UTF-8 bytes; false by default.
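The difference between the two label alphabets shows up as soon as a term contains a non-ASCII character. This hypothetical helper (`ArcLabels` is not part of the API) computes the labels for one term either way using only the JDK.

```java
import java.nio.charset.StandardCharsets;

/** Hypothetical helper: per-term transition labels with and without unicodeArcs. */
class ArcLabels {
    /** UTF-8 bytes (0..255) by default, Unicode code points when unicodeArcs is true. */
    static int[] labels(String term, boolean unicodeArcs) {
        if (unicodeArcs) {
            return term.codePoints().toArray();
        }
        byte[] utf8 = term.getBytes(StandardCharsets.UTF_8);
        int[] out = new int[utf8.length];
        for (int i = 0; i < utf8.length; i++) {
            out[i] = utf8[i] & 0xff; // unsigned byte value
        }
        return out;
    }

    public static void main(String[] args) {
        String term = "caf\u00e9"; // "café"
        System.out.println(labels(term, false).length); // 5: U+00E9 is two UTF-8 bytes
        System.out.println(labels(term, true).length);  // 4: one label per code point
    }
}
```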
      • changeToken

        protected BytesRef changeToken(BytesRef in)
        Subclass and override this if you need to change the token (such as escaping certain bytes) before it is turned into a graph.
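In the real API this hook receives and returns a Lucene BytesRef. To keep the sketch runnable without Lucene, the hypothetical `TokenConverter` base class below models only the hook, using plain `byte[]`. The subclass escapes any byte that collides with the reserved labels by prefixing it with an escape byte; the escape value 0x1d and the label values 0x1f and 0x1e are assumptions chosen for illustration.

```java
import java.io.ByteArrayOutputStream;
import java.util.Arrays;

/** Hypothetical stand-in for the converter; only the changeToken hook is modeled. */
class TokenConverter {
    /** Default hook: return the token bytes unchanged. */
    protected byte[] changeToken(byte[] in) {
        return in;
    }

    /** Applies the hook, as the converter would before turning the token into transitions. */
    final byte[] prepare(byte[] token) {
        return changeToken(token);
    }
}

/** Escapes bytes that collide with reserved labels. */
class EscapingTokenConverter extends TokenConverter {
    static final int POS_SEP = 0x1f; // assumed reserved label values
    static final int HOLE = 0x1e;
    static final int ESCAPE = 0x1d;  // hypothetical escape byte

    @Override
    protected byte[] changeToken(byte[] in) {
        ByteArrayOutputStream out = new ByteArrayOutputStream(in.length + 4);
        for (byte b : in) {
            int v = b & 0xff;
            if (v == POS_SEP || v == HOLE || v == ESCAPE) {
                out.write(ESCAPE); // mark the next byte as literal data
            }
            out.write(v);
        }
        return out.toByteArray();
    }

    public static void main(String[] args) {
        byte[] raw = {'a', 0x1f, 'b'};
        System.out.println(Arrays.toString(new EscapingTokenConverter().prepare(raw))); // [97, 29, 31, 98]
    }
}
```

Escaping the escape byte itself keeps the transformation reversible, so a consumer could recover the original token bytes.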