Class ConcatenateGraphFilter

java.lang.Object
org.apache.lucene.util.AttributeSource
org.apache.lucene.analysis.TokenStream
org.apache.lucene.analysis.miscellaneous.ConcatenateGraphFilter
All Implemented Interfaces:
Closeable, AutoCloseable

public final class ConcatenateGraphFilter extends TokenStream
Concatenates every incoming token, joined by a separator, into one output token for each path through the token stream (which is a graph). In simple cases this yields a single token, but if any token has a zero positionIncrement (e.g. synonyms), it yields more. This filter uses the token bytes, position increment, and position length of the incoming stream. Other attributes are not used or manipulated.
WARNING: This API is experimental and might change in incompatible ways in the next release.
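
The idea of "one output token per path" can be illustrated without Lucene. The following is a minimal sketch, not the filter's actual implementation: the `Tok` class, `concatPaths`, and the visible space separator are hypothetical stand-ins, and the sketch assumes the first token has a position increment of 1, every position length is at least 1, and there are no missing positions.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

class ConcatPathsSketch {
    /** Hypothetical stand-in for a token: term text, position increment, position length. */
    static final class Tok {
        final String term;
        final int posInc;
        final int posLen;

        Tok(String term, int posInc, int posLen) {
            this.term = term;
            this.posInc = posInc;
            this.posLen = posLen;
        }
    }

    /** Returns one joined string per path through the token graph. */
    static List<String> concatPaths(List<Tok> tokens, String sep) {
        // Resolve each token's start and end position from the running increments.
        int pos = -1;
        int end = 0;
        int[] from = new int[tokens.size()];
        int[] to = new int[tokens.size()];
        for (int i = 0; i < tokens.size(); i++) {
            Tok t = tokens.get(i);
            pos += t.posInc;
            from[i] = pos;
            to[i] = pos + t.posLen;
            end = Math.max(end, to[i]);
        }
        List<String> out = new ArrayList<>();
        walk(tokens, from, to, 0, end, new ArrayList<String>(), sep, out);
        return out;
    }

    // Depth-first walk: at each position, follow every token that starts there.
    private static void walk(List<Tok> tokens, int[] from, int[] to, int at, int end,
                             List<String> path, String sep, List<String> out) {
        if (at == end) {
            out.add(String.join(sep, path));
            return;
        }
        for (int i = 0; i < tokens.size(); i++) {
            if (from[i] == at) {
                path.add(tokens.get(i).term);
                walk(tokens, from, to, to[i], end, path, sep, out);
                path.remove(path.size() - 1);
            }
        }
    }

    public static void main(String[] args) {
        // "wireless" has a zero position increment, so it is a synonym stacked on "wifi".
        List<Tok> tokens = Arrays.asList(
            new Tok("wifi", 1, 1),
            new Tok("wireless", 0, 1),
            new Tok("network", 1, 1));
        System.out.println(concatPaths(tokens, " ")); // prints [wifi network, wireless network]
    }
}
```

Without the synonym, the same input would produce a single joined string, which matches the "simple case" described above.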
  • Field Details

    • SEP_LABEL

      public static final int SEP_LABEL
      Represents the default separator between tokens.
    • DEFAULT_MAX_GRAPH_EXPANSIONS

      public static final int DEFAULT_MAX_GRAPH_EXPANSIONS
    • DEFAULT_TOKEN_SEPARATOR

      public static final Character DEFAULT_TOKEN_SEPARATOR
    • DEFAULT_PRESERVE_SEP

      public static final boolean DEFAULT_PRESERVE_SEP
    • DEFAULT_PRESERVE_POSITION_INCREMENTS

      public static final boolean DEFAULT_PRESERVE_POSITION_INCREMENTS
  • Constructor Details

    • ConcatenateGraphFilter

      public ConcatenateGraphFilter(TokenStream inputTokenStream)
      Creates a token stream that converts the input into a token stream of accepted strings, one for each path through the input's token stream graph.

      This constructor uses the default values defined by the constants in this class.

    • ConcatenateGraphFilter

      public ConcatenateGraphFilter(TokenStream inputTokenStream, Character tokenSeparator, boolean preservePositionIncrements, int maxGraphExpansions)
      Creates a token stream that converts the input into a token stream of accepted strings, one for each path through the input's token stream graph.
      Parameters:
      inputTokenStream - The input/incoming TokenStream
      tokenSeparator - Separator to use for concatenation. May be null, in which case tokens are concatenated without any separator.
      preservePositionIncrements - Whether to add an empty token for each missing position, which results in consecutive SEP_LABEL separators. When false, missing positions are ignored, as if the surrounding tokens were adjacent.
      maxGraphExpansions - The maximum number of paths allowed through the token stream graph. If the graph has more paths than this, a TooComplexToDeterminizeException is thrown to protect the stability and memory of the machine.
      Throws:
      TooComplexToDeterminizeException - if the tokenStream graph has more than maxGraphExpansions expansions
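
The interaction between tokenSeparator and preservePositionIncrements can be sketched in plain Java. This is one reading of the parameter descriptions above, not the filter's code: `SeparatorSketch.concat` is a hypothetical helper, it handles only a linear stream (no graph), it uses a visible `_` in place of the real separator, and it does not model maxGraphExpansions.

```java
import java.util.ArrayList;
import java.util.List;

class SeparatorSketch {
    /**
     * Joins terms in order. When preservePosInc is true, each missing position
     * (a position increment greater than 1) contributes an empty token, which
     * shows up as consecutive separators. A null separator means no separator.
     */
    static String concat(String[] terms, int[] posIncs, Character sep, boolean preservePosInc) {
        List<String> parts = new ArrayList<>();
        for (int i = 0; i < terms.length; i++) {
            if (preservePosInc) {
                // One empty token per skipped position before this term.
                for (int gap = 1; gap < posIncs[i]; gap++) {
                    parts.add("");
                }
            }
            parts.add(terms[i]);
        }
        String delimiter = (sep == null) ? "" : String.valueOf(sep);
        return String.join(delimiter, parts);
    }

    public static void main(String[] args) {
        // "fox" has a position increment of 2, e.g. because a stop word was removed before it.
        String[] terms = {"quick", "fox"};
        int[] posIncs = {1, 2};
        System.out.println(concat(terms, posIncs, '_', true));  // prints quick__fox
        System.out.println(concat(terms, posIncs, '_', false)); // prints quick_fox
        System.out.println(concat(terms, posIncs, null, true)); // prints quickfox
    }
}
```

With a null separator the empty tokens become invisible, so the two preservePositionIncrements settings produce the same string in this sketch.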
    • ConcatenateGraphFilter

      public ConcatenateGraphFilter(TokenStream inputTokenStream, boolean preserveSep, boolean preservePositionIncrements, int maxGraphExpansions)
      Parameters:
      preserveSep - Whether SEP_LABEL should separate the input tokens in the concatenated token
  • Method Details