Class ConcatenateGraphFilter
java.lang.Object
  org.apache.lucene.util.AttributeSource
    org.apache.lucene.analysis.TokenStream
      org.apache.lucene.analysis.miscellaneous.ConcatenateGraphFilter

All Implemented Interfaces:
Closeable, AutoCloseable
public final class ConcatenateGraphFilter extends TokenStream
Concatenates (joins) the incoming tokens with a separator into one output token for every path through the token stream, which is a graph. In simple cases this yields one token, but if any token has a zero positionIncrement (e.g. synonyms), it yields more. This filter uses the token bytes, position increment, and position length of the incoming stream. Other attributes are not used or manipulated.
WARNING: This API is experimental and might change in incompatible ways in the next release.
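The "one output token per path" behavior described above can be illustrated with a small standalone sketch. It does not use Lucene and is not the filter's implementation: the class ConcatPathsSketch, the Edge type, and the concatPaths method are hypothetical names, the separator is a space chosen for readability rather than the filter's own default, and the sketch assumes a stream with no missing positions.

```java
import java.util.ArrayList;
import java.util.List;

final class ConcatPathsSketch {
    /** Hypothetical edge of the token graph: a term spanning two positions. */
    static final class Edge {
        final int from;
        final int to;
        final String term;

        Edge(int from, int to, String term) {
            this.from = from;
            this.to = to;
            this.term = term;
        }
    }

    /**
     * Builds edges from parallel arrays of terms, position increments and
     * position lengths, then joins the terms along every start-to-end path.
     * Assumption: the stream has no missing positions (holes).
     */
    static List<String> concatPaths(String[] terms, int[] posIncs, int[] posLens, char sep) {
        List<Edge> edges = new ArrayList<>();
        int pos = -1; // the first token's increment of 1 moves this to position 0
        int end = 0;
        for (int i = 0; i < terms.length; i++) {
            pos += posIncs[i];
            int to = pos + posLens[i];
            edges.add(new Edge(pos, to, terms[i]));
            end = Math.max(end, to);
        }
        List<String> out = new ArrayList<>();
        walk(edges, 0, end, null, sep, out);
        return out;
    }

    /** Depth-first walk that emits one concatenated string per complete path. */
    private static void walk(List<Edge> edges, int at, int end, String prefix, char sep, List<String> out) {
        if (at == end) {
            out.add(prefix == null ? "" : prefix);
            return;
        }
        for (Edge e : edges) {
            if (e.from == at) {
                String next = (prefix == null) ? e.term : prefix + sep + e.term;
                walk(edges, e.to, end, next, sep, out);
            }
        }
    }

    public static void main(String[] args) {
        // "wireless" is a synonym stacked on "wifi" (position increment 0).
        String[] terms = {"wifi", "wireless", "network"};
        int[] posIncs = {1, 0, 1};
        int[] posLens = {1, 1, 1};
        // prints [wifi network, wireless network]
        System.out.println(concatPaths(terms, posIncs, posLens, ' '));
    }
}
```

With no stacked tokens the same call returns a single string, which corresponds to the "simple cases" mentioned above.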
-
-
Nested Class Summary
Nested Classes (Modifier and Type, Class, Description):
static interface ConcatenateGraphFilter.BytesRefBuilderTermAttribute
    Attribute providing access to the term builder and UTF-16 conversion
static class ConcatenateGraphFilter.BytesRefBuilderTermAttributeImpl
    Implementation of ConcatenateGraphFilter.BytesRefBuilderTermAttribute
-
Nested classes/interfaces inherited from class org.apache.lucene.util.AttributeSource
AttributeSource.State
-
-
Field Summary
Fields (Modifier and Type, Field, Description):
static int DEFAULT_MAX_GRAPH_EXPANSIONS
static boolean DEFAULT_PRESERVE_POSITION_INCREMENTS
static boolean DEFAULT_PRESERVE_SEP
static Character DEFAULT_TOKEN_SEPARATOR
static int SEP_LABEL
    Represents the default separator between tokens.
-
Fields inherited from class org.apache.lucene.analysis.TokenStream
DEFAULT_TOKEN_ATTRIBUTE_FACTORY
-
-
Constructor Summary
Constructors (Constructor, Description):
ConcatenateGraphFilter(TokenStream inputTokenStream)
    Creates a token stream to convert input to a token stream of accepted strings by its token stream graph.
ConcatenateGraphFilter(TokenStream inputTokenStream, boolean preserveSep, boolean preservePositionIncrements, int maxGraphExpansions)
ConcatenateGraphFilter(TokenStream inputTokenStream, Character tokenSeparator, boolean preservePositionIncrements, int maxGraphExpansions)
    Creates a token stream to convert input to a token stream of accepted strings by its token stream graph.
-
Method Summary
Methods (Modifier and Type, Method, Description):
void close()
void end()
boolean incrementToken()
void reset()
Automaton toAutomaton()
    Converts the tokenStream to an automaton, treating the transition labels as UTF-8.
Automaton toAutomaton(boolean unicodeAware)
    Converts the tokenStream to an automaton.
-
Methods inherited from class org.apache.lucene.util.AttributeSource
addAttribute, addAttributeImpl, captureState, clearAttributes, cloneAttributes, copyTo, endAttributes, equals, getAttribute, getAttributeClassesIterator, getAttributeFactory, getAttributeImplsIterator, hasAttribute, hasAttributes, hashCode, reflectAsString, reflectWith, removeAllAttributes, restoreState, toString
-
-
-
-
Field Detail
-
SEP_LABEL
public static final int SEP_LABEL
Represents the default separator between tokens.
- See Also:
- Constant Field Values
-
DEFAULT_MAX_GRAPH_EXPANSIONS
public static final int DEFAULT_MAX_GRAPH_EXPANSIONS
- See Also:
- Constant Field Values
-
DEFAULT_TOKEN_SEPARATOR
public static final Character DEFAULT_TOKEN_SEPARATOR
-
DEFAULT_PRESERVE_SEP
public static final boolean DEFAULT_PRESERVE_SEP
- See Also:
- Constant Field Values
-
DEFAULT_PRESERVE_POSITION_INCREMENTS
public static final boolean DEFAULT_PRESERVE_POSITION_INCREMENTS
- See Also:
- Constant Field Values
-
-
Constructor Detail
-
ConcatenateGraphFilter
public ConcatenateGraphFilter(TokenStream inputTokenStream)
Creates a token stream to convert input to a token stream of accepted strings by its token stream graph.
This constructor uses the default settings of the constants in this class.
-
ConcatenateGraphFilter
public ConcatenateGraphFilter(TokenStream inputTokenStream, Character tokenSeparator, boolean preservePositionIncrements, int maxGraphExpansions)
Creates a token stream to convert input to a token stream of accepted strings by its token stream graph.
- Parameters:
inputTokenStream - The input/incoming TokenStream
tokenSeparator - Separator to use for concatenation. Can be null, in which case tokens are concatenated without any separator.
preservePositionIncrements - Whether to add an empty token for missing positions. The effect is consecutive SEP_LABEL separators. When false, it's as if there were no missing positions (we pretend the surrounding tokens were adjacent).
maxGraphExpansions - If the tokenStream graph has more than this many possible paths through it, a TooComplexToDeterminizeException is thrown to preserve the stability and memory of the machine.
- Throws:
TooComplexToDeterminizeException - if the tokenStream graph has more than maxGraphExpansions expansions
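The effect of tokenSeparator and preservePositionIncrements on a linear (non-graph) stream can be sketched as follows. This is a standalone illustration based only on the parameter descriptions above, not the filter's code: SeparatorSketch and concat are hypothetical names, an underscore stands in for the separator, and the treatment of a hole before the first token is an assumption of the sketch.

```java
import java.util.ArrayList;
import java.util.List;

final class SeparatorSketch {
    /**
     * Joins terms with an optional separator. A position increment greater
     * than 1 marks missing positions; when preservePositionIncrements is true,
     * each missing position contributes an empty token, which shows up as
     * consecutive separators in the result.
     */
    static String concat(String[] terms, int[] posIncs, Character tokenSeparator,
                         boolean preservePositionIncrements) {
        List<String> parts = new ArrayList<>();
        for (int i = 0; i < terms.length; i++) {
            if (preservePositionIncrements) {
                for (int hole = 1; hole < posIncs[i]; hole++) {
                    parts.add(""); // empty token for a missing position
                }
            }
            parts.add(terms[i]);
        }
        // A null separator means plain concatenation.
        String sep = (tokenSeparator == null) ? "" : String.valueOf(tokenSeparator);
        return String.join(sep, parts);
    }

    public static void main(String[] args) {
        // A token (for example a stop word) was removed between "quick" and "fox".
        String[] terms = {"quick", "fox"};
        int[] posIncs = {1, 2};
        System.out.println(concat(terms, posIncs, '_', true));  // quick__fox
        System.out.println(concat(terms, posIncs, '_', false)); // quick_fox
        System.out.println(concat(terms, posIncs, null, true)); // quickfox
    }
}
```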
-
ConcatenateGraphFilter
public ConcatenateGraphFilter(TokenStream inputTokenStream, boolean preserveSep, boolean preservePositionIncrements, int maxGraphExpansions)
Calls ConcatenateGraphFilter(org.apache.lucene.analysis.TokenStream, java.lang.Character, boolean, int)
- Parameters:
preserveSep - Whether SEP_LABEL should separate the input tokens in the concatenated token
-
-
Method Detail
-
reset
public void reset() throws IOException
- Overrides:
reset in class TokenStream
- Throws:
IOException
-
incrementToken
public boolean incrementToken() throws IOException
- Specified by:
incrementToken in class TokenStream
- Throws:
IOException
-
end
public void end() throws IOException
- Overrides:
end in class TokenStream
- Throws:
IOException
-
close
public void close() throws IOException
- Specified by:
close in interface AutoCloseable
- Specified by:
close in interface Closeable
- Overrides:
close in class TokenStream
- Throws:
IOException
-
toAutomaton
public Automaton toAutomaton() throws IOException
Converts the tokenStream to an automaton, treating the transition labels as UTF-8. Does *not* close it.
- Throws:
IOException
-
toAutomaton
public Automaton toAutomaton(boolean unicodeAware) throws IOException
Converts the tokenStream to an automaton. Does *not* close it.
- Throws:
IOException
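The difference between the two toAutomaton variants can be pictured by looking at the labels a single term would contribute. The standalone sketch below assumes that unicodeAware = true means one transition label per Unicode code point, while the default uses one label per UTF-8 byte. The page itself only states the UTF-8 case, so the code-point reading is an assumption, and TransitionLabelSketch and labels are hypothetical names rather than Lucene API.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

final class TransitionLabelSketch {
    /** Returns the labels a term would contribute under each labelling scheme. */
    static int[] labels(String term, boolean unicodeAware) {
        if (unicodeAware) {
            // Assumption: one label per Unicode code point.
            return term.codePoints().toArray();
        }
        // One label per UTF-8 byte, as an unsigned value in 0..255.
        byte[] utf8 = term.getBytes(StandardCharsets.UTF_8);
        int[] out = new int[utf8.length];
        for (int i = 0; i < utf8.length; i++) {
            out[i] = utf8[i] & 0xFF;
        }
        return out;
    }

    public static void main(String[] args) {
        String term = "\u00e9"; // LATIN SMALL LETTER E WITH ACUTE
        System.out.println(Arrays.toString(labels(term, false))); // [195, 169]
        System.out.println(Arrays.toString(labels(term, true)));  // [233]
    }
}
```

For pure ASCII terms the two schemes produce identical labels. They differ only for characters that need more than one UTF-8 byte.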
-
-