Class FingerprintFilter
- java.lang.Object
  - org.apache.lucene.util.AttributeSource
    - org.apache.lucene.analysis.TokenStream
      - org.apache.lucene.analysis.TokenFilter
        - org.apache.lucene.analysis.miscellaneous.FingerprintFilter
- All Implemented Interfaces:
Closeable, AutoCloseable
public class FingerprintFilter extends TokenFilter
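A filter that outputs a single token formed by concatenating the sorted, de-duplicated set of input tokens. This can be useful for clustering and record-linkage use cases, because inputs that contain the same terms in a different order, or with repeats, produce the same output token.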
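The transformation described above can be sketched in plain Java, without Lucene, so that it runs on its own. The class FingerprintSketch and its fingerprint helper are hypothetical, and java.util.TreeSet's natural String ordering is assumed to be an acceptable stand-in for the filter's own sort order. The sketch shows two differently ordered, repetitive inputs collapsing to the same key.

```java
import java.util.List;
import java.util.TreeSet;

// Hypothetical, Lucene-free model of the documented FingerprintFilter output:
// sort the input tokens, drop duplicates, and join them with a separator.
public class FingerprintSketch {

    static String fingerprint(List<String> tokens, char separator) {
        // TreeSet both de-duplicates and keeps its elements in natural (sorted) order.
        TreeSet<String> unique = new TreeSet<>(tokens);
        return String.join(String.valueOf(separator), unique);
    }

    public static void main(String[] args) {
        List<String> a = List.of("the", "quick", "fox", "the");
        List<String> b = List.of("fox", "the", "quick");
        // Both inputs map to the same key, which is what makes fingerprints useful for clustering.
        System.out.println(fingerprint(a, ' ')); // prints: fox quick the
        System.out.println(fingerprint(b, ' ')); // prints: fox quick the
    }
}
```

The sketch compares tokens exactly, so "Fox" and "fox" stay distinct unless an earlier step normalizes case.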
Nested Class Summary
Nested classes/interfaces inherited from class org.apache.lucene.util.AttributeSource
AttributeSource.State
Field Summary
Modifier and Type   Field                           Description
static int          DEFAULT_MAX_OUTPUT_TOKEN_SIZE   Default value of maxOutputTokenSize
static char         DEFAULT_SEPARATOR               Default value of separator
Fields inherited from class org.apache.lucene.analysis.TokenFilter
input
Fields inherited from class org.apache.lucene.analysis.TokenStream
DEFAULT_TOKEN_ATTRIBUTE_FACTORY
Constructor Summary
FingerprintFilter(TokenStream input)
    Create a new FingerprintFilter with default settings.
FingerprintFilter(TokenStream input, int maxOutputTokenSize, char separator)
    Create a new FingerprintFilter with control over all settings.
Method Summary
Modifier and Type   Method
void                end()
boolean             incrementToken()
void                reset()
Methods inherited from class org.apache.lucene.analysis.TokenFilter
close
Methods inherited from class org.apache.lucene.util.AttributeSource
addAttribute, addAttributeImpl, captureState, clearAttributes, cloneAttributes, copyTo, endAttributes, equals, getAttribute, getAttributeClassesIterator, getAttributeFactory, getAttributeImplsIterator, hasAttribute, hasAttributes, hashCode, reflectAsString, reflectWith, removeAllAttributes, restoreState, toString
Field Detail
DEFAULT_MAX_OUTPUT_TOKEN_SIZE
public static final int DEFAULT_MAX_OUTPUT_TOKEN_SIZE
- See Also:
- Constant Field Values
DEFAULT_SEPARATOR
public static final char DEFAULT_SEPARATOR
- See Also:
- Constant Field Values
Constructor Detail
FingerprintFilter
public FingerprintFilter(TokenStream input)
Create a new FingerprintFilter with default settings (DEFAULT_MAX_OUTPUT_TOKEN_SIZE and DEFAULT_SEPARATOR).
FingerprintFilter
public FingerprintFilter(TokenStream input, int maxOutputTokenSize, char separator)
Create a new FingerprintFilter with control over all settings.
- Parameters:
input - the source of tokens to be summarized into a single token
maxOutputTokenSize - the maximum length of the summarized output token. If this length is exceeded, no output token is emitted.
separator - the character used to separate the tokens combined into the single output token
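The two extra settings of this constructor can be modelled with a second plain-Java sketch; the class BoundedFingerprint is hypothetical and is not part of Lucene. It assumes that the limit is compared against the full joined length, separators included, and that an empty input yields no token; both are assumptions of the sketch rather than statements from this page. Optional.empty() stands in for "no output token is emitted".

```java
import java.util.List;
import java.util.Optional;
import java.util.TreeSet;

// Hypothetical model of the three-argument constructor's settings:
// maxOutputTokenSize caps the joined length, separator joins the sorted unique tokens.
public class BoundedFingerprint {

    static Optional<String> fingerprint(List<String> tokens, int maxOutputTokenSize, char separator) {
        TreeSet<String> unique = new TreeSet<>(tokens); // sorted and de-duplicated
        if (unique.isEmpty()) {
            return Optional.empty(); // assumption: nothing to summarize, so no token
        }
        String joined = String.join(String.valueOf(separator), unique);
        if (joined.length() > maxOutputTokenSize) {
            return Optional.empty(); // limit exceeded: no output token
        }
        return Optional.of(joined);
    }

    public static void main(String[] args) {
        List<String> tokens = List.of("delta", "alpha", "charlie", "alpha");
        System.out.println(fingerprint(tokens, 32, '_')); // prints: Optional[alpha_charlie_delta]
        System.out.println(fingerprint(tokens, 10, '_')); // prints: Optional.empty
    }
}
```

Returning an Optional keeps "too large" and "empty input" visibly distinct from a real, possibly short, fingerprint.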
Method Detail
incrementToken
public final boolean incrementToken() throws IOException
- Specified by:
incrementToken in class TokenStream
- Throws:
IOException
end
public final void end() throws IOException
- Overrides:
end in class TokenFilter
- Throws:
IOException
reset
public void reset() throws IOException
- Overrides:
reset in class TokenFilter
- Throws:
IOException
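The single-token contract of incrementToken() and reset() can be modelled in plain Java as a small state machine. This sketch is hypothetical: SingleTokenStreamModel, its Supplier-based source and term() are illustrative stand-ins for Lucene's wrapped input stream and term attribute, not Lucene API, and end() is omitted. It assumes that the first incrementToken() call drains the whole input and emits at most one token, that later calls return false, and that reset() prepares the object for a fresh pass.

```java
import java.util.Iterator;
import java.util.List;
import java.util.TreeSet;
import java.util.function.Supplier;

// Hypothetical model (not Lucene API) of a filter that emits a single summary token.
public class SingleTokenStreamModel {
    private final Supplier<Iterator<String>> source; // stands in for the wrapped input stream
    private final char separator;
    private Iterator<String> input;
    private boolean done; // true once the single token (or nothing) has been produced
    private String term;  // stands in for the term attribute

    SingleTokenStreamModel(Supplier<Iterator<String>> source, char separator) {
        this.source = source;
        this.separator = separator;
    }

    // Must be called before consuming, mirroring the TokenStream reset-before-use rule.
    void reset() {
        input = source.get();
        done = false;
        term = null;
    }

    boolean incrementToken() {
        if (done) {
            return false; // the summary has already been handed out
        }
        done = true;
        TreeSet<String> unique = new TreeSet<>();
        while (input.hasNext()) {
            unique.add(input.next()); // drain every input token before emitting anything
        }
        if (unique.isEmpty()) {
            return false; // nothing to summarize
        }
        term = String.join(String.valueOf(separator), unique);
        return true;
    }

    String term() {
        return term;
    }

    public static void main(String[] args) {
        List<String> tokens = List.of("b", "a", "b");
        SingleTokenStreamModel stream = new SingleTokenStreamModel(tokens::iterator, ' ');
        for (int pass = 0; pass < 2; pass++) {
            stream.reset(); // reuse the same object for a second pass
            while (stream.incrementToken()) {
                System.out.println(stream.term()); // prints: a b (once per pass)
            }
        }
    }
}
```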