Class FingerprintFilter
java.lang.Object
org.apache.lucene.util.AttributeSource
org.apache.lucene.analysis.TokenStream
org.apache.lucene.analysis.TokenFilter
org.apache.lucene.analysis.miscellaneous.FingerprintFilter
- All Implemented Interfaces:
Closeable, AutoCloseable, Unwrappable<TokenStream>
Filter outputs a single token which is a concatenation of the sorted and de-duplicated set of
input tokens. This can be useful for clustering/linking use cases.
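The behavior described above can be sketched in plain Java without any Lucene dependency. This is only an illustration of the sort, de-duplicate, and concatenate idea, not the filter's actual implementation; the class name FingerprintSketch and the use of natural String ordering are assumptions made for the example.

```java
import java.util.List;
import java.util.TreeSet;

// Hypothetical helper class, not part of Lucene.
public class FingerprintSketch {

    /** Sorts and de-duplicates the tokens, then joins them with the separator. */
    public static String fingerprint(List<String> tokens, char separator) {
        // A TreeSet drops duplicates and iterates in natural String order
        // (an assumption; the real filter may order terms differently).
        TreeSet<String> unique = new TreeSet<>(tokens);
        return String.join(String.valueOf(separator), unique);
    }

    public static void main(String[] args) {
        List<String> tokens = List.of("the", "quick", "the", "brown", "fox");
        // prints "brown fox quick the"
        System.out.println(fingerprint(tokens, ' '));
    }
}
```

Two inputs that contain the same words in a different order, or with repeats, yield the same fingerprint, which is what makes the single output token usable as a clustering or linking key.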
-
Nested Class Summary
Nested classes/interfaces inherited from class org.apache.lucene.util.AttributeSource
AttributeSource.State
-
Field Summary
Modifier and Type    Field                            Description
static final int     DEFAULT_MAX_OUTPUT_TOKEN_SIZE
static final char    DEFAULT_SEPARATOR
Fields inherited from class org.apache.lucene.analysis.TokenFilter
input
Fields inherited from class org.apache.lucene.analysis.TokenStream
DEFAULT_TOKEN_ATTRIBUTE_FACTORY
-
Constructor Summary
Constructor / Description
FingerprintFilter(TokenStream input)
Create a new FingerprintFilter with default settings.
FingerprintFilter(TokenStream input, int maxOutputTokenSize, char separator)
Create a new FingerprintFilter with control over all settings.
-
Method Summary
Methods inherited from class org.apache.lucene.analysis.TokenFilter
close, unwrap
Methods inherited from class org.apache.lucene.util.AttributeSource
addAttribute, addAttributeImpl, captureState, clearAttributes, cloneAttributes, copyTo, endAttributes, equals, getAttribute, getAttributeClassesIterator, getAttributeFactory, getAttributeImplsIterator, hasAttribute, hasAttributes, hashCode, reflectAsString, reflectWith, removeAllAttributes, restoreState, toString
-
Field Details
-
DEFAULT_MAX_OUTPUT_TOKEN_SIZE
public static final int DEFAULT_MAX_OUTPUT_TOKEN_SIZE
-
DEFAULT_SEPARATOR
public static final char DEFAULT_SEPARATOR
-
-
Constructor Details
-
FingerprintFilter
public FingerprintFilter(TokenStream input)
Create a new FingerprintFilter with default settings.
-
FingerprintFilter
public FingerprintFilter(TokenStream input, int maxOutputTokenSize, char separator)
Create a new FingerprintFilter with control over all settings.
- Parameters:
input - the source of tokens to be summarized into a single token
maxOutputTokenSize - the maximum length of the summarized output token. If exceeded, no output token is emitted
separator - the character used to separate tokens combined into the single output token
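As a rough illustration of how maxOutputTokenSize and separator interact, the plain-Java sketch below returns an empty result when the combined token would be too long. It does not use Lucene; the class name BoundedFingerprintSketch, the Optional return type, and the choice to measure length in chars of the joined string are all assumptions made for the example, and the real filter may count size differently.

```java
import java.util.List;
import java.util.Optional;
import java.util.TreeSet;

// Hypothetical helper class, not part of Lucene.
public class BoundedFingerprintSketch {

    /** Returns the joined fingerprint, or empty if there is nothing to emit. */
    public static Optional<String> fingerprint(
            List<String> tokens, int maxOutputTokenSize, char separator) {
        TreeSet<String> unique = new TreeSet<>(tokens);
        // Assumption: with no input tokens there is no output token.
        if (unique.isEmpty()) {
            return Optional.empty();
        }
        String joined = String.join(String.valueOf(separator), unique);
        // Assumption: the limit applies to the length of the joined output in chars.
        if (joined.length() > maxOutputTokenSize) {
            return Optional.empty();
        }
        return Optional.of(joined);
    }

    public static void main(String[] args) {
        // Fits within the limit: prints "Optional[a_b]"
        System.out.println(fingerprint(List.of("b", "a", "b"), 10, '_'));
        // "alpha beta" is 10 chars, over the limit of 5: prints "Optional.empty"
        System.out.println(fingerprint(List.of("alpha", "beta"), 5, ' '));
    }
}
```

Dropping the token entirely, rather than truncating it, matches the parameter description: a truncated fingerprint could collide with the fingerprint of a different input.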
-
-
Method Details
-
incrementToken
- Specified by:
incrementToken in class TokenStream
- Throws:
IOException
-
end
- Overrides:
end in class TokenFilter
- Throws:
IOException
-
reset
- Overrides:
reset in class TokenFilter
- Throws:
IOException
-