Class DelimitedTermFrequencyTokenFilter
java.lang.Object
org.apache.lucene.util.AttributeSource
org.apache.lucene.analysis.TokenStream
org.apache.lucene.analysis.TokenFilter
org.apache.lucene.analysis.miscellaneous.DelimitedTermFrequencyTokenFilter
- All Implemented Interfaces:
Closeable, AutoCloseable, Unwrappable<TokenStream>
Characters before the delimiter are the "token"; the textual integer after it is the term frequency.
To use this TokenFilter, the field must be indexed with IndexOptions.DOCS_AND_FREQS, without positions or offsets.
For example, if the delimiter is '|', then for the string "foo|5", "foo" is the token and "5" is a term frequency. If there is no delimiter, the TokenFilter does not modify the term frequency.
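Note: make sure your Tokenizer does not split on the delimiter; otherwise the token and its frequency will not reach this filter as a single term, and the frequency will not be applied.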
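The splitting rule described above can be illustrated with a minimal plain-Java sketch. This is not the filter's implementation and does not use the Lucene API: the class TermFreqSplitSketch, its Parsed holder, and the split method are hypothetical names introduced only for illustration. The sketch assumes that the first occurrence of the delimiter separates token from frequency, and that a term without a delimiter keeps a frequency of 1.

```java
/** Illustrative only: mimics the "token|frequency" convention without Lucene. */
final class TermFreqSplitSketch {

    /** Holds the token text and the term frequency parsed from one raw term. */
    static final class Parsed {
        final String token;
        final int frequency;

        Parsed(String token, int frequency) {
            this.token = token;
            this.frequency = frequency;
        }
    }

    // Assumption: a term with no delimiter is left with a frequency of 1.
    static final int ASSUMED_DEFAULT_FREQUENCY = 1;

    /**
     * Splits rawTerm at the first occurrence of delimiter (an assumption).
     * Text before the delimiter is the token; text after it is parsed as an integer.
     */
    static Parsed split(String rawTerm, char delimiter) {
        int idx = rawTerm.indexOf(delimiter);
        if (idx < 0) {
            // No delimiter: the term is returned unchanged.
            return new Parsed(rawTerm, ASSUMED_DEFAULT_FREQUENCY);
        }
        String token = rawTerm.substring(0, idx);
        // Throws NumberFormatException if the suffix is not a textual integer.
        int frequency = Integer.parseInt(rawTerm.substring(idx + 1));
        return new Parsed(token, frequency);
    }

    public static void main(String[] args) {
        Parsed withFreq = split("foo|5", '|');
        System.out.println(withFreq.token + " -> " + withFreq.frequency);

        Parsed plain = split("bar", '|');
        System.out.println(plain.token + " -> " + plain.frequency);
    }
}
```

In this sketch, "foo|5" yields the token "foo" with frequency 5, while "bar" passes through unchanged. A tokenizer that splits on '|' would hand "foo" and "5" over as two separate terms, so neither would carry the delimiter, which is the situation the note above warns about.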
-
Nested Class Summary
Nested classes/interfaces inherited from class org.apache.lucene.util.AttributeSource
AttributeSource.State
-
Field Summary
Fields inherited from class org.apache.lucene.analysis.TokenFilter
input
Fields inherited from class org.apache.lucene.analysis.TokenStream
DEFAULT_TOKEN_ATTRIBUTE_FACTORY
-
Constructor Summary
-
Method Summary
Methods inherited from class org.apache.lucene.analysis.TokenFilter
close, end, reset, unwrap
Methods inherited from class org.apache.lucene.util.AttributeSource
addAttribute, addAttributeImpl, captureState, clearAttributes, cloneAttributes, copyTo, endAttributes, equals, getAttribute, getAttributeClassesIterator, getAttributeFactory, getAttributeImplsIterator, hasAttribute, hasAttributes, hashCode, reflectAsString, reflectWith, removeAllAttributes, restoreState, toString
-
Field Details
-
DEFAULT_DELIMITER
public static final char DEFAULT_DELIMITER
-
-
Constructor Details
-
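DelimitedTermFrequencyTokenFilter
public DelimitedTermFrequencyTokenFilter(TokenStream input)
-
DelimitedTermFrequencyTokenFilter
public DelimitedTermFrequencyTokenFilter(TokenStream input, char delimiter)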
-
-
Method Details
-
incrementToken
- Specified by:
incrementToken in class TokenStream
- Throws:
IOException
-