public final class DelimitedTermFrequencyTokenFilter extends TokenFilter
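Characters before the delimiter are the token; the integer text after it is the term frequency. To use this TokenFilter, the field must be indexed with IndexOptions.DOCS_AND_FREQS but no positions or offsets.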
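To show why only documents and frequencies are needed, here is a minimal stdlib-only sketch (not Lucene code) of the per-document data that results from `term|freq` tokens. Each term ends up with a single frequency and has no individual occurrences whose positions could be recorded, which is presumably why positions and offsets are excluded. The class `DocFreqsSketch` is hypothetical, and summing the frequencies of a term that appears twice in one document is an assumption about how the indexer accumulates them.

```java
import java.util.LinkedHashMap;
import java.util.Map;

public class DocFreqsSketch {

    // Builds a term -> frequency map for one document made of "term|freq" tokens.
    // Hypothetical helper: it models the postings data (docs + freqs), not Lucene's indexer.
    public static Map<String, Integer> docFreqs(String[] tokens, char delimiter) {
        Map<String, Integer> freqs = new LinkedHashMap<>();
        for (String token : tokens) {
            int idx = token.indexOf(delimiter);
            String term = idx < 0 ? token : token.substring(0, idx);
            // Tokens without a delimiter keep the default frequency of 1.
            int freq = idx < 0 ? 1 : Integer.parseInt(token.substring(idx + 1));
            // Assumption: repeated terms in the same document have their frequencies summed.
            freqs.merge(term, freq, Integer::sum);
        }
        return freqs;
    }

    public static void main(String[] args) {
        String[] tokens = {"foo|5", "bar|2", "baz"};
        // One frequency per term; "foo" has no five separate positions to store.
        System.out.println(docFreqs(tokens, '|')); // {foo=5, bar=2, baz=1}
    }
}
```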
For example, if the delimiter is '|', then for the string "foo|5", "foo" is the token and "5" is its term frequency. If there is no delimiter, the TokenFilter does not modify the term frequency.
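The split described above can be sketched in plain Java as follows. This is an illustration, not the filter's actual code: `TermFreqParseSketch` and `Parsed` are hypothetical names, and the fallback frequency of 1 is assumed to match the default that an unmodified term frequency has. Like the description, it splits at the first delimiter and treats everything after it as an integer.

```java
public class TermFreqParseSketch {

    // Result of splitting one token; hypothetical type for illustration.
    public static final class Parsed {
        public final String term;
        public final int freq;

        Parsed(String term, int freq) {
            this.term = term;
            this.freq = freq;
        }
    }

    // Splits at the first occurrence of the delimiter. Without a delimiter the token is
    // returned unchanged and the frequency stays at the assumed default of 1.
    public static Parsed parse(String token, char delimiter) {
        for (int i = 0; i < token.length(); i++) {
            if (token.charAt(i) == delimiter) {
                // The text after the delimiter must be an integer; otherwise
                // Integer.parseInt throws NumberFormatException.
                return new Parsed(token.substring(0, i), Integer.parseInt(token.substring(i + 1)));
            }
        }
        return new Parsed(token, 1);
    }

    public static void main(String[] args) {
        Parsed p = parse("foo|5", '|');
        System.out.println(p.term + " -> " + p.freq); // foo -> 5
        Parsed q = parse("bar", '|');
        System.out.println(q.term + " -> " + q.freq); // bar -> 1
    }
}
```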
Note: make sure your Tokenizer does not split on the delimiter, or this filter will not work.
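The tokenizer caveat can be illustrated with plain `String.split` calls standing in for two tokenizers. This is a hedged sketch, not Lucene code: splitting only on whitespace (as Lucene's WhitespaceTokenizer does) keeps each `term|freq` pair in one token, while a tokenizer that also breaks on '|' hands the filter "foo" and "5" as separate tokens, so no frequency can be attached. The class and method names are hypothetical.

```java
import java.util.Arrays;

public class TokenizerSplitSketch {

    // Stand-in for a whitespace-only tokenizer: "term|freq" pairs stay intact.
    public static String[] splitOnWhitespace(String text) {
        return text.trim().split("\\s+");
    }

    // Stand-in for a tokenizer that also treats '|' as a boundary: pairs are broken apart.
    public static String[] splitOnWhitespaceAndDelimiter(String text) {
        return text.trim().split("[\\s|]+");
    }

    public static void main(String[] args) {
        String text = "foo|5 bar|2";
        System.out.println(Arrays.toString(splitOnWhitespace(text)));             // [foo|5, bar|2]
        System.out.println(Arrays.toString(splitOnWhitespaceAndDelimiter(text))); // [foo, 5, bar, 2]
    }
}
```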
Nested classes inherited from class AttributeSource: AttributeSource.State
Modifier and Type | Field and Description |
---|---|
static char | DEFAULT_DELIMITER |
Fields inherited from class TokenFilter: input
Fields inherited from class TokenStream: DEFAULT_TOKEN_ATTRIBUTE_FACTORY
Constructor and Description |
---|
DelimitedTermFrequencyTokenFilter(TokenStream input) |
DelimitedTermFrequencyTokenFilter(TokenStream input, char delimiter) |
Modifier and Type | Method and Description |
---|---|
boolean | incrementToken() |
Methods inherited from class TokenFilter: close, end, reset
Methods inherited from class AttributeSource: addAttribute, addAttributeImpl, captureState, clearAttributes, cloneAttributes, copyTo, endAttributes, equals, getAttribute, getAttributeClassesIterator, getAttributeFactory, getAttributeImplsIterator, hasAttribute, hasAttributes, hashCode, reflectAsString, reflectWith, removeAllAttributes, restoreState, toString
public static final char DEFAULT_DELIMITER
public DelimitedTermFrequencyTokenFilter(TokenStream input)
public DelimitedTermFrequencyTokenFilter(TokenStream input, char delimiter)
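The two constructors differ only in whether the delimiter is supplied. A minimal sketch of that pattern, assuming the one-argument form delegates to the two-argument form with DEFAULT_DELIMITER (the value '|' is taken from the Lucene source and matches the example in the class description). `DelimiterConfigSketch` is hypothetical and omits the TokenStream argument so it runs without Lucene.

```java
public class DelimiterConfigSketch {

    // Assumed to mirror DEFAULT_DELIMITER in the real filter.
    public static final char DEFAULT_DELIMITER = '|';

    private final char delimiter;

    // Counterpart of the one-argument constructor: falls back to the default delimiter.
    public DelimiterConfigSketch() {
        this(DEFAULT_DELIMITER);
    }

    // Counterpart of the two-argument constructor; only the delimiter is kept here.
    public DelimiterConfigSketch(char delimiter) {
        this.delimiter = delimiter;
    }

    // Returns the term part of a token: everything before the first delimiter.
    public String termOf(String token) {
        int idx = token.indexOf(delimiter);
        return idx < 0 ? token : token.substring(0, idx);
    }

    public static void main(String[] args) {
        System.out.println(new DelimiterConfigSketch().termOf("foo|5"));    // foo
        System.out.println(new DelimiterConfigSketch(':').termOf("foo:5")); // foo
        // With ':' as the delimiter, '|' is ordinary text and the token stays whole.
        System.out.println(new DelimiterConfigSketch(':').termOf("foo|5")); // foo|5
    }
}
```

A custom delimiter is useful when '|' can legitimately occur inside terms.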
public boolean incrementToken() throws IOException
Specified by: incrementToken in class TokenStream
Throws: IOException
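In Lucene, incrementToken() pulls the next token from the wrapped input and, if the term contains the delimiter, truncates the CharTermAttribute at that point and sets the TermFrequencyAttribute from the remaining digits. The sketch below mimics that pull-and-rewrite loop with plain getters instead of Lucene's attribute API. `MiniStream`, `ListStream`, `DelimitedFreqFilter` and `drain` are hypothetical names used only for illustration.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;

public class FilterLoopSketch {

    // Hypothetical stand-in for a token stream: each call advances to the next token.
    public interface MiniStream {
        boolean incrementToken();
        String term();
        int freq();
    }

    // Upstream "tokenizer": emits raw tokens, each with the default frequency of 1.
    public static final class ListStream implements MiniStream {
        private final Iterator<String> it;
        private String current;

        public ListStream(List<String> tokens) {
            this.it = tokens.iterator();
        }

        @Override
        public boolean incrementToken() {
            if (!it.hasNext()) {
                return false;
            }
            current = it.next();
            return true;
        }

        @Override
        public String term() { return current; }

        @Override
        public int freq() { return 1; }
    }

    // Decorator in the style of a TokenFilter: pulls one token from its input,
    // then splits "term|freq" into a shorter term and an explicit frequency.
    public static final class DelimitedFreqFilter implements MiniStream {
        private final MiniStream input;
        private final char delimiter;
        private String term;
        private int freq;

        public DelimitedFreqFilter(MiniStream input, char delimiter) {
            this.input = input;
            this.delimiter = delimiter;
        }

        @Override
        public boolean incrementToken() {
            if (!input.incrementToken()) {
                return false; // upstream is exhausted
            }
            term = input.term();
            freq = input.freq(); // kept as-is when there is no delimiter
            int idx = term.indexOf(delimiter);
            if (idx >= 0) {
                freq = Integer.parseInt(term.substring(idx + 1));
                term = term.substring(0, idx); // cut the term at the delimiter
            }
            return true;
        }

        @Override
        public String term() { return term; }

        @Override
        public int freq() { return freq; }
    }

    // Consumer loop: keep calling incrementToken() until it returns false.
    public static List<String> drain(MiniStream stream) {
        List<String> out = new ArrayList<>();
        while (stream.incrementToken()) {
            out.add(stream.term() + ":" + stream.freq());
        }
        return out;
    }

    public static void main(String[] args) {
        MiniStream stream = new DelimitedFreqFilter(new ListStream(Arrays.asList("foo|5", "bar")), '|');
        System.out.println(drain(stream)); // [foo:5, bar:1]
    }
}
```

Wrapping the input rather than subclassing it is the same decorator arrangement TokenFilter uses, so filters can be chained after any tokenizer.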
Copyright © 2000-2019 Apache Software Foundation. All Rights Reserved.