public final class NGramTokenFilter extends TokenFilter
You can make this filter use the old behavior by using Lucene43NGramTokenFilter, but this is not recommended as it will lead to broken TokenStreams that will cause highlighting bugs.

If you were using this TokenFilter to perform partial highlighting, this won't work anymore since this filter doesn't update offsets. You should modify your analysis chain to use NGramTokenizer, and potentially override NGramTokenizer.isTokenChar(int) to perform pre-tokenization.
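The idea behind pre-tokenization with per-gram offsets can be sketched in plain Java, without any Lucene dependency. This is an illustration only, not the NGramTokenizer implementation: the class PreTokenizedNGrams, its Gram type, and the IntPredicate parameter (which stands in for an overridden isTokenChar(int)) are hypothetical names introduced here. The sketch splits the text into runs of token characters and emits n-grams only inside each run, recording the start and end offset of every gram so that a highlighter could mark the exact substring.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.IntPredicate;

/** Hypothetical helper, not part of Lucene: n-grams with offsets after pre-tokenization. */
class PreTokenizedNGrams {

    /** One n-gram plus its char offsets (start inclusive, end exclusive) in the original text. */
    static final class Gram {
        final String text;
        final int start;
        final int end;

        Gram(String text, int start, int end) {
            this.text = text;
            this.start = start;
            this.end = end;
        }

        @Override
        public String toString() {
            return text + "[" + start + "," + end + ")";
        }
    }

    /**
     * Emits every n-gram of minGram..maxGram code points that lies entirely
     * inside a run of characters accepted by isTokenChar.
     */
    static List<Gram> ngrams(String text, int minGram, int maxGram, IntPredicate isTokenChar) {
        List<Gram> out = new ArrayList<>();
        int i = 0;
        while (i < text.length()) {
            int cp = text.codePointAt(i);
            if (!isTokenChar.test(cp)) {
                // Separator: skip it, so no gram can span it.
                i += Character.charCount(cp);
                continue;
            }
            // Find the end of the current run of token characters.
            int runStart = i;
            while (i < text.length() && isTokenChar.test(text.codePointAt(i))) {
                i += Character.charCount(text.codePointAt(i));
            }
            int runEnd = i;
            // Emit grams by start offset first, then by increasing length.
            for (int s = runStart; s < runEnd; s = text.offsetByCodePoints(s, 1)) {
                int e = s;
                for (int n = 1; n <= maxGram && e < runEnd; n++) {
                    e = text.offsetByCodePoints(e, 1);
                    if (n >= minGram) {
                        out.add(new Gram(text.substring(s, e), s, e));
                    }
                }
            }
        }
        return out;
    }

    public static void main(String[] args) {
        System.out.println(ngrams("ab cd", 1, 2, Character::isLetterOrDigit));
    }
}
```

With the input "ab cd", sizes 1 to 2, and Character::isLetterOrDigit as the predicate, the sketch yields a[0,1), ab[0,2), b[1,2), c[3,4), cd[3,5), d[4,5). No gram crosses the space, and each gram carries its own offsets, which is what partial highlighting needs.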
Nested classes/interfaces inherited from class AttributeSource: AttributeSource.State
Modifier and Type | Field and Description |
---|---|
static int | DEFAULT_MAX_NGRAM_SIZE |
static int | DEFAULT_MIN_NGRAM_SIZE |
Fields inherited from class TokenFilter: input
Fields inherited from class TokenStream: DEFAULT_TOKEN_ATTRIBUTE_FACTORY
Constructor and Description |
---|
NGramTokenFilter(TokenStream input): Creates NGramTokenFilter with default min and max n-grams. |
NGramTokenFilter(TokenStream input, int minGram, int maxGram): Creates NGramTokenFilter with given min and max n-grams. |
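To make the role of minGram and maxGram concrete, here is a minimal plain-Java sketch that generates the n-grams of a single token. It is not the filter's implementation: the class NGramSketch and its methods are hypothetical, the default sizes of 1 and 2 are assumed (this page does not state the constant values), and the emission order (by start position, then by increasing length) is assumed to match the filter's current behavior.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical helper, not part of Lucene: n-grams of one token, counted in code points. */
class NGramSketch {

    // Assumed defaults; the constant values are not given on this page.
    static final int DEFAULT_MIN_NGRAM_SIZE = 1;
    static final int DEFAULT_MAX_NGRAM_SIZE = 2;

    /** Mirrors the one-argument constructor: uses the default min and max sizes. */
    static List<String> ngrams(String token) {
        return ngrams(token, DEFAULT_MIN_NGRAM_SIZE, DEFAULT_MAX_NGRAM_SIZE);
    }

    /** Mirrors the three-argument constructor: minGram is the smallest size, maxGram the largest. */
    static List<String> ngrams(String token, int minGram, int maxGram) {
        if (minGram < 1) {
            throw new IllegalArgumentException("minGram must be greater than zero");
        }
        if (minGram > maxGram) {
            throw new IllegalArgumentException("minGram must not be greater than maxGram");
        }
        List<String> out = new ArrayList<>();
        int length = token.codePointCount(0, token.length());
        for (int start = 0; start < length; start++) {
            int from = token.offsetByCodePoints(0, start);
            // For each start position, emit the shortest gram first.
            for (int n = minGram; n <= maxGram && start + n <= length; n++) {
                out.add(token.substring(from, token.offsetByCodePoints(from, n)));
            }
        }
        return out;
    }

    public static void main(String[] args) {
        System.out.println(ngrams("abc"));       // default sizes
        System.out.println(ngrams("abc", 1, 3)); // explicit sizes
    }
}
```

Under these assumptions, ngrams("abc", 1, 3) gives a, ab, abc, b, bc, c, while the default sizes give a, ab, b, bc, c. A token shorter than minGram produces no grams at all.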
Modifier and Type | Method and Description |
---|---|
boolean | incrementToken(): Advances the stream to the next token; returns false at end of stream. |
void | reset() |
Methods inherited from class TokenFilter: close, end
Methods inherited from class AttributeSource: addAttribute, addAttributeImpl, captureState, clearAttributes, cloneAttributes, copyTo, equals, getAttribute, getAttributeClassesIterator, getAttributeFactory, getAttributeImplsIterator, hasAttribute, hasAttributes, hashCode, reflectAsString, reflectWith, restoreState, toString
public static final int DEFAULT_MIN_NGRAM_SIZE
public static final int DEFAULT_MAX_NGRAM_SIZE
public NGramTokenFilter(TokenStream input, int minGram, int maxGram)
Parameters:
input - TokenStream holding the input to be tokenized
minGram - the smallest n-gram to generate
maxGram - the largest n-gram to generate
public NGramTokenFilter(TokenStream input)
Parameters:
input - TokenStream holding the input to be tokenized
public final boolean incrementToken() throws IOException
Specified by: incrementToken in class TokenStream
Throws: IOException
public void reset() throws IOException
Overrides: reset in class TokenFilter
Throws: IOException
Copyright © 2000-2015 Apache Software Foundation. All Rights Reserved.