Package org.apache.lucene.analysis.ngram
Class EdgeNGramTokenFilter
java.lang.Object
org.apache.lucene.util.AttributeSource
org.apache.lucene.analysis.TokenStream
org.apache.lucene.analysis.TokenFilter
org.apache.lucene.analysis.ngram.EdgeNGramTokenFilter
- All Implemented Interfaces:
Closeable, AutoCloseable, Unwrappable<TokenStream>
Tokenizes the given token into n-grams of the given size(s).
This TokenFilter creates n-grams from the beginning edge of an input token.
As of Lucene 4.4, this filter correctly handles supplementary characters.
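To illustrate what "n-grams from the beginning edge" means, here is a minimal plain-Java sketch. It is not Lucene's implementation: the class `EdgeNGramSketch` and the helper `edgeNGrams` are hypothetical names introduced for illustration only. It assumes gram length is counted in Unicode code points, which is one way to read the note on supplementary characters.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration class; not part of Lucene.
class EdgeNGramSketch {

    // Returns the prefixes of `term` whose length (in code points) lies in
    // [minGram, maxGram]. Counting code points instead of UTF-16 chars keeps
    // surrogate pairs (supplementary characters) intact.
    static List<String> edgeNGrams(String term, int minGram, int maxGram) {
        List<String> grams = new ArrayList<>();
        int codePoints = term.codePointCount(0, term.length());
        for (int size = minGram; size <= maxGram && size <= codePoints; size++) {
            // Translate a code point count into a char offset before slicing.
            int end = term.offsetByCodePoints(0, size);
            grams.add(term.substring(0, end));
        }
        return grams;
    }

    public static void main(String[] args) {
        System.out.println(edgeNGrams("lucene", 1, 3)); // prints [l, lu, luc]
    }
}
```

Every gram starts at the first character of the term, which is what distinguishes edge n-grams from general n-grams taken at every offset.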
-
Nested Class Summary
Nested classes/interfaces inherited from class org.apache.lucene.util.AttributeSource
AttributeSource.State
-
Field Summary
Fields inherited from class org.apache.lucene.analysis.TokenFilter
input
Fields inherited from class org.apache.lucene.analysis.TokenStream
DEFAULT_TOKEN_ATTRIBUTE_FACTORY
-
Constructor Summary
Constructor and Description
EdgeNGramTokenFilter(TokenStream input, int gramSize)
Creates an EdgeNGramTokenFilter that produces edge n-grams of the given size.
EdgeNGramTokenFilter(TokenStream input, int minGram, int maxGram, boolean preserveOriginal)
Creates an EdgeNGramTokenFilter that, for a given input term, produces all edge n-grams with lengths >= minGram and <= maxGram.
-
Method Summary
Methods inherited from class org.apache.lucene.analysis.TokenFilter
close, unwrap
Methods inherited from class org.apache.lucene.util.AttributeSource
addAttribute, addAttributeImpl, captureState, clearAttributes, cloneAttributes, copyTo, endAttributes, equals, getAttribute, getAttributeClassesIterator, getAttributeFactory, getAttributeImplsIterator, hasAttribute, hasAttributes, hashCode, reflectAsString, reflectWith, removeAllAttributes, restoreState, toString
-
Field Details
-
DEFAULT_PRESERVE_ORIGINAL
public static final boolean DEFAULT_PRESERVE_ORIGINAL
- See Also:
-
-
Constructor Details
-
EdgeNGramTokenFilter
Creates an EdgeNGramTokenFilter that, for a given input term, produces all edge n-grams with lengths >= minGram and <= maxGram. Will optionally preserve the original term when its length is outside of the defined range.
- Parameters:
input - TokenStream holding the input to be tokenized
minGram - the minimum length of the generated n-grams
maxGram - the maximum length of the generated n-grams
preserveOriginal - whether or not to keep the original term when it is outside the min/max size range
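The effect of preserveOriginal can be sketched in plain Java as follows. This models only the behavior described above and is not Lucene's code: `PreserveOriginalSketch` and its `edgeNGrams` helper are hypothetical names, and emitting the original term after the grams is an assumption about ordering.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration class; not part of Lucene.
class PreserveOriginalSketch {

    static List<String> edgeNGrams(String term, int minGram, int maxGram,
                                   boolean preserveOriginal) {
        List<String> out = new ArrayList<>();
        int length = term.codePointCount(0, term.length());

        // Emit every prefix whose length is within [minGram, maxGram].
        for (int size = minGram; size <= Math.min(maxGram, length); size++) {
            out.add(term.substring(0, term.offsetByCodePoints(0, size)));
        }

        // The original term is "outside the range" when it is shorter than
        // minGram or longer than maxGram; keep it only if requested.
        boolean outsideRange = length < minGram || length > maxGram;
        if (preserveOriginal && outsideRange) {
            out.add(term); // assumption: original is emitted after the grams
        }
        return out;
    }

    public static void main(String[] args) {
        System.out.println(edgeNGrams("search", 2, 4, true)); // prints [se, sea, sear, search]
    }
}
```

With preserveOriginal set to false, a term shorter than minGram yields nothing in this sketch, and a term longer than maxGram is represented only by its prefixes.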
-
EdgeNGramTokenFilter
Creates an EdgeNGramTokenFilter that produces edge n-grams of the given size.
- Parameters:
input - TokenStream holding the input to be tokenized
gramSize - the n-gram size to generate
-
-
Method Details
-
incrementToken
- Specified by:
incrementToken in class TokenStream
- Throws:
IOException
-
reset
- Overrides:
reset in class TokenFilter
- Throws:
IOException
-
end
- Overrides:
end in class TokenFilter
- Throws:
IOException
-