Package org.apache.lucene.analysis.ngram
Class EdgeNGramTokenizer
java.lang.Object
org.apache.lucene.util.AttributeSource
org.apache.lucene.analysis.TokenStream
org.apache.lucene.analysis.Tokenizer
org.apache.lucene.analysis.ngram.NGramTokenizer
org.apache.lucene.analysis.ngram.EdgeNGramTokenizer
All Implemented Interfaces:
Closeable, AutoCloseable
Tokenizes the input from an edge into n-grams of the given size(s).
This Tokenizer creates n-grams from the beginning edge of an input token.
As of Lucene 4.4, this class supports pre-tokenization and correctly handles supplementary characters.
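The sketch below is a plain-Java model, not Lucene's implementation, of which grams come out for a single input token: the prefixes whose length, counted in Unicode code points, lies between minGram and maxGram. Counting code points rather than chars is how the model reflects the supplementary-character handling described above. The class name EdgeGramsModel is hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

/** Plain-Java model (not Lucene code) of the edge n-grams of one token. */
class EdgeGramsModel {

    /** Returns the leading n-grams of token, with lengths counted in code points. */
    static List<String> edgeGrams(String token, int minGram, int maxGram) {
        List<String> grams = new ArrayList<>();
        int codePoints = token.codePointCount(0, token.length());
        // No gram can be longer than the token itself.
        int longest = Math.min(maxGram, codePoints);
        for (int n = minGram; n <= longest; n++) {
            // offsetByCodePoints never splits a surrogate pair.
            int end = token.offsetByCodePoints(0, n);
            grams.add(token.substring(0, end));
        }
        return grams;
    }

    public static void main(String[] args) {
        System.out.println(edgeGrams("hello", 1, 3)); // prints [h, he, hel]

        // U+1D400 (MATHEMATICAL BOLD CAPITAL A) is one code point but two chars.
        List<String> supp = edgeGrams("\uD835\uDC00bc", 1, 2);
        for (String g : supp) {
            System.out.println(g.length()); // prints 2, then 3
        }
    }
}
```

Note that a token shorter than minGram yields no grams at all in this model.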
Nested Class Summary
Nested classes/interfaces inherited from class org.apache.lucene.util.AttributeSource
AttributeSource.State
Field Summary
Modifier and Type    Field
static final int     DEFAULT_MAX_GRAM_SIZE
static final int     DEFAULT_MIN_GRAM_SIZE
Fields inherited from class org.apache.lucene.analysis.ngram.NGramTokenizer
DEFAULT_MAX_NGRAM_SIZE, DEFAULT_MIN_NGRAM_SIZE
Fields inherited from class org.apache.lucene.analysis.TokenStream
DEFAULT_TOKEN_ATTRIBUTE_FACTORY
Constructor Summary
Constructor                                                              Description
EdgeNGramTokenizer(int minGram, int maxGram)                             Creates an EdgeNGramTokenizer that generates n-grams with lengths in the given range
EdgeNGramTokenizer(AttributeFactory factory, int minGram, int maxGram)   Creates an EdgeNGramTokenizer that generates n-grams with lengths in the given range
Method Summary
Methods inherited from class org.apache.lucene.analysis.ngram.NGramTokenizer
end, incrementToken, isTokenChar, reset
Methods inherited from class org.apache.lucene.analysis.Tokenizer
close, correctOffset, setReader, setReaderTestPoint
Methods inherited from class org.apache.lucene.util.AttributeSource
addAttribute, addAttributeImpl, captureState, clearAttributes, cloneAttributes, copyTo, endAttributes, equals, getAttribute, getAttributeClassesIterator, getAttributeFactory, getAttributeImplsIterator, hasAttribute, hasAttributes, hashCode, reflectAsString, reflectWith, removeAllAttributes, restoreState, toString
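Pre-tokenization means the input is first split into tokens, and edge n-grams are then taken from the start of each token rather than only from the start of the whole input. In Lucene this is controlled by the inherited isTokenChar(int) method, which by default accepts every character, so the entire input is treated as one token. The self-contained model below illustrates that effect; it is not the Lucene implementation, and the class PreTokenizedEdgeGrams and its IntPredicate parameter (standing in for isTokenChar) are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.IntPredicate;

/** Plain-Java model (not Lucene code) of edge n-grams over pre-tokenized input. */
class PreTokenizedEdgeGrams {

    /**
     * Splits input into maximal runs of code points accepted by isTokenChar
     * and emits the leading minGram..maxGram grams of each run.
     */
    static List<String> edgeGrams(String input, IntPredicate isTokenChar,
                                  int minGram, int maxGram) {
        List<String> grams = new ArrayList<>();
        int i = 0;
        while (i < input.length()) {
            int cp = input.codePointAt(i);
            if (!isTokenChar.test(cp)) {
                // Separator: skip it; grams never span a non-token character.
                i += Character.charCount(cp);
                continue;
            }
            // Start of a token: extend one code point at a time and emit
            // each prefix whose length is within [minGram, maxGram].
            int start = i;
            int len = 0;
            while (i < input.length()) {
                int c = input.codePointAt(i);
                if (!isTokenChar.test(c)) {
                    break;
                }
                i += Character.charCount(c);
                len++;
                if (len >= minGram && len <= maxGram) {
                    grams.add(input.substring(start, i));
                }
            }
        }
        return grams;
    }

    public static void main(String[] args) {
        // Models the default: every character is a token character.
        System.out.println(edgeGrams("quick fox", cp -> true, 1, 3));
        // prints [q, qu, qui]

        // Letters only: "quick" and "fox" become separate tokens.
        System.out.println(edgeGrams("quick fox", Character::isLetter, 1, 3));
        // prints [q, qu, qui, f, fo, fox]
    }
}
```

In real code the equivalent of the second call would be a subclass that overrides isTokenChar(int); the predicate here only keeps the model short.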
Field Details
DEFAULT_MAX_GRAM_SIZE
public static final int DEFAULT_MAX_GRAM_SIZE
See Also:
DEFAULT_MIN_GRAM_SIZE
public static final int DEFAULT_MIN_GRAM_SIZE
See Also:
Constructor Details
EdgeNGramTokenizer
public EdgeNGramTokenizer(int minGram, int maxGram)
Creates an EdgeNGramTokenizer that generates n-grams with lengths in the given range.
Parameters:
minGram - the minimum n-gram length to generate
maxGram - the maximum n-gram length to generate
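A sketch of the minGram/maxGram contract, using a hypothetical GramRange class that is not part of Lucene. The validation rules (minGram at least 1, minGram not greater than maxGram) are an assumption about what NGramTokenizer enforces and should be checked against the Lucene version in use. Under those rules, a token of L code points yields max(0, min(maxGram, L) - minGram + 1) edge grams.

```java
/**
 * Hypothetical helper, not part of Lucene, mirroring the minGram/maxGram
 * parameters of the EdgeNGramTokenizer constructors.
 */
final class GramRange {
    final int minGram;
    final int maxGram;

    GramRange(int minGram, int maxGram) {
        // Assumption: these mirror the argument checks in NGramTokenizer;
        // confirm against the Lucene version you use.
        if (minGram < 1) {
            throw new IllegalArgumentException("minGram must be greater than zero");
        }
        if (minGram > maxGram) {
            throw new IllegalArgumentException("minGram must not be greater than maxGram");
        }
        this.minGram = minGram;
        this.maxGram = maxGram;
    }

    /** Number of edge grams emitted for a token of tokenLength code points. */
    int gramCount(int tokenLength) {
        return Math.max(0, Math.min(maxGram, tokenLength) - minGram + 1);
    }

    public static void main(String[] args) {
        GramRange range = new GramRange(2, 4);
        System.out.println(range.gramCount(5)); // prints 3 (lengths 2, 3, 4)
        System.out.println(range.gramCount(3)); // prints 2 (lengths 2, 3)
        System.out.println(range.gramCount(1)); // prints 0 (token shorter than minGram)
    }
}
```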
EdgeNGramTokenizer
public EdgeNGramTokenizer(AttributeFactory factory, int minGram, int maxGram)
Creates an EdgeNGramTokenizer that generates n-grams with lengths in the given range.
Parameters:
factory - the AttributeFactory to use
minGram - the minimum n-gram length to generate
maxGram - the maximum n-gram length to generate