java.lang.Object
  org.apache.lucene.util.AttributeSource
    org.apache.lucene.analysis.TokenStream
      org.apache.lucene.analysis.Tokenizer
        org.apache.lucene.analysis.ngram.EdgeNGramTokenizer
public final class EdgeNGramTokenizer
extends Tokenizer
Tokenizes the input from an edge into n-grams of the given size(s).
This Tokenizer creates n-grams from either the beginning edge or the ending edge of an input token.
maxGram cannot be larger than 1024 because of an implementation limitation.
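To illustrate the edge n-gram idea described above, here is a minimal plain-Java sketch that does not depend on Lucene. The class EdgeNGramSketch, its edgeNGrams method and the boolean fromFront flag are hypothetical stand-ins for EdgeNGramTokenizer and EdgeNGramTokenizer.Side; the sketch treats the whole input string as a single token and ignores attributes, offsets and the 1024 limit.

```java
import java.util.ArrayList;
import java.util.List;

public class EdgeNGramSketch {

    // Returns the n-grams of sizes minGram..maxGram taken from one edge of token.
    // fromFront = true mimics the beginning edge, false mimics the ending edge
    // (hypothetical flag standing in for EdgeNGramTokenizer.Side).
    static List<String> edgeNGrams(String token, boolean fromFront, int minGram, int maxGram) {
        if (minGram < 1 || minGram > maxGram) {
            throw new IllegalArgumentException("require 1 <= minGram <= maxGram");
        }
        List<String> grams = new ArrayList<>();
        int len = token.length();
        // Gram sizes longer than the token cannot be produced, so stop at len.
        int upper = Math.min(maxGram, len);
        for (int size = minGram; size <= upper; size++) {
            int start = fromFront ? 0 : len - size;
            grams.add(token.substring(start, start + size));
        }
        return grams;
    }

    public static void main(String[] args) {
        System.out.println(edgeNGrams("apple", true, 1, 3));  // [a, ap, app]
        System.out.println(edgeNGrams("apple", false, 1, 3)); // [e, le, ple]
    }
}
```

With minGram = 1 and maxGram = 3, the beginning edge of "apple" yields a, ap, app and the ending edge yields e, le, ple. This sketch simply skips gram sizes longer than the token.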
Nested Class Summary | |
---|---|
static class | EdgeNGramTokenizer.Side: specifies which side of the input the n-gram should be generated from. |
Nested classes/interfaces inherited from class org.apache.lucene.util.AttributeSource |
---|
AttributeSource.AttributeFactory, AttributeSource.State |
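The String sideLabel constructors take the name of an EdgeNGramTokenizer.Side rather than the constant itself. The following sketch shows one way such a name-to-constant lookup could work; the enum below is a hypothetical mirror of EdgeNGramTokenizer.Side, and the labels "front" and "back" are assumptions, not taken from this page.

```java
public class SideSketch {

    // Hypothetical mirror of EdgeNGramTokenizer.Side; the labels
    // "front" and "back" are assumptions made for this sketch.
    enum Side {
        FRONT("front"),
        BACK("back");

        private final String label;

        Side(String label) {
            this.label = label;
        }

        String getLabel() {
            return label;
        }

        // Maps a side label (as a sideLabel argument would carry it)
        // to a constant; returns null for unknown or null labels.
        static Side fromLabel(String label) {
            for (Side s : values()) {
                if (s.label.equals(label)) {
                    return s;
                }
            }
            return null;
        }
    }

    public static void main(String[] args) {
        System.out.println(Side.fromLabel("front")); // FRONT
        System.out.println(Side.fromLabel("back"));  // BACK
        System.out.println(Side.fromLabel("left"));  // null
    }
}
```

Returning null for an unknown label leaves the decision to the caller, which would then reject the argument rather than silently pick a default side.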
Field Summary | |
---|---|
static int | DEFAULT_MAX_GRAM_SIZE |
static int | DEFAULT_MIN_GRAM_SIZE |
static EdgeNGramTokenizer.Side | DEFAULT_SIDE |
Fields inherited from class org.apache.lucene.analysis.Tokenizer |
---|
input |
Constructor Summary | |
---|---|
EdgeNGramTokenizer(AttributeSource.AttributeFactory factory, Reader input, EdgeNGramTokenizer.Side side, int minGram, int maxGram) | Creates an EdgeNGramTokenizer that can generate n-grams with sizes in the given range (minGram to maxGram). |
EdgeNGramTokenizer(AttributeSource.AttributeFactory factory, Reader input, String sideLabel, int minGram, int maxGram) | Creates an EdgeNGramTokenizer that can generate n-grams with sizes in the given range (minGram to maxGram). |
EdgeNGramTokenizer(AttributeSource source, Reader input, EdgeNGramTokenizer.Side side, int minGram, int maxGram) | Creates an EdgeNGramTokenizer that can generate n-grams with sizes in the given range (minGram to maxGram). |
EdgeNGramTokenizer(AttributeSource source, Reader input, String sideLabel, int minGram, int maxGram) | Creates an EdgeNGramTokenizer that can generate n-grams with sizes in the given range (minGram to maxGram). |
EdgeNGramTokenizer(Reader input, EdgeNGramTokenizer.Side side, int minGram, int maxGram) | Creates an EdgeNGramTokenizer that can generate n-grams with sizes in the given range (minGram to maxGram). |
EdgeNGramTokenizer(Reader input, String sideLabel, int minGram, int maxGram) | Creates an EdgeNGramTokenizer that can generate n-grams with sizes in the given range (minGram to maxGram). |
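All six constructors share the same minGram/maxGram range arguments plus a side. The sketch below checks those arguments up front: minGram of at least 1, minGram not above maxGram, maxGram not above the 1024 bound from the class description, and a known side label. GramRangeCheck, checkArgs and isValid are hypothetical names, the side labels "front" and "back" are assumed, and whether the real constructors reject exactly these cases is also an assumption.

```java
public class GramRangeCheck {

    // Upper bound taken from the class description ("maxGram cannot be larger than 1024").
    static final int MAX_SUPPORTED_GRAM = 1024;

    // Hypothetical pre-construction check for the (sideLabel, minGram, maxGram) arguments.
    // The labels "front" and "back" are assumptions for this sketch.
    static void checkArgs(String sideLabel, int minGram, int maxGram) {
        if (!"front".equals(sideLabel) && !"back".equals(sideLabel)) {
            throw new IllegalArgumentException("sideLabel must be \"front\" or \"back\"");
        }
        if (minGram < 1) {
            throw new IllegalArgumentException("minGram must be at least 1");
        }
        if (minGram > maxGram) {
            throw new IllegalArgumentException("minGram must not exceed maxGram");
        }
        if (maxGram > MAX_SUPPORTED_GRAM) {
            throw new IllegalArgumentException("maxGram must not exceed " + MAX_SUPPORTED_GRAM);
        }
    }

    // Returns true when checkArgs accepts the arguments, false when it throws.
    static boolean isValid(String sideLabel, int minGram, int maxGram) {
        try {
            checkArgs(sideLabel, minGram, maxGram);
            return true;
        } catch (IllegalArgumentException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        System.out.println(isValid("front", 1, 3));    // true
        System.out.println(isValid("back", 0, 3));     // false: minGram < 1
        System.out.println(isValid("front", 4, 3));    // false: minGram > maxGram
        System.out.println(isValid("front", 1, 2000)); // false: maxGram > 1024
        System.out.println(isValid("middle", 1, 3));   // false: unknown side
    }
}
```

Failing fast with IllegalArgumentException keeps a bad range from surfacing later as an empty or truncated token stream.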
Method Summary | |
---|---|
void | end() |
boolean | incrementToken(): advances to the next token in the stream; returns false at the end of the stream (EOS). |
void | reset() |
Methods inherited from class org.apache.lucene.analysis.Tokenizer |
---|
close, correctOffset, setReader |
Methods inherited from class org.apache.lucene.util.AttributeSource |
---|
addAttribute, addAttributeImpl, captureState, clearAttributes, cloneAttributes, copyTo, equals, getAttribute, getAttributeClassesIterator, getAttributeFactory, getAttributeImplsIterator, hasAttribute, hasAttributes, hashCode, reflectAsString, reflectWith, restoreState |
Methods inherited from class java.lang.Object |
---|
clone, finalize, getClass, notify, notifyAll, toString, wait, wait, wait |
Field Detail |
---|
public static final EdgeNGramTokenizer.Side DEFAULT_SIDE
public static final int DEFAULT_MAX_GRAM_SIZE
public static final int DEFAULT_MIN_GRAM_SIZE
Constructor Detail |
---|
public EdgeNGramTokenizer(Reader input, EdgeNGramTokenizer.Side side, int minGram, int maxGram)
Parameters:
input - Reader holding the input to be tokenized
side - the EdgeNGramTokenizer.Side from which to chop off an n-gram
minGram - the smallest n-gram to generate
maxGram - the largest n-gram to generate

public EdgeNGramTokenizer(AttributeSource source, Reader input, EdgeNGramTokenizer.Side side, int minGram, int maxGram)
Parameters:
source - AttributeSource to use
input - Reader holding the input to be tokenized
side - the EdgeNGramTokenizer.Side from which to chop off an n-gram
minGram - the smallest n-gram to generate
maxGram - the largest n-gram to generate

public EdgeNGramTokenizer(AttributeSource.AttributeFactory factory, Reader input, EdgeNGramTokenizer.Side side, int minGram, int maxGram)
Parameters:
factory - AttributeSource.AttributeFactory to use
input - Reader holding the input to be tokenized
side - the EdgeNGramTokenizer.Side from which to chop off an n-gram
minGram - the smallest n-gram to generate
maxGram - the largest n-gram to generate

public EdgeNGramTokenizer(Reader input, String sideLabel, int minGram, int maxGram)
Parameters:
input - Reader holding the input to be tokenized
sideLabel - the name of the EdgeNGramTokenizer.Side from which to chop off an n-gram
minGram - the smallest n-gram to generate
maxGram - the largest n-gram to generate

public EdgeNGramTokenizer(AttributeSource source, Reader input, String sideLabel, int minGram, int maxGram)
Parameters:
source - AttributeSource to use
input - Reader holding the input to be tokenized
sideLabel - the name of the EdgeNGramTokenizer.Side from which to chop off an n-gram
minGram - the smallest n-gram to generate
maxGram - the largest n-gram to generate

public EdgeNGramTokenizer(AttributeSource.AttributeFactory factory, Reader input, String sideLabel, int minGram, int maxGram)
Parameters:
factory - AttributeSource.AttributeFactory to use
input - Reader holding the input to be tokenized
sideLabel - the name of the EdgeNGramTokenizer.Side from which to chop off an n-gram
minGram - the smallest n-gram to generate
maxGram - the largest n-gram to generate

Method Detail |
---|
public boolean incrementToken() throws IOException
Specified by: incrementToken in class TokenStream
Throws: IOException

public void end()
Overrides: end in class TokenStream

public void reset() throws IOException
Overrides: reset in class TokenStream
Throws: IOException
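The three methods above are normally driven by a consumer in a fixed order: reset() once, incrementToken() until it returns false, then end() (and, for a real Lucene stream, close()). The sketch below imitates that contract in plain Java without Lucene; EdgeGramStreamSketch, its term() accessor and the fromFront flag are hypothetical, and the real tokenizer exposes the current gram through attributes rather than a getter.

```java
import java.util.ArrayList;
import java.util.List;

public class EdgeGramStreamSketch {

    private final String input;
    private final boolean fromFront; // hypothetical: true ~ beginning edge, false ~ ending edge
    private final int minGram;
    private final int maxGram;
    private int gramSize;            // size of the next gram to emit
    private String term;             // stands in for a term attribute

    EdgeGramStreamSketch(String input, boolean fromFront, int minGram, int maxGram) {
        this.input = input;
        this.fromFront = fromFront;
        this.minGram = minGram;
        this.maxGram = maxGram;
    }

    // Rewinds the stream so the next incrementToken() yields the smallest gram.
    void reset() {
        gramSize = minGram;
        term = null;
    }

    // Advances to the next gram; returns false once no gram is left (end of stream).
    boolean incrementToken() {
        if (gramSize > maxGram || gramSize > input.length()) {
            term = null;
            return false;
        }
        int start = fromFront ? 0 : input.length() - gramSize;
        term = input.substring(start, start + gramSize);
        gramSize++;
        return true;
    }

    // End-of-stream hook; this sketch has nothing to finalize.
    void end() {
    }

    String term() {
        return term;
    }

    public static void main(String[] args) {
        EdgeGramStreamSketch stream = new EdgeGramStreamSketch("lucene", true, 2, 4);
        List<String> terms = new ArrayList<>();
        stream.reset();                   // 1. reset before consuming
        while (stream.incrementToken()) { // 2. pull grams until false
            terms.add(stream.term());
        }
        stream.end();                     // 3. end after the last gram
        System.out.println(terms);        // [lu, luc, luce]
    }
}
```

Keeping the cursor (gramSize) in the object and rewinding it in reset() is what lets the same instance be consumed again. In Lucene, end() is typically where end-of-stream work such as setting the final offset happens; this sketch leaves it empty.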