java.lang.Object
  org.apache.lucene.analysis.Analyzer
    org.apache.lucene.analysis.shingle.ShingleAnalyzerWrapper
public final class ShingleAnalyzerWrapper extends Analyzer
A ShingleAnalyzerWrapper wraps a ShingleFilter around another Analyzer.
A shingle is another name for a token-based n-gram.
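To make the notion of a token-based n-gram concrete, here is a minimal stand-alone sketch in plain Java. It does not use Lucene and is not the ShingleFilter implementation: the class `ShingleSketch`, the method `shingles`, the output ordering, and the lower bound of 2 on `minSize` are all assumptions made for illustration. The separator and the unigram flag are passed explicitly rather than relying on any library default.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical stand-alone model of shingling; NOT the Lucene ShingleFilter code.
final class ShingleSketch {

    // Builds shingles of minSize..maxSize tokens starting at every position,
    // optionally emitting each original token (unigram) before its shingles.
    static List<String> shingles(List<String> tokens, int minSize, int maxSize,
                                 String separator, boolean outputUnigrams) {
        // Assumption of this sketch: a shingle spans at least two tokens.
        if (minSize < 2 || minSize > maxSize) {
            throw new IllegalArgumentException("need 2 <= minSize <= maxSize");
        }
        List<String> out = new ArrayList<String>();
        for (int start = 0; start < tokens.size(); start++) {
            if (outputUnigrams) {
                out.add(tokens.get(start));
            }
            for (int size = minSize; size <= maxSize; size++) {
                int end = start + size;
                if (end > tokens.size()) {
                    break; // not enough tokens left for this shingle size
                }
                out.add(String.join(separator, tokens.subList(start, end)));
            }
        }
        return out;
    }

    public static void main(String[] args) {
        List<String> tokens = Arrays.asList("please", "divide", "this", "sentence");
        // prints [please, please divide, divide, divide this, this, this sentence, sentence]
        System.out.println(shingles(tokens, 2, 2, " ", true));
    }
}
```

With `outputUnigrams` set to false, the same call would return only the three two-token shingles.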
Constructor Summary | |
---|---|
ShingleAnalyzerWrapper(Analyzer defaultAnalyzer) | |
ShingleAnalyzerWrapper(Analyzer defaultAnalyzer, int maxShingleSize) | |
ShingleAnalyzerWrapper(Analyzer defaultAnalyzer, int minShingleSize, int maxShingleSize) | |
ShingleAnalyzerWrapper(Version matchVersion) | Wraps StandardAnalyzer. |
ShingleAnalyzerWrapper(Version matchVersion, int minShingleSize, int maxShingleSize) | Wraps StandardAnalyzer. |
Method Summary | | |
---|---|---|
int | getMaxShingleSize() | The max shingle (token n-gram) size. |
int | getMinShingleSize() | The min shingle (token n-gram) size. |
String | getTokenSeparator() | |
boolean | isOutputUnigrams() | |
boolean | isOutputUnigramsIfNoShingles() | |
TokenStream | reusableTokenStream(String fieldName, Reader reader) | Creates a TokenStream that is allowed to be re-used from the previous time that the same thread called this method. |
void | setMaxShingleSize(int maxShingleSize) | Set the maximum size of output shingles (default: 2). |
void | setMinShingleSize(int minShingleSize) | Set the min shingle size (default: 2). |
void | setOutputUnigrams(boolean outputUnigrams) | Shall the filter pass the original tokens (the "unigrams") to the output stream? |
void | setOutputUnigramsIfNoShingles(boolean outputUnigramsIfNoShingles) | Shall we override the behavior of outputUnigrams==false for those times when no shingles are available (because there are fewer than minShingleSize tokens in the input stream)? (default: false) |
void | setTokenSeparator(String tokenSeparator) | Sets the string to use when joining adjacent tokens to form a shingle. |
TokenStream | tokenStream(String fieldName, Reader reader) | Creates a TokenStream which tokenizes all the text in the provided Reader. |
Methods inherited from class org.apache.lucene.analysis.Analyzer |
---|
close, getOffsetGap, getPositionIncrementGap, getPreviousTokenStream, setPreviousTokenStream |
Methods inherited from class java.lang.Object |
---|
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait |
Constructor Detail |
---|
public ShingleAnalyzerWrapper(Analyzer defaultAnalyzer)
public ShingleAnalyzerWrapper(Analyzer defaultAnalyzer, int maxShingleSize)
public ShingleAnalyzerWrapper(Analyzer defaultAnalyzer, int minShingleSize, int maxShingleSize)
public ShingleAnalyzerWrapper(Version matchVersion)
Wraps StandardAnalyzer.
public ShingleAnalyzerWrapper(Version matchVersion, int minShingleSize, int maxShingleSize)
Wraps StandardAnalyzer.
Method Detail |
---|
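public int getMaxShingleSize()
The max shingle (token n-gram) size.
public void setMaxShingleSize(int maxShingleSize)
Set the maximum size of output shingles (default: 2).
Parameters: maxShingleSize - max shingle size
public int getMinShingleSize()
The min shingle (token n-gram) size.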
public void setMinShingleSize(int minShingleSize)
Set the min shingle size (default: 2).
This method requires that minShingleSize is not greater than maxShingleSize, so set maxShingleSize before calling this method.
Parameters: minShingleSize - min size of output shingles
public String getTokenSeparator()
public void setTokenSeparator(String tokenSeparator)
Sets the string to use when joining adjacent tokens to form a shingle.
Parameters: tokenSeparator - used to separate input stream tokens in output shingles
public boolean isOutputUnigrams()
public void setOutputUnigrams(boolean outputUnigrams)
Shall the filter pass the original tokens (the "unigrams") to the output stream?
Parameters: outputUnigrams - whether or not the filter shall pass the original tokens to the output stream
public boolean isOutputUnigramsIfNoShingles()
public void setOutputUnigramsIfNoShingles(boolean outputUnigramsIfNoShingles)
Shall we override the behavior of outputUnigrams==false for those times when no shingles are available (because there are fewer than minShingleSize tokens in the input stream)? (default: false)
Note that if outputUnigrams==true, then unigrams are always output, regardless of whether any shingles are available.
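The interaction of the two flags described above can be summarized as a small decision function. This is a sketch derived only from the wording of this description, not from the Lucene source: the class `UnigramFallbackSketch` and the method `emitsUnigrams` are hypothetical names, and treating an empty input as emitting nothing is an assumption of the sketch.

```java
// Hypothetical decision model based on the description above; NOT Lucene source code.
final class UnigramFallbackSketch {

    // Returns true when original tokens (unigrams) would appear in the output.
    static boolean emitsUnigrams(int tokenCount, int minShingleSize,
                                 boolean outputUnigrams,
                                 boolean outputUnigramsIfNoShingles) {
        if (tokenCount == 0) {
            return false; // assumption: an empty input stream yields no output at all
        }
        if (outputUnigrams) {
            return true; // unigrams are always output, shingles or not
        }
        // No shingles are available when there are fewer than minShingleSize tokens.
        boolean noShinglesAvailable = tokenCount < minShingleSize;
        return outputUnigramsIfNoShingles && noShinglesAvailable;
    }

    public static void main(String[] args) {
        System.out.println(emitsUnigrams(1, 2, false, true));  // prints true
        System.out.println(emitsUnigrams(1, 2, false, false)); // prints false
        System.out.println(emitsUnigrams(5, 2, false, true));  // prints false
        System.out.println(emitsUnigrams(5, 2, true, false));  // prints true
    }
}
```

In this model the fallback flag matters only for short inputs when outputUnigrams is false. For example, a one-word field value would otherwise produce no tokens at all.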
Parameters: outputUnigramsIfNoShingles - whether or not to output a single unigram when no shingles are available
public TokenStream tokenStream(String fieldName, Reader reader)
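Description copied from class Analyzer: Creates a TokenStream which tokenizes all the text in the provided Reader.
Specified by: tokenStream in class Analyzer
public TokenStream reusableTokenStream(String fieldName, Reader reader) throws IOException
Description copied from class Analyzer: Creates a TokenStream that is allowed to be re-used from the previous time that the same thread called this method.
Overrides: reusableTokenStream in class Analyzer
Throws: IOException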