public final class ShingleAnalyzerWrapper extends AnalyzerWrapper
A ShingleAnalyzerWrapper wraps a ShingleFilter around another Analyzer. A shingle is another name for a token-based n-gram.
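To make "token-based n-gram" concrete, here is a minimal, stdlib-only Java sketch of shingling a token list. `ShingleSketch` and its `shingles` method are hypothetical illustrations, not Lucene code. The sketch ignores unigram output, offsets, and position increments, and its output order (grouped by start position, then by increasing length) is its own choice rather than a claim about `ShingleFilter`.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical helper illustrating what a shingle (token n-gram) is.
// Not Lucene code: no unigrams, offsets, or position increments.
class ShingleSketch {

    // Returns every run of minSize..maxSize consecutive tokens, joined by separator,
    // grouped by start position and then by increasing run length.
    static List<String> shingles(List<String> tokens, int minSize, int maxSize, String separator) {
        List<String> out = new ArrayList<>();
        for (int start = 0; start < tokens.size(); start++) {
            for (int size = minSize; size <= maxSize && start + size <= tokens.size(); size++) {
                out.add(String.join(separator, tokens.subList(start, start + size)));
            }
        }
        return out;
    }

    public static void main(String[] args) {
        List<String> tokens = Arrays.asList("please", "divide", "this", "sentence");
        // prints [please divide, please divide this, divide this, divide this sentence, this sentence]
        System.out.println(shingles(tokens, 2, 3, " "));
    }
}
```

In Lucene itself, the size bounds are passed to a constructor such as `ShingleAnalyzerWrapper(Analyzer defaultAnalyzer, int minShingleSize, int maxShingleSize)` rather than computed by hand.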
Nested classes/interfaces inherited from class Analyzer: Analyzer.ReuseStrategy, Analyzer.TokenStreamComponents
Fields inherited from class Analyzer: GLOBAL_REUSE_STRATEGY, PER_FIELD_REUSE_STRATEGY
Constructor | Description |
---|---|
ShingleAnalyzerWrapper() | Wraps StandardAnalyzer. |
ShingleAnalyzerWrapper(Analyzer defaultAnalyzer) | |
ShingleAnalyzerWrapper(Analyzer defaultAnalyzer, int maxShingleSize) | |
ShingleAnalyzerWrapper(Analyzer defaultAnalyzer, int minShingleSize, int maxShingleSize) | |
ShingleAnalyzerWrapper(Analyzer delegate, int minShingleSize, int maxShingleSize, String tokenSeparator, boolean outputUnigrams, boolean outputUnigramsIfNoShingles, String fillerToken) | Creates a new ShingleAnalyzerWrapper. |
ShingleAnalyzerWrapper(int minShingleSize, int maxShingleSize) | Wraps StandardAnalyzer. |
Modifier and Type | Method and Description |
---|---|
String | getFillerToken() |
int | getMaxShingleSize(): the max shingle (token n-gram) size |
int | getMinShingleSize(): the min shingle (token n-gram) size |
String | getTokenSeparator() |
Analyzer | getWrappedAnalyzer(String fieldName) |
boolean | isOutputUnigrams() |
boolean | isOutputUnigramsIfNoShingles() |
protected Analyzer.TokenStreamComponents | wrapComponents(String fieldName, Analyzer.TokenStreamComponents components) |
Methods inherited from class AnalyzerWrapper: createComponents, getOffsetGap, getPositionIncrementGap, initReader, wrapReader
Methods inherited from class Analyzer: attributeFactory, close, getReuseStrategy, getVersion, initReaderForNormalization, normalize, normalize, setVersion, tokenStream, tokenStream
public ShingleAnalyzerWrapper(Analyzer defaultAnalyzer)
public ShingleAnalyzerWrapper(Analyzer defaultAnalyzer, int maxShingleSize)
public ShingleAnalyzerWrapper(Analyzer defaultAnalyzer, int minShingleSize, int maxShingleSize)
public ShingleAnalyzerWrapper(Analyzer delegate, int minShingleSize, int maxShingleSize, String tokenSeparator, boolean outputUnigrams, boolean outputUnigramsIfNoShingles, String fillerToken)
Parameters:
delegate - Analyzer whose TokenStream is to be filtered
minShingleSize - Min shingle (token n-gram) size
maxShingleSize - Max shingle size
tokenSeparator - Used to separate input stream tokens in output shingles
outputUnigrams - Whether the filter passes the original tokens (unigrams) through to the output stream
outputUnigramsIfNoShingles - Overrides outputUnigrams==false for those times when no shingles are available (because there are fewer than minShingleSize tokens in the input stream). Note that if outputUnigrams==true, unigrams are always output, regardless of whether any shingles are available.
fillerToken - Filler token to use when positionIncrement is more than 1
public ShingleAnalyzerWrapper()
Wraps StandardAnalyzer.
public ShingleAnalyzerWrapper(int minShingleSize, int maxShingleSize)
Wraps StandardAnalyzer.
public int getMaxShingleSize()
The max shingle (token n-gram) size.
public int getMinShingleSize()
The min shingle (token n-gram) size.
public String getTokenSeparator()
public boolean isOutputUnigrams()
public boolean isOutputUnigramsIfNoShingles()
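The seven-argument constructor documents how outputUnigrams and outputUnigramsIfNoShingles interact. The stdlib-only sketch below encodes that rule. `UnigramRuleSketch`, `emitsUnigrams`, and `analyze` are hypothetical names, not Lucene API; the sketch reads "no shingles available" as "fewer than minShingleSize tokens", as the parameter description states, and placing each unigram before the shingles that start at the same position is an assumption of the sketch.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical model of the documented outputUnigrams / outputUnigramsIfNoShingles
// rule. Not Lucene's ShingleFilter; the token order here is a simplification.
class UnigramRuleSketch {

    // Unigrams are always output when outputUnigrams is true; otherwise only when
    // outputUnigramsIfNoShingles is true and the input is too short for any shingle.
    static boolean emitsUnigrams(int tokenCount, int minShingleSize,
                                 boolean outputUnigrams, boolean outputUnigramsIfNoShingles) {
        return outputUnigrams || (outputUnigramsIfNoShingles && tokenCount < minShingleSize);
    }

    static List<String> analyze(List<String> tokens, int minShingleSize, int maxShingleSize,
                                String tokenSeparator, boolean outputUnigrams,
                                boolean outputUnigramsIfNoShingles) {
        boolean unigrams = emitsUnigrams(tokens.size(), minShingleSize,
                                         outputUnigrams, outputUnigramsIfNoShingles);
        List<String> out = new ArrayList<>();
        for (int start = 0; start < tokens.size(); start++) {
            if (unigrams) {
                out.add(tokens.get(start)); // the original token at this position
            }
            for (int size = minShingleSize;
                 size <= maxShingleSize && start + size <= tokens.size(); size++) {
                out.add(String.join(tokenSeparator, tokens.subList(start, start + size)));
            }
        }
        return out;
    }

    public static void main(String[] args) {
        List<String> one = Arrays.asList("hello"); // too short for a bigram
        System.out.println(analyze(one, 2, 2, " ", false, false)); // []
        System.out.println(analyze(one, 2, 2, " ", false, true));  // [hello]
        List<String> three = Arrays.asList("a", "b", "c");
        System.out.println(analyze(three, 2, 2, " ", false, true)); // [a b, b c]
    }
}
```

The third call shows that outputUnigramsIfNoShingles has no effect once the input is long enough to produce shingles.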
public String getFillerToken()
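The fillerToken is used when positionIncrement is more than 1, which happens, for example, when an upstream filter removes a token and leaves a gap in positions. The sketch below assumes that each skipped position is represented by one filler entry inside the shingle window and builds bigrams over the result. `FillerSketch`, `Tok`, `expand`, and `bigrams` are hypothetical names, `"_"` is simply the filler chosen for the example, and this is not a statement of exactly what `ShingleFilter` emits.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical sketch (not Lucene code): a filler token stands in for removed
// positions. Assumption: each skipped position becomes one filler entry.
class FillerSketch {

    // A token's text plus its position increment (1 = directly follows the previous token).
    static final class Tok {
        final String text;
        final int posInc;

        Tok(String text, int posInc) {
            this.text = text;
            this.posInc = posInc;
        }
    }

    // A position increment of n > 1 inserts n - 1 filler entries before the token.
    static List<String> expand(List<Tok> tokens, String fillerToken) {
        List<String> positions = new ArrayList<>();
        for (Tok t : tokens) {
            for (int i = 1; i < t.posInc; i++) {
                positions.add(fillerToken);
            }
            positions.add(t.text);
        }
        return positions;
    }

    // Bigrams over the expanded positions, joined with the separator.
    static List<String> bigrams(List<String> positions, String separator) {
        List<String> out = new ArrayList<>();
        for (int i = 0; i + 1 < positions.size(); i++) {
            out.add(positions.get(i) + separator + positions.get(i + 1));
        }
        return out;
    }

    public static void main(String[] args) {
        // "please divide this sentence" after a stop filter dropped "this":
        // "sentence" arrives with a position increment of 2.
        List<Tok> toks = Arrays.asList(
            new Tok("please", 1), new Tok("divide", 1), new Tok("sentence", 2));
        System.out.println(bigrams(expand(toks, "_"), " ")); // [please divide, divide _, _ sentence]
    }
}
```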
public final Analyzer getWrappedAnalyzer(String fieldName)
Specified by: getWrappedAnalyzer in class AnalyzerWrapper
protected Analyzer.TokenStreamComponents wrapComponents(String fieldName, Analyzer.TokenStreamComponents components)
Overrides: wrapComponents in class AnalyzerWrapper
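getWrappedAnalyzer and wrapComponents are the two AnalyzerWrapper hooks this class implements: the first selects the delegate Analyzer for a field, and the second is where a wrapper can add filters on top of the delegate's components. The plain-Java sketch below mimics that split with a hypothetical `SketchAnalyzer` interface standing in for Analyzer and token lists standing in for token streams. None of these names are Lucene API, and the sketch emits shingles only (no unigrams).

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Locale;

// Hypothetical stand-in for an analyzer: (field name, text) -> tokens.
// Not Lucene's Analyzer API.
interface SketchAnalyzer {
    List<String> analyze(String fieldName, String text);
}

// Hypothetical wrapper mirroring the AnalyzerWrapper division of labor.
class ShingleWrapperSketch implements SketchAnalyzer {
    private final SketchAnalyzer delegate;
    private final int minShingleSize;
    private final int maxShingleSize;
    private final String tokenSeparator;

    ShingleWrapperSketch(SketchAnalyzer delegate, int minShingleSize,
                         int maxShingleSize, String tokenSeparator) {
        this.delegate = delegate;
        this.minShingleSize = minShingleSize;
        this.maxShingleSize = maxShingleSize;
        this.tokenSeparator = tokenSeparator;
    }

    // Analogue of getWrappedAnalyzer: one delegate serves every field.
    SketchAnalyzer getWrappedAnalyzer(String fieldName) {
        return delegate;
    }

    // Analogue of wrapComponents: turn the delegate's tokens into shingles.
    private List<String> wrapTokens(List<String> tokens) {
        List<String> out = new ArrayList<>();
        for (int start = 0; start < tokens.size(); start++) {
            for (int size = minShingleSize;
                 size <= maxShingleSize && start + size <= tokens.size(); size++) {
                out.add(String.join(tokenSeparator, tokens.subList(start, start + size)));
            }
        }
        return out;
    }

    @Override
    public List<String> analyze(String fieldName, String text) {
        return wrapTokens(getWrappedAnalyzer(fieldName).analyze(fieldName, text));
    }

    public static void main(String[] args) {
        // Delegate: lowercase, then split on whitespace.
        SketchAnalyzer whitespace = (field, text) ->
            Arrays.asList(text.toLowerCase(Locale.ROOT).trim().split("\\s+"));
        SketchAnalyzer shingled = new ShingleWrapperSketch(whitespace, 2, 2, " ");
        System.out.println(shingled.analyze("body", "Please Divide This")); // [please divide, divide this]
    }
}
```

Keeping the delegate choice and the added filtering stage in separate methods is what lets one wrapper decorate any delegate without knowing how it tokenizes.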
Copyright © 2000-2016 Apache Software Foundation. All Rights Reserved.