org.apache.lucene.analysis.shingle
Class ShingleAnalyzerWrapper

java.lang.Object
  extended by org.apache.lucene.analysis.Analyzer
      extended by org.apache.lucene.analysis.AnalyzerWrapper
          extended by org.apache.lucene.analysis.shingle.ShingleAnalyzerWrapper
All Implemented Interfaces:
Closeable

public final class ShingleAnalyzerWrapper
extends AnalyzerWrapper

A ShingleAnalyzerWrapper wraps a ShingleFilter around another Analyzer.

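A shingle is another name for a token-based n-gram. For example, with a shingle size of 2, the tokens "please divide this" yield the shingles "please divide" and "divide this".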
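
The idea of a token-based n-gram can be sketched in plain Java. This is an illustration only, not Lucene's ShingleFilter; the class name ShingleSketch and the method shingles are hypothetical and exist only for this example.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Illustrative only: plain Java, not Lucene's ShingleFilter.
// ShingleSketch and shingles(...) are hypothetical names for this example.
final class ShingleSketch {

    /** Joins every run of {@code size} consecutive tokens with {@code separator}. */
    static List<String> shingles(List<String> tokens, int size, String separator) {
        List<String> out = new ArrayList<>();
        // Slide a window of `size` tokens across the input, one token at a time.
        for (int start = 0; start + size <= tokens.size(); start++) {
            out.add(String.join(separator, tokens.subList(start, start + size)));
        }
        return out;
    }

    public static void main(String[] args) {
        List<String> tokens = Arrays.asList("please", "divide", "this", "sentence");
        // Prints the three two-token shingles of the input.
        System.out.println(shingles(tokens, 2, " "));
    }
}
```

An input with fewer tokens than the requested size produces no shingles at all, which is the case the outputUnigramsIfNoShingles option addresses.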


Nested Class Summary
 
Nested classes/interfaces inherited from class org.apache.lucene.analysis.Analyzer
Analyzer.GlobalReuseStrategy, Analyzer.PerFieldReuseStrategy, Analyzer.ReuseStrategy, Analyzer.TokenStreamComponents
 
Field Summary
 
Fields inherited from class org.apache.lucene.analysis.Analyzer
GLOBAL_REUSE_STRATEGY, PER_FIELD_REUSE_STRATEGY
 
Constructor Summary
ShingleAnalyzerWrapper(Analyzer defaultAnalyzer)
           
ShingleAnalyzerWrapper(Analyzer defaultAnalyzer, int maxShingleSize)
           
ShingleAnalyzerWrapper(Analyzer defaultAnalyzer, int minShingleSize, int maxShingleSize)
           
ShingleAnalyzerWrapper(Analyzer delegate, int minShingleSize, int maxShingleSize, String tokenSeparator, boolean outputUnigrams, boolean outputUnigramsIfNoShingles, String fillerToken)
          Creates a new ShingleAnalyzerWrapper
ShingleAnalyzerWrapper(Version matchVersion)
          Wraps StandardAnalyzer.
ShingleAnalyzerWrapper(Version matchVersion, int minShingleSize, int maxShingleSize)
          Wraps StandardAnalyzer.
 
Method Summary
 String getFillerToken()
           
 int getMaxShingleSize()
          The max shingle (token ngram) size
 int getMinShingleSize()
          The min shingle (token ngram) size
 String getTokenSeparator()
           
 Analyzer getWrappedAnalyzer(String fieldName)
           
 boolean isOutputUnigrams()
           
 boolean isOutputUnigramsIfNoShingles()
           
protected  Analyzer.TokenStreamComponents wrapComponents(String fieldName, Analyzer.TokenStreamComponents components)
           
 
Methods inherited from class org.apache.lucene.analysis.AnalyzerWrapper
createComponents, getOffsetGap, getPositionIncrementGap, initReader, wrapReader
 
Methods inherited from class org.apache.lucene.analysis.Analyzer
close, getReuseStrategy, tokenStream, tokenStream
 
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
 

Constructor Detail

ShingleAnalyzerWrapper

public ShingleAnalyzerWrapper(Analyzer defaultAnalyzer)

ShingleAnalyzerWrapper

public ShingleAnalyzerWrapper(Analyzer defaultAnalyzer,
                              int maxShingleSize)

ShingleAnalyzerWrapper

public ShingleAnalyzerWrapper(Analyzer defaultAnalyzer,
                              int minShingleSize,
                              int maxShingleSize)

ShingleAnalyzerWrapper

public ShingleAnalyzerWrapper(Analyzer delegate,
                              int minShingleSize,
                              int maxShingleSize,
                              String tokenSeparator,
                              boolean outputUnigrams,
                              boolean outputUnigramsIfNoShingles,
                              String fillerToken)
Creates a new ShingleAnalyzerWrapper that applies a ShingleFilter to the TokenStream of the given delegate.

Parameters:
delegate - Analyzer whose TokenStream is to be filtered
minShingleSize - Min shingle (token ngram) size
maxShingleSize - Max shingle (token ngram) size
tokenSeparator - Used to separate input stream tokens in output shingles
outputUnigrams - Whether or not the filter shall pass the original tokens to the output stream
outputUnigramsIfNoShingles - Overrides the behavior of outputUnigrams==false when no shingles are available (because the input stream has fewer than minShingleSize tokens). If outputUnigrams==true, unigrams are always output, regardless of whether any shingles are available.
fillerToken - Filler token to use when positionIncrement is more than 1
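
How minShingleSize, maxShingleSize, tokenSeparator, outputUnigrams and outputUnigramsIfNoShingles interact can be sketched in plain Java. This is a simplified model under stated assumptions, not Lucene's implementation: it assumes minShingleSize is at least 2, ignores position increments and fillerToken, and the emission order (unigram first, then shingles by increasing size at each position) is chosen for the example. The class name ShingleOptionsSketch and the method shingle are hypothetical.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Simplified model of the constructor options; not Lucene's ShingleFilter.
// ShingleOptionsSketch and shingle(...) are hypothetical names.
final class ShingleOptionsSketch {

    static List<String> shingle(List<String> tokens, int minShingleSize, int maxShingleSize,
                                String tokenSeparator, boolean outputUnigrams,
                                boolean outputUnigramsIfNoShingles) {
        List<String> out = new ArrayList<>();
        // No shingle can be built when there are fewer than minShingleSize tokens.
        boolean shinglesAvailable = tokens.size() >= minShingleSize;
        // outputUnigramsIfNoShingles only matters when outputUnigrams is false.
        boolean emitUnigrams = outputUnigrams || (outputUnigramsIfNoShingles && !shinglesAvailable);
        for (int start = 0; start < tokens.size(); start++) {
            if (emitUnigrams) {
                out.add(tokens.get(start));
            }
            // Emit every shingle size from min to max that still fits in the input.
            for (int size = minShingleSize;
                 size <= maxShingleSize && start + size <= tokens.size(); size++) {
                out.add(String.join(tokenSeparator, tokens.subList(start, start + size)));
            }
        }
        return out;
    }

    public static void main(String[] args) {
        List<String> tokens = Arrays.asList("a", "b", "c");
        System.out.println(shingle(tokens, 2, 3, " ", true, false));
        // A single token yields no shingles, so the fallback flag lets the unigram through.
        System.out.println(shingle(Arrays.asList("a"), 2, 2, " ", false, true));
    }
}
```

In this model, a single-token input with both flags set to false produces no output at all, which is why the fallback flag exists.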

ShingleAnalyzerWrapper

public ShingleAnalyzerWrapper(Version matchVersion)
Wraps StandardAnalyzer.


ShingleAnalyzerWrapper

public ShingleAnalyzerWrapper(Version matchVersion,
                              int minShingleSize,
                              int maxShingleSize)
Wraps StandardAnalyzer.

Method Detail

getMaxShingleSize

public int getMaxShingleSize()
The max shingle (token ngram) size

Returns:
The max shingle (token ngram) size

getMinShingleSize

public int getMinShingleSize()
The min shingle (token ngram) size

Returns:
The min shingle (token ngram) size

getTokenSeparator

public String getTokenSeparator()

isOutputUnigrams

public boolean isOutputUnigrams()

isOutputUnigramsIfNoShingles

public boolean isOutputUnigramsIfNoShingles()

getFillerToken

public String getFillerToken()
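
The role of the filler token can be sketched in plain Java. This is a rough model, not Lucene's implementation: a null entry stands for a position with no token (for example, one left empty by an upstream filter, which is what a positionIncrement greater than 1 signals), and only isolated gaps and two-token shingles are considered. The class name FillerTokenSketch, the method bigramsWithFiller, and the "_" filler value used in the example are assumptions made for illustration.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Rough model of filler-token substitution; not Lucene's ShingleFilter.
// FillerTokenSketch and bigramsWithFiller(...) are hypothetical names.
final class FillerTokenSketch {

    /** Builds two-token shingles; a null entry marks a position that has no token. */
    static List<String> bigramsWithFiller(List<String> positions, String separator, String filler) {
        List<String> out = new ArrayList<>();
        for (int i = 0; i + 1 < positions.size(); i++) {
            // Substitute the filler wherever a position is empty.
            String left = positions.get(i) == null ? filler : positions.get(i);
            String right = positions.get(i + 1) == null ? filler : positions.get(i + 1);
            out.add(left + separator + right);
        }
        return out;
    }

    public static void main(String[] args) {
        // The third position is empty, as if a token had been removed there.
        List<String> positions = Arrays.asList("please", "divide", null, "sentence");
        System.out.println(bigramsWithFiller(positions, " ", "_"));
    }
}
```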

getWrappedAnalyzer

public final Analyzer getWrappedAnalyzer(String fieldName)
Specified by:
getWrappedAnalyzer in class AnalyzerWrapper

wrapComponents

protected Analyzer.TokenStreamComponents wrapComponents(String fieldName,
                                                        Analyzer.TokenStreamComponents components)
Overrides:
wrapComponents in class AnalyzerWrapper


Copyright © 2000-2014 Apache Software Foundation. All Rights Reserved.