org.apache.lucene.analysis.shingle
Class ShingleAnalyzerWrapper

java.lang.Object
  extended by org.apache.lucene.analysis.Analyzer
      extended by org.apache.lucene.analysis.AnalyzerWrapper
          extended by org.apache.lucene.analysis.shingle.ShingleAnalyzerWrapper
All Implemented Interfaces:
Closeable

public final class ShingleAnalyzerWrapper
extends AnalyzerWrapper

A ShingleAnalyzerWrapper wraps a ShingleFilter around another Analyzer.

A shingle is another name for a token-based n-gram.
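
To make the definition concrete, the following is a minimal plain-Java sketch of token n-grams. The class name ShingleSketch and its shingles method are hypothetical and are not part of Lucene. The sketch does not use ShingleFilter and only shows which token groupings a shingle of a given size corresponds to.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Illustrative only: plain-Java token n-grams, not Lucene's ShingleFilter.
class ShingleSketch {

    // Returns every run of exactly `size` consecutive tokens, joined by `separator`.
    static List<String> shingles(List<String> tokens, int size, String separator) {
        List<String> out = new ArrayList<>();
        for (int start = 0; start + size <= tokens.size(); start++) {
            out.add(String.join(separator, tokens.subList(start, start + size)));
        }
        return out;
    }

    public static void main(String[] args) {
        List<String> tokens = Arrays.asList("please", "divide", "this", "sentence");
        System.out.println(shingles(tokens, 2, " "));
        // prints [please divide, divide this, this sentence]
    }
}
```

An input with fewer tokens than the requested size yields no shingles at all, which is the case the outputUnigramsIfNoShingles constructor parameter addresses.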


Nested Class Summary
 
Nested classes/interfaces inherited from class org.apache.lucene.analysis.Analyzer
Analyzer.GlobalReuseStrategy, Analyzer.PerFieldReuseStrategy, Analyzer.ReuseStrategy, Analyzer.TokenStreamComponents
 
Constructor Summary
ShingleAnalyzerWrapper(Analyzer defaultAnalyzer)
           
ShingleAnalyzerWrapper(Analyzer defaultAnalyzer, int maxShingleSize)
           
ShingleAnalyzerWrapper(Analyzer defaultAnalyzer, int minShingleSize, int maxShingleSize)
           
ShingleAnalyzerWrapper(Analyzer defaultAnalyzer, int minShingleSize, int maxShingleSize, String tokenSeparator, boolean outputUnigrams, boolean outputUnigramsIfNoShingles)
          Creates a new ShingleAnalyzerWrapper
ShingleAnalyzerWrapper(Version matchVersion)
          Wraps StandardAnalyzer.
ShingleAnalyzerWrapper(Version matchVersion, int minShingleSize, int maxShingleSize)
          Wraps StandardAnalyzer.
 
Method Summary
 int getMaxShingleSize()
          The max shingle (token ngram) size
 int getMinShingleSize()
          The min shingle (token ngram) size
 String getTokenSeparator()
           
protected  Analyzer getWrappedAnalyzer(String fieldName)
           
 boolean isOutputUnigrams()
           
 boolean isOutputUnigramsIfNoShingles()
           
protected  Analyzer.TokenStreamComponents wrapComponents(String fieldName, Analyzer.TokenStreamComponents components)
           
 
Methods inherited from class org.apache.lucene.analysis.AnalyzerWrapper
createComponents, getOffsetGap, getPositionIncrementGap, initReader
 
Methods inherited from class org.apache.lucene.analysis.Analyzer
close, tokenStream
 
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
 

Constructor Detail

ShingleAnalyzerWrapper

public ShingleAnalyzerWrapper(Analyzer defaultAnalyzer)

ShingleAnalyzerWrapper

public ShingleAnalyzerWrapper(Analyzer defaultAnalyzer,
                              int maxShingleSize)

ShingleAnalyzerWrapper

public ShingleAnalyzerWrapper(Analyzer defaultAnalyzer,
                              int minShingleSize,
                              int maxShingleSize)

ShingleAnalyzerWrapper

public ShingleAnalyzerWrapper(Analyzer defaultAnalyzer,
                              int minShingleSize,
                              int maxShingleSize,
                              String tokenSeparator,
                              boolean outputUnigrams,
                              boolean outputUnigramsIfNoShingles)
Creates a new ShingleAnalyzerWrapper

Parameters:
defaultAnalyzer - Analyzer whose TokenStream is to be filtered
minShingleSize - Min shingle (token ngram) size
maxShingleSize - Max shingle size
tokenSeparator - Used to separate input stream tokens in output shingles
outputUnigrams - Whether or not the filter shall pass the original tokens to the output stream
outputUnigramsIfNoShingles - Overrides the behavior of outputUnigrams==false when no shingles are available (because there are fewer than minShingleSize tokens in the input stream): if true, the unigrams are output in that case. Note that if outputUnigrams==true, then unigrams are always output, regardless of whether any shingles are available.
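
The interaction of these parameters can be sketched in plain Java as follows. This is an approximation of the documented semantics, not the ShingleFilter implementation: the class ShingleOptionsSketch and its analyze method are hypothetical, the input is assumed to be already tokenized, minShingleSize is assumed to be at least 2, and position increments, offsets, and token types are ignored.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Illustrative only: mimics the documented parameter semantics in plain Java.
// It is not Lucene's ShingleFilter and ignores positions, offsets, and token types.
class ShingleOptionsSketch {

    // Assumes minShingleSize >= 2 and minShingleSize <= maxShingleSize.
    static List<String> analyze(List<String> tokens, int minShingleSize, int maxShingleSize,
                                String tokenSeparator, boolean outputUnigrams,
                                boolean outputUnigramsIfNoShingles) {
        // No shingle can be formed when the input has fewer than minShingleSize tokens.
        boolean noShingles = tokens.size() < minShingleSize;
        boolean emitUnigrams = outputUnigrams || (outputUnigramsIfNoShingles && noShingles);

        List<String> out = new ArrayList<>();
        for (int start = 0; start < tokens.size(); start++) {
            if (emitUnigrams) {
                out.add(tokens.get(start)); // the original token
            }
            // Shingles of every size from min to max that begin at this token.
            for (int size = minShingleSize;
                 size <= maxShingleSize && start + size <= tokens.size(); size++) {
                out.add(String.join(tokenSeparator, tokens.subList(start, start + size)));
            }
        }
        return out;
    }

    public static void main(String[] args) {
        List<String> text = Arrays.asList("please", "divide", "this");
        System.out.println(analyze(text, 2, 3, "_", false, false));
        // prints [please_divide, please_divide_this, divide_this]
        System.out.println(analyze(Arrays.asList("solo"), 2, 3, "_", false, true));
        // prints [solo]
    }
}
```

In the second call the single token is emitted only because outputUnigramsIfNoShingles is true. With both flags false the same input would produce no output.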

ShingleAnalyzerWrapper

public ShingleAnalyzerWrapper(Version matchVersion)
Wraps StandardAnalyzer.


ShingleAnalyzerWrapper

public ShingleAnalyzerWrapper(Version matchVersion,
                              int minShingleSize,
                              int maxShingleSize)
Wraps StandardAnalyzer.

Method Detail

getMaxShingleSize

public int getMaxShingleSize()
The max shingle (token ngram) size

Returns:
The max shingle (token ngram) size

getMinShingleSize

public int getMinShingleSize()
The min shingle (token ngram) size

Returns:
The min shingle (token ngram) size

getTokenSeparator

public String getTokenSeparator()

Returns:
The string used to separate input stream tokens in output shingles

isOutputUnigrams

public boolean isOutputUnigrams()

Returns:
Whether the filter passes the original tokens (unigrams) to the output stream

isOutputUnigramsIfNoShingles

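public boolean isOutputUnigramsIfNoShingles()

Returns:
Whether unigrams are output when no shingles are available, even if outputUnigrams is false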

getWrappedAnalyzer

protected Analyzer getWrappedAnalyzer(String fieldName)
Specified by:
getWrappedAnalyzer in class AnalyzerWrapper

wrapComponents

protected Analyzer.TokenStreamComponents wrapComponents(String fieldName,
                                                        Analyzer.TokenStreamComponents components)
Specified by:
wrapComponents in class AnalyzerWrapper


Copyright © 2000-2013 Apache Software Foundation. All Rights Reserved.