org.apache.lucene.analysis.shingle
Class ShingleAnalyzerWrapper

java.lang.Object
  extended by org.apache.lucene.analysis.Analyzer
      extended by org.apache.lucene.analysis.shingle.ShingleAnalyzerWrapper
All Implemented Interfaces:
Closeable

public final class ShingleAnalyzerWrapper
extends org.apache.lucene.analysis.Analyzer

A ShingleAnalyzerWrapper wraps a ShingleFilter around another Analyzer.

A shingle is another name for a token-based n-gram.
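
To make the term concrete, the following is a minimal plain-Java sketch of how token n-grams can be formed from a list of tokens. It does not use Lucene: the class name ShingleSketch and the method shingles are hypothetical, and joining tokens with a single space is an assumption made for illustration.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Plain-Java illustration of token n-grams; hypothetical helper, not part of Lucene.
final class ShingleSketch {

    // Returns every run of minSize..maxSize adjacent tokens, joined by a single space.
    static List<String> shingles(List<String> tokens, int minSize, int maxSize) {
        List<String> out = new ArrayList<String>();
        for (int start = 0; start < tokens.size(); start++) {
            for (int size = minSize; size <= maxSize && start + size <= tokens.size(); size++) {
                out.add(String.join(" ", tokens.subList(start, start + size)));
            }
        }
        return out;
    }

    public static void main(String[] args) {
        List<String> tokens = Arrays.asList("please", "divide", "this", "sentence");
        // prints [please divide, divide this, this sentence]
        System.out.println(shingles(tokens, 2, 2));
    }
}
```

With both sizes set to 2, each output entry is a pair of adjacent tokens; raising maxSize to 3 would add three-token runs as well.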


Constructor Summary
ShingleAnalyzerWrapper(org.apache.lucene.analysis.Analyzer defaultAnalyzer)
           
ShingleAnalyzerWrapper(org.apache.lucene.analysis.Analyzer defaultAnalyzer, int maxShingleSize)
           
ShingleAnalyzerWrapper(org.apache.lucene.analysis.Analyzer defaultAnalyzer, int minShingleSize, int maxShingleSize)
           
ShingleAnalyzerWrapper(org.apache.lucene.util.Version matchVersion)
          Wraps StandardAnalyzer.
ShingleAnalyzerWrapper(org.apache.lucene.util.Version matchVersion, int minShingleSize, int maxShingleSize)
          Wraps StandardAnalyzer.
 
Method Summary
 int getMaxShingleSize()
          The max shingle (token ngram) size
 int getMinShingleSize()
          The min shingle (token ngram) size
 String getTokenSeparator()
           
 boolean isOutputUnigrams()
           
 boolean isOutputUnigramsIfNoShingles()
           
 org.apache.lucene.analysis.TokenStream reusableTokenStream(String fieldName, Reader reader)
           
 void setMaxShingleSize(int maxShingleSize)
          Set the maximum size of output shingles (default: 2)
 void setMinShingleSize(int minShingleSize)
          Set the min shingle size (default: 2).
 void setOutputUnigrams(boolean outputUnigrams)
          Sets whether the filter passes the original tokens (the "unigrams") to the output stream.
 void setOutputUnigramsIfNoShingles(boolean outputUnigramsIfNoShingles)
          Sets whether unigrams are output even though outputUnigrams==false when no shingles are available (because there are fewer than minShingleSize tokens in the input stream). Default: false.
 void setTokenSeparator(String tokenSeparator)
          Sets the string to use when joining adjacent tokens to form a shingle
 org.apache.lucene.analysis.TokenStream tokenStream(String fieldName, Reader reader)
           
 
Methods inherited from class org.apache.lucene.analysis.Analyzer
close, getOffsetGap, getPositionIncrementGap, getPreviousTokenStream, setPreviousTokenStream
 
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
 

Constructor Detail

ShingleAnalyzerWrapper

public ShingleAnalyzerWrapper(org.apache.lucene.analysis.Analyzer defaultAnalyzer)

ShingleAnalyzerWrapper

public ShingleAnalyzerWrapper(org.apache.lucene.analysis.Analyzer defaultAnalyzer,
                              int maxShingleSize)

ShingleAnalyzerWrapper

public ShingleAnalyzerWrapper(org.apache.lucene.analysis.Analyzer defaultAnalyzer,
                              int minShingleSize,
                              int maxShingleSize)

ShingleAnalyzerWrapper

public ShingleAnalyzerWrapper(org.apache.lucene.util.Version matchVersion)
Wraps StandardAnalyzer.


ShingleAnalyzerWrapper

public ShingleAnalyzerWrapper(org.apache.lucene.util.Version matchVersion,
                              int minShingleSize,
                              int maxShingleSize)
Wraps StandardAnalyzer.

Method Detail

getMaxShingleSize

public int getMaxShingleSize()
The max shingle (token ngram) size

Returns:
The max shingle (token ngram) size

setMaxShingleSize

public void setMaxShingleSize(int maxShingleSize)
Set the maximum size of output shingles (default: 2)

Parameters:
maxShingleSize - max shingle size

getMinShingleSize

public int getMinShingleSize()
The min shingle (token ngram) size

Returns:
The min shingle (token ngram) size

setMinShingleSize

public void setMinShingleSize(int minShingleSize)

Set the min shingle size (default: 2).

The supplied minShingleSize must not be greater than maxShingleSize, so set maxShingleSize before calling this method.

Parameters:
minShingleSize - min size of output shingles
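
The ordering requirement above can be illustrated with a small stand-in class. This is a sketch, not the Lucene implementation: ShingleSizeSketch is a hypothetical name, and reporting a violation with IllegalArgumentException is an assumption, since this page does not state how the constraint is enforced.

```java
// Hypothetical stand-in that mirrors the documented ordering constraint; not the Lucene class.
final class ShingleSizeSketch {
    private int minShingleSize = 2; // documented default
    private int maxShingleSize = 2; // documented default

    void setMaxShingleSize(int maxShingleSize) {
        this.maxShingleSize = maxShingleSize;
    }

    // Assumption: a min size above the current max is rejected with IllegalArgumentException.
    void setMinShingleSize(int minShingleSize) {
        if (minShingleSize > maxShingleSize) {
            throw new IllegalArgumentException(
                "minShingleSize " + minShingleSize + " exceeds maxShingleSize " + maxShingleSize);
        }
        this.minShingleSize = minShingleSize;
    }

    int getMinShingleSize() { return minShingleSize; }

    int getMaxShingleSize() { return maxShingleSize; }

    public static void main(String[] args) {
        ShingleSizeSketch sizes = new ShingleSizeSketch();
        sizes.setMaxShingleSize(4); // raise the max first
        sizes.setMinShingleSize(3); // accepted, because 3 <= 4
        System.out.println(sizes.getMinShingleSize() + ".." + sizes.getMaxShingleSize()); // prints 3..4
    }
}
```

Calling setMinShingleSize(3) on a fresh instance, while the max is still at its default of 2, would be rejected in this sketch.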

getTokenSeparator

public String getTokenSeparator()

setTokenSeparator

public void setTokenSeparator(String tokenSeparator)
Sets the string to use when joining adjacent tokens to form a shingle

Parameters:
tokenSeparator - used to separate input stream tokens in output shingles
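
As a rough illustration of the separator's effect, the plain-Java sketch below joins the same pair of adjacent tokens with different separators. SeparatorSketch and its shingle method are hypothetical names used only for this example, and no Lucene classes are involved.

```java
import java.util.Arrays;
import java.util.List;

// Plain-Java illustration only; hypothetical helper, not part of Lucene.
final class SeparatorSketch {

    // Joins adjacent tokens into one shingle using the given separator.
    static String shingle(List<String> tokens, String tokenSeparator) {
        return String.join(tokenSeparator, tokens);
    }

    public static void main(String[] args) {
        List<String> pair = Arrays.asList("new", "york");
        System.out.println(shingle(pair, " ")); // prints new york
        System.out.println(shingle(pair, "_")); // prints new_york
        System.out.println(shingle(pair, ""));  // prints newyork
    }
}
```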

isOutputUnigrams

public boolean isOutputUnigrams()

setOutputUnigrams

public void setOutputUnigrams(boolean outputUnigrams)
Sets whether the filter passes the original tokens (the "unigrams") to the output stream.

Parameters:
outputUnigrams - Whether or not the filter shall pass the original tokens to the output stream

isOutputUnigramsIfNoShingles

public boolean isOutputUnigramsIfNoShingles()

setOutputUnigramsIfNoShingles

public void setOutputUnigramsIfNoShingles(boolean outputUnigramsIfNoShingles)

Sets whether unigrams are output even though outputUnigrams==false when no shingles are available (because there are fewer than minShingleSize tokens in the input stream). Default: false.

Note that if outputUnigrams==true, then unigrams are always output, regardless of whether any shingles are available.

Parameters:
outputUnigramsIfNoShingles - Whether or not to output a single unigram when no shingles are available.
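
The interaction between outputUnigrams and outputUnigramsIfNoShingles can be sketched in plain Java as follows. This models only the rules stated above and is not the Lucene implementation: UnigramSketch and analyze are hypothetical names, the space separator is assumed, and the order in which unigrams and shingles are emitted is an assumption chosen for the example.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Plain-Java model of the two unigram flags; hypothetical helper, not part of Lucene.
final class UnigramSketch {

    static List<String> analyze(List<String> tokens, int minSize, int maxSize,
                                boolean outputUnigrams, boolean outputUnigramsIfNoShingles) {
        // No shingles can be formed when there are fewer than minSize tokens.
        boolean noShingles = tokens.size() < minSize;
        // outputUnigrams always wins; the second flag only matters when no shingles exist.
        boolean emitUnigrams = outputUnigrams || (outputUnigramsIfNoShingles && noShingles);

        List<String> out = new ArrayList<String>();
        for (int start = 0; start < tokens.size(); start++) {
            if (emitUnigrams) {
                out.add(tokens.get(start));
            }
            for (int size = minSize; size <= maxSize && start + size <= tokens.size(); size++) {
                out.add(String.join(" ", tokens.subList(start, start + size)));
            }
        }
        return out;
    }

    public static void main(String[] args) {
        List<String> one = Arrays.asList("wifi");
        List<String> two = Arrays.asList("a", "b");
        System.out.println(analyze(one, 2, 2, false, false)); // prints []
        System.out.println(analyze(one, 2, 2, false, true));  // prints [wifi]
        System.out.println(analyze(two, 2, 2, false, true));  // prints [a b]
        System.out.println(analyze(two, 2, 2, true, false));  // prints [a, a b, b]
    }
}
```

In this sketch, a one-token input produces nothing unless one of the two flags is set, while a two-token input produces its shingle regardless of outputUnigramsIfNoShingles.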

tokenStream

public org.apache.lucene.analysis.TokenStream tokenStream(String fieldName,
                                                          Reader reader)
Specified by:
tokenStream in class org.apache.lucene.analysis.Analyzer

reusableTokenStream

public org.apache.lucene.analysis.TokenStream reusableTokenStream(String fieldName,
                                                                  Reader reader)
                                                           throws IOException
Overrides:
reusableTokenStream in class org.apache.lucene.analysis.Analyzer
Throws:
IOException


Copyright © 2000-2011 Apache Software Foundation. All Rights Reserved.