Class ShingleAnalyzerWrapper

All Implemented Interfaces:
Closeable, AutoCloseable

public final class ShingleAnalyzerWrapper extends AnalyzerWrapper
A ShingleAnalyzerWrapper wraps a ShingleFilter around another Analyzer.

A shingle is another name for a token-based n-gram.
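
To make the term concrete, the following plain-Java sketch builds token n-grams from a list of already-tokenized words. It does not use Lucene and is not how ShingleFilter is implemented. The class name ShingleSketch and the method shingles are hypothetical names chosen for illustration, and the emission order shown here (by start position, then by size) is an assumption of this sketch.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration only: not part of Lucene.
final class ShingleSketch {

    // Returns every run of minSize..maxSize adjacent tokens, joined by separator.
    static List<String> shingles(List<String> tokens, int minSize, int maxSize, String separator) {
        List<String> out = new ArrayList<>();
        for (int start = 0; start < tokens.size(); start++) {
            // Grow the window from minSize up to maxSize, stopping at the end of the input.
            for (int size = minSize; size <= maxSize && start + size <= tokens.size(); size++) {
                out.add(String.join(separator, tokens.subList(start, start + size)));
            }
        }
        return out;
    }
}
```

For the tokens "please", "divide", "this", "sentence" with sizes 2 to 3 and a single space as separator, the sketch yields "please divide", "please divide this", "divide this", "divide this sentence", and "this sentence".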

Since:
3.1
  • Constructor Details

    • ShingleAnalyzerWrapper

      public ShingleAnalyzerWrapper(Analyzer defaultAnalyzer)
    • ShingleAnalyzerWrapper

      public ShingleAnalyzerWrapper(Analyzer defaultAnalyzer, int maxShingleSize)
    • ShingleAnalyzerWrapper

      public ShingleAnalyzerWrapper(Analyzer defaultAnalyzer, int minShingleSize, int maxShingleSize)
    • ShingleAnalyzerWrapper

      public ShingleAnalyzerWrapper(Analyzer delegate, int minShingleSize, int maxShingleSize, String tokenSeparator, boolean outputUnigrams, boolean outputUnigramsIfNoShingles, String fillerToken)
      Creates a new ShingleAnalyzerWrapper.
      Parameters:
      delegate - Analyzer whose TokenStream is to be filtered
      minShingleSize - Minimum shingle (token n-gram) size
      maxShingleSize - Maximum shingle (token n-gram) size
      tokenSeparator - Used to separate input stream tokens in output shingles
      outputUnigrams - Whether or not the filter shall pass the original tokens to the output stream
      outputUnigramsIfNoShingles - Overrides the behavior of outputUnigrams==false when no shingles are available (because there are fewer than minShingleSize tokens in the input stream). Note that if outputUnigrams==true, unigrams are always output, regardless of whether any shingles are available.
      fillerToken - Filler token to use when positionIncrement is more than 1
    • ShingleAnalyzerWrapper

      public ShingleAnalyzerWrapper()
    • ShingleAnalyzerWrapper

      public ShingleAnalyzerWrapper(int minShingleSize, int maxShingleSize)
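
The interaction between the outputUnigrams and outputUnigramsIfNoShingles parameters of the seven-argument constructor can be sketched in plain Java. This is a simplified model of the behavior described in the parameter notes, not Lucene code. UnigramFlagsSketch and analyze are hypothetical names, the separator is fixed to a single space, and the sketch lists unigrams before shingles rather than interleaving them by position.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration only: models the two unigram flags, not Lucene's filter.
final class UnigramFlagsSketch {

    static List<String> analyze(List<String> tokens, int minSize, int maxSize,
                                boolean outputUnigrams, boolean outputUnigramsIfNoShingles) {
        // Collect all shingles of minSize..maxSize adjacent tokens.
        List<String> shingles = new ArrayList<>();
        for (int start = 0; start < tokens.size(); start++) {
            for (int size = minSize; size <= maxSize && start + size <= tokens.size(); size++) {
                shingles.add(String.join(" ", tokens.subList(start, start + size)));
            }
        }

        List<String> out = new ArrayList<>();
        // Unigrams are emitted if requested outright, or as a fallback when
        // the input was too short to produce any shingle.
        if (outputUnigrams || (outputUnigramsIfNoShingles && shingles.isEmpty())) {
            out.addAll(tokens);
        }
        out.addAll(shingles);
        return out;
    }
}
```

With minSize and maxSize both 2, the single-token input "hello" produces nothing when both flags are false, and produces "hello" when outputUnigramsIfNoShingles is true. For a two-token input the fallback flag has no effect, because a shingle is available.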
  • Method Details

    • getMaxShingleSize

      public int getMaxShingleSize()
      The max shingle (token ngram) size
      Returns:
      The max shingle (token ngram) size
    • getMinShingleSize

      public int getMinShingleSize()
      The min shingle (token ngram) size
      Returns:
      The min shingle (token ngram) size
    • getTokenSeparator

      public String getTokenSeparator()
    • isOutputUnigrams

      public boolean isOutputUnigrams()
    • isOutputUnigramsIfNoShingles

      public boolean isOutputUnigramsIfNoShingles()
    • getFillerToken

      public String getFillerToken()
    • getWrappedAnalyzer

      public final Analyzer getWrappedAnalyzer(String fieldName)
      Specified by:
      getWrappedAnalyzer in class AnalyzerWrapper
    • wrapComponents

      protected Analyzer.TokenStreamComponents wrapComponents(String fieldName, Analyzer.TokenStreamComponents components)
      Overrides:
      wrapComponents in class AnalyzerWrapper