org.apache.lucene.analysis.query.QueryAutoStopWordAnalyzer

All Implemented Interfaces: Closeable, AutoCloseable

public final class QueryAutoStopWordAnalyzer extends AnalyzerWrapper

An Analyzer used primarily at query time to wrap another analyzer and provide a layer of protection which prevents very common words from being passed into queries.

For very large indexes, the cost of reading TermDocs for a very common word can be high. This analyzer was created after experience with a 38 million document index in which one term appeared in around 50% of documents, causing TermQueries for that term to take 2 seconds.

Since: 3.1
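The idea described above can be illustrated with a small stand-alone model. This is only a sketch and not Lucene code: the class AutoStopWordSketch, its methods, and the in-memory "index" (a list of term sets, one per document) are hypothetical stand-ins for an IndexReader and a wrapped analyzer. It counts how many documents contain each term, treats terms above a cutoff as stop words, and drops them from a list of query tokens.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

/** Hypothetical in-memory model of automatic stop word detection; not part of Lucene. */
final class AutoStopWordSketch {

    /** Returns the terms that occur in strictly more than maxDocFreq documents. */
    static Set<String> findStopWords(List<Set<String>> docs, int maxDocFreq) {
        // Document frequency: number of documents that contain the term at least once.
        Map<String, Integer> docFreq = new HashMap<>();
        for (Set<String> doc : docs) {
            for (String term : doc) {
                docFreq.merge(term, 1, Integer::sum);
            }
        }
        Set<String> stopWords = new TreeSet<>();
        for (Map.Entry<String, Integer> entry : docFreq.entrySet()) {
            if (entry.getValue() > maxDocFreq) {
                stopWords.add(entry.getKey());
            }
        }
        return stopWords;
    }

    /** Drops stop words from already-analyzed query tokens, keeping the original order. */
    static List<String> filterQuery(List<String> tokens, Set<String> stopWords) {
        List<String> kept = new ArrayList<>();
        for (String token : tokens) {
            if (!stopWords.contains(token)) {
                kept.add(token);
            }
        }
        return kept;
    }

    public static void main(String[] args) {
        List<Set<String>> docs = List.of(
            Set.of("the", "quick", "fox"),
            Set.of("the", "lazy", "dog"),
            Set.of("the", "fox", "sleeps"),
            Set.of("a", "dog", "barks"));

        // "the" appears in 3 documents; every other term appears in at most 2.
        Set<String> stopWords = findStopWords(docs, 2);
        System.out.println(stopWords);                                              // prints [the]
        System.out.println(filterQuery(List.of("the", "quick", "dog"), stopWords)); // prints [quick, dog]
    }
}
```

In the real class the document frequencies come from the IndexReader passed to the constructor, and the filtering is applied to the wrapped analyzer's token stream rather than to a list of strings.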

Nested Class Summary

Nested classes/interfaces inherited from class org.apache.lucene.analysis.Analyzer
Analyzer.ReuseStrategy, Analyzer.TokenStreamComponents
Field Summary

Fields

Modifier and Type

Field

Description

static final float

defaultMaxDocFreqPercent

Fields inherited from class org.apache.lucene.analysis.Analyzer
GLOBAL_REUSE_STRATEGY, PER_FIELD_REUSE_STRATEGY
Constructor Summary

Constructors

Constructor

Description

QueryAutoStopWordAnalyzer(Analyzer delegate, IndexReader indexReader)

Creates a new QueryAutoStopWordAnalyzer with stopwords calculated for all indexed fields from terms with a document frequency percentage greater than defaultMaxDocFreqPercent

QueryAutoStopWordAnalyzer(Analyzer delegate, IndexReader indexReader, float maxPercentDocs)

Creates a new QueryAutoStopWordAnalyzer with stopwords calculated for all indexed fields from terms with a document frequency percentage greater than the given maxPercentDocs

QueryAutoStopWordAnalyzer(Analyzer delegate, IndexReader indexReader, int maxDocFreq)

Creates a new QueryAutoStopWordAnalyzer with stopwords calculated for all indexed fields from terms with a document frequency greater than the given maxDocFreq

QueryAutoStopWordAnalyzer(Analyzer delegate, IndexReader indexReader, Collection<String> fields, float maxPercentDocs)

Creates a new QueryAutoStopWordAnalyzer with stopwords calculated for the given selection of fields from terms with a document frequency percentage greater than the given maxPercentDocs

QueryAutoStopWordAnalyzer(Analyzer delegate, IndexReader indexReader, Collection<String> fields, int maxDocFreq)

Creates a new QueryAutoStopWordAnalyzer with stopwords calculated for the given selection of fields from terms with a document frequency greater than the given maxDocFreq
Method Summary

Modifier and Type

Method

Description

Term[]

getStopWords()

Provides information on which stop words have been identified for all fields

String[]

getStopWords(String fieldName)

Provides information on which stop words have been identified for a field

protected Analyzer

getWrappedAnalyzer(String fieldName)

protected Analyzer.TokenStreamComponents

wrapComponents(String fieldName, Analyzer.TokenStreamComponents components)

Methods inherited from class org.apache.lucene.analysis.AnalyzerWrapper
attributeFactory, createComponents, getOffsetGap, getPositionIncrementGap, initReader, initReaderForNormalization, normalize, wrapReader, wrapReaderForNormalization, wrapTokenStreamForNormalization

Methods inherited from class org.apache.lucene.analysis.Analyzer
close, getReuseStrategy, normalize, tokenStream, tokenStream

Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait

Field Details
- defaultMaxDocFreqPercent
  
  public static final float defaultMaxDocFreqPercent
  See Also:
  
  Constant Field Values
Constructor Details
- QueryAutoStopWordAnalyzer
  
  public QueryAutoStopWordAnalyzer(Analyzer delegate, IndexReader indexReader) throws IOException
  
  Creates a new QueryAutoStopWordAnalyzer with stopwords calculated for all indexed fields from terms with a document frequency percentage greater than defaultMaxDocFreqPercent
  
  Parameters:
  
  delegate - Analyzer whose TokenStream will be filtered
  
  indexReader - IndexReader to identify the stopwords from
  
  Throws:
  
  IOException - Can be thrown while reading from the IndexReader
- QueryAutoStopWordAnalyzer
  
  public QueryAutoStopWordAnalyzer(Analyzer delegate, IndexReader indexReader, int maxDocFreq) throws IOException
  
  Creates a new QueryAutoStopWordAnalyzer with stopwords calculated for all indexed fields from terms with a document frequency greater than the given maxDocFreq
  
  Parameters:
  
  delegate - Analyzer whose TokenStream will be filtered
  
  indexReader - IndexReader to identify the stopwords from
  
  maxDocFreq - Document frequency that a term must exceed to be treated as a stopword
  
  Throws:
  
  IOException - Can be thrown while reading from the IndexReader
- QueryAutoStopWordAnalyzer
  
  public QueryAutoStopWordAnalyzer(Analyzer delegate, IndexReader indexReader, float maxPercentDocs) throws IOException
  
  Creates a new QueryAutoStopWordAnalyzer with stopwords calculated for all indexed fields from terms with a document frequency percentage greater than the given maxPercentDocs
  
  Parameters:
  
  delegate - Analyzer whose TokenStream will be filtered
  
  indexReader - IndexReader to identify the stopwords from
  
  maxPercentDocs - The maximum fraction (between 0.0 and 1.0) of index documents that may contain a term; a term found in a larger share of documents is considered to be a stop word
  
  Throws:
  
  IOException - Can be thrown while reading from the IndexReader
- QueryAutoStopWordAnalyzer
  
  public QueryAutoStopWordAnalyzer(Analyzer delegate, IndexReader indexReader, Collection<String> fields, float maxPercentDocs) throws IOException
  
  Creates a new QueryAutoStopWordAnalyzer with stopwords calculated for the given selection of fields from terms with a document frequency percentage greater than the given maxPercentDocs
  
  Parameters:
  
  delegate - Analyzer whose TokenStream will be filtered
  
  indexReader - IndexReader to identify the stopwords from
  
  fields - Selection of fields to calculate stopwords for
  
  maxPercentDocs - The maximum fraction (between 0.0 and 1.0) of index documents that may contain a term; a term found in a larger share of documents is considered to be a stop word
  
  Throws:
  
  IOException - Can be thrown while reading from the IndexReader
- QueryAutoStopWordAnalyzer
  
  public QueryAutoStopWordAnalyzer(Analyzer delegate, IndexReader indexReader, Collection<String> fields, int maxDocFreq) throws IOException
  
  Creates a new QueryAutoStopWordAnalyzer with stopwords calculated for the given selection of fields from terms with a document frequency greater than the given maxDocFreq
  
  Parameters:
  
  delegate - Analyzer whose TokenStream will be filtered
  
  indexReader - IndexReader to identify the stopwords from
  
  fields - Selection of fields to calculate stopwords for
  
  maxDocFreq - Document frequency that a term must exceed to be treated as a stopword
  
  Throws:
  
  IOException - Can be thrown while reading from the IndexReader
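The difference between the fractional threshold (maxPercentDocs) and the absolute threshold (maxDocFreq) in the constructors above can be sketched as follows. This is a stand-alone model, not Lucene code: the class DocFreqThresholdSketch and its methods are hypothetical, the range check is a choice made for this sketch, and converting the fraction by multiplying it by the document count and truncating is an assumption about how such a conversion could work. The figures are illustrative only.

```java
/** Hypothetical helper that models threshold handling only; not part of Lucene. */
final class DocFreqThresholdSketch {

    /** Converts a fraction of documents (between 0.0 and 1.0) into an absolute cutoff. */
    static int toMaxDocFreq(int numDocs, float maxPercentDocs) {
        // Written as a negated range check so that NaN is rejected as well.
        if (!(maxPercentDocs >= 0.0f && maxPercentDocs <= 1.0f)) {
            throw new IllegalArgumentException("maxPercentDocs must be between 0.0 and 1.0");
        }
        // Assumption: the product is truncated toward zero to obtain a whole document count.
        return (int) (numDocs * maxPercentDocs);
    }

    /** A term counts as a stop word only when its document frequency is strictly above the cutoff. */
    static boolean isStopWord(int docFreq, int maxDocFreq) {
        return docFreq > maxDocFreq;
    }

    public static void main(String[] args) {
        int numDocs = 38_000_000;                     // illustrative index size
        int cutoff = toMaxDocFreq(numDocs, 0.5f);     // 19,000,000 documents
        System.out.println(isStopWord(19_500_000, cutoff)); // prints true
        System.out.println(isStopWord(19_000_000, cutoff)); // prints false
    }
}
```

Under this model, passing a float selects a cutoff that scales with the size of the index, while passing an int fixes the cutoff regardless of how many documents the index holds.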
Method Details
- getWrappedAnalyzer
  
  protected Analyzer getWrappedAnalyzer(String fieldName)
  
  Specified by:
  
  getWrappedAnalyzer in class AnalyzerWrapper
- wrapComponents
  
  protected Analyzer.TokenStreamComponents wrapComponents(String fieldName, Analyzer.TokenStreamComponents components)
  
  Overrides:
  
  wrapComponents in class AnalyzerWrapper
- getStopWords
  
  public String[] getStopWords(String fieldName)
  
  Provides information on which stop words have been identified for a field
  
  Parameters:
  
  fieldName - The field whose identified stop words will be returned
  
  Returns:
  
  the stop words identified for a field
- getStopWords
  
  public Term[] getStopWords()
  
  Provides information on which stop words have been identified for all fields
  
  Returns:
  
  the stop words (as terms)
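The two getStopWords variants expose the same information at different granularity: one returns the stop word texts for a single field, the other returns every stop word together with its field. A minimal stand-alone sketch of that relationship follows. The class StopWordViewsSketch is hypothetical and not Lucene code; "field:text" strings stand in for Term objects, and returning an empty array for a field with no stop words is an assumption of this sketch.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

/** Hypothetical model of per-field stop word lookup; not part of Lucene. */
final class StopWordViewsSketch {

    // Field name -> stop words identified for that field, kept in sorted order.
    private final Map<String, Set<String>> stopWordsPerField = new LinkedHashMap<>();

    /** Records the stop words identified for one field. */
    void put(String fieldName, Set<String> stopWords) {
        stopWordsPerField.put(fieldName, new TreeSet<>(stopWords));
    }

    /** Per-field view: stop word texts for one field, or an empty array if none are known. */
    String[] getStopWords(String fieldName) {
        Set<String> words = stopWordsPerField.getOrDefault(fieldName, Collections.emptySet());
        return words.toArray(new String[0]);
    }

    /** All-fields view: "field:text" pairs standing in for Term objects. */
    List<String> getAllStopWords() {
        List<String> all = new ArrayList<>();
        for (Map.Entry<String, Set<String>> entry : stopWordsPerField.entrySet()) {
            for (String text : entry.getValue()) {
                all.add(entry.getKey() + ":" + text);
            }
        }
        return all;
    }
}
```

Because stop words are tracked per field in this model, the same text can be a stop word in one field and an ordinary term in another.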
