Class QueryAutoStopWordAnalyzer

java.lang.Object
org.apache.lucene.analysis.Analyzer
org.apache.lucene.analysis.AnalyzerWrapper
org.apache.lucene.analysis.query.QueryAutoStopWordAnalyzer
All Implemented Interfaces:
Closeable, AutoCloseable

public final class QueryAutoStopWordAnalyzer extends AnalyzerWrapper
An Analyzer used primarily at query time to wrap another analyzer and provide a layer of protection which prevents very common words from being passed into queries.

For very large indexes, reading the TermDocs for a very common word can be expensive. This analyzer was created after experience with a 38-million-document index in which one term appeared in around 50% of documents, causing TermQueries for that term to take 2 seconds.

Since:
3.1
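To make the wrapping idea concrete, here is a plain-Java model of what happens to a query at analysis time. It assumes the delegate analyzer has already split the query text into tokens and that stopwords have already been collected per field. It does not use Lucene: `QueryStopFilterModel` and `filterQueryTokens` are hypothetical names, and the real class filters a `TokenStream` rather than a `List<String>`.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical, Lucene-free model of query-time stopword removal.
// Not the Lucene API: the real analyzer wraps the delegate's TokenStream.
final class QueryStopFilterModel {

    /**
     * Drops every token that is a stopword for the given field.
     * Fields with no entry in stopWordsPerField pass through unchanged.
     */
    static List<String> filterQueryTokens(
            String field, List<String> tokens, Map<String, Set<String>> stopWordsPerField) {
        Set<String> stopWords = stopWordsPerField.getOrDefault(field, Set.of());
        List<String> kept = new ArrayList<>();
        for (String token : tokens) {
            if (!stopWords.contains(token)) {
                kept.add(token);
            }
        }
        return kept;
    }

    public static void main(String[] args) {
        Map<String, Set<String>> stopWords = Map.of("body", Set.of("the", "lucene"));
        List<String> query = List.of("the", "lucene", "analyzer");
        // "body" has stopwords, so only "analyzer" survives.
        System.out.println(filterQueryTokens("body", query, stopWords));
        // "title" has no stopword entry, so all tokens are kept.
        System.out.println(filterQueryTokens("title", query, stopWords));
    }
}
```

Keeping the stopword sets per field means a word that is ubiquitous in one field is still searchable in fields where it is rare.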
  • Field Details

    • defaultMaxDocFreqPercent

      public static final float defaultMaxDocFreqPercent
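      The default maximum fraction (0.4f, i.e. 40%) of index documents that may contain a term before the term is considered a stopword; used by the two-argument constructor.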
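The arithmetic behind the default threshold is worth seeing once. This sketch assumes the default is 0.4f (40%), as in the Lucene source, and that a percentage is turned into an absolute cutoff by multiplying it by the reader's `maxDoc()` and truncating to an `int`. `DefaultCutoffExample` and `cutoff` are hypothetical names. Applied to the 38-million-document index from the class description, a term present in about half of the documents lands well above the cutoff.

```java
// Hypothetical worked example of the default percentage threshold (not Lucene code).
final class DefaultCutoffExample {

    // Assumed to match QueryAutoStopWordAnalyzer.defaultMaxDocFreqPercent (0.4f in the Lucene source).
    static final float DEFAULT_MAX_DOC_FREQ_PERCENT = 0.4f;

    /** Converts a fraction of maxDoc into an absolute document-frequency cutoff, truncating. */
    static int cutoff(int maxDoc, float maxPercentDocs) {
        return (int) (maxDoc * maxPercentDocs);
    }

    public static void main(String[] args) {
        int maxDoc = 38_000_000;
        int docFreq = 19_000_000; // a term present in roughly 50% of the documents
        int limit = cutoff(maxDoc, DEFAULT_MAX_DOC_FREQ_PERCENT);
        System.out.println("cutoff = " + limit);             // 15200000
        System.out.println("stopword? " + (docFreq > limit)); // true
    }
}
```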
  • Constructor Details

    • QueryAutoStopWordAnalyzer

      public QueryAutoStopWordAnalyzer(Analyzer delegate, IndexReader indexReader) throws IOException
      Creates a new QueryAutoStopWordAnalyzer with stopwords calculated for all indexed fields from terms with a document frequency percentage greater than defaultMaxDocFreqPercent.
      Parameters:
      delegate - Analyzer whose TokenStream will be filtered
      indexReader - IndexReader to identify the stopwords from
      Throws:
      IOException - Can be thrown while reading from the IndexReader
    • QueryAutoStopWordAnalyzer

      public QueryAutoStopWordAnalyzer(Analyzer delegate, IndexReader indexReader, int maxDocFreq) throws IOException
      Creates a new QueryAutoStopWordAnalyzer with stopwords calculated for all indexed fields from terms with a document frequency greater than the given maxDocFreq.
      Parameters:
      delegate - Analyzer whose TokenStream will be filtered
      indexReader - IndexReader to identify the stopwords from
      maxDocFreq - Document frequency a term must exceed to be treated as a stopword
      Throws:
      IOException - Can be thrown while reading from the IndexReader
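The selection rule of the `int maxDocFreq` variant can be modelled without an index. This sketch assumes the per-term document frequencies of one field have already been read into a `Map` (the real constructor walks the field's terms through the `IndexReader`); `MaxDocFreqSelection` and `selectStopWords` are hypothetical names. Because the comparison is strictly "greater than", a term whose frequency equals `maxDocFreq` is not a stopword.

```java
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

// Hypothetical, Lucene-free model of choosing stopwords from term document frequencies.
final class MaxDocFreqSelection {

    /** Returns every term whose document frequency is strictly greater than maxDocFreq. */
    static Set<String> selectStopWords(Map<String, Integer> docFreqByTerm, int maxDocFreq) {
        Set<String> stopWords = new TreeSet<>(); // sorted only to make printed output predictable
        for (Map.Entry<String, Integer> e : docFreqByTerm.entrySet()) {
            if (e.getValue() > maxDocFreq) {
                stopWords.add(e.getKey());
            }
        }
        return stopWords;
    }

    public static void main(String[] args) {
        Map<String, Integer> docFreq = Map.of("the", 950, "of", 700, "lucene", 120, "analyzer", 500);
        // "analyzer" sits exactly at the limit, so it is kept as a normal query term.
        System.out.println(selectStopWords(docFreq, 500)); // [of, the]
    }
}
```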
    • QueryAutoStopWordAnalyzer

      public QueryAutoStopWordAnalyzer(Analyzer delegate, IndexReader indexReader, float maxPercentDocs) throws IOException
      Creates a new QueryAutoStopWordAnalyzer with stopwords calculated for all indexed fields from terms with a document frequency percentage greater than the given maxPercentDocs.
      Parameters:
      delegate - Analyzer whose TokenStream will be filtered
      indexReader - IndexReader to identify the stopwords from
      maxPercentDocs - The maximum fraction (between 0.0 and 1.0) of index documents that may contain a term; terms found in a larger fraction of documents are treated as stopwords
      Throws:
      IOException - Can be thrown while reading from the IndexReader
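Although the parameter is called a percentage, it is a fraction between 0.0 and 1.0. The sketch below assumes the conversion `(int) (maxDoc * maxPercentDocs)` followed by the "greater than" rule, and shows what happens if a caller passes `40f` when they mean 40%: the cutoff exceeds the number of documents and no stopwords are produced. `PercentThresholdExample`, `toMaxDocFreq` and `stopWords` are hypothetical names, not Lucene API.

```java
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

// Hypothetical model of the float-percentage variant: fraction of maxDoc -> absolute cutoff.
final class PercentThresholdExample {

    /** Assumed conversion from a fraction of maxDoc to an absolute cutoff. */
    static int toMaxDocFreq(int maxDoc, float maxPercentDocs) {
        return (int) (maxDoc * maxPercentDocs);
    }

    /** Terms whose document frequency exceeds the cutoff derived from maxPercentDocs. */
    static Set<String> stopWords(Map<String, Integer> docFreqByTerm, int maxDoc, float maxPercentDocs) {
        int maxDocFreq = toMaxDocFreq(maxDoc, maxPercentDocs);
        Set<String> result = new TreeSet<>();
        docFreqByTerm.forEach((term, df) -> {
            if (df > maxDocFreq) {
                result.add(term);
            }
        });
        return result;
    }

    public static void main(String[] args) {
        Map<String, Integer> docFreq = Map.of("the", 950, "of", 700, "lucene", 120);
        int maxDoc = 1000;
        // 0.4f means 40% of the documents: cutoff 400.
        System.out.println(stopWords(docFreq, maxDoc, 0.4f)); // [of, the]
        // 40f means 4000% of the documents: cutoff 40000, so nothing qualifies.
        System.out.println(stopWords(docFreq, maxDoc, 40f));  // []
    }
}
```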
    • QueryAutoStopWordAnalyzer

      public QueryAutoStopWordAnalyzer(Analyzer delegate, IndexReader indexReader, Collection<String> fields, float maxPercentDocs) throws IOException
      Creates a new QueryAutoStopWordAnalyzer with stopwords calculated for the given selection of fields from terms with a document frequency percentage greater than the given maxPercentDocs.
      Parameters:
      delegate - Analyzer whose TokenStream will be filtered
      indexReader - IndexReader to identify the stopwords from
      fields - Selection of fields to calculate stopwords for
      maxPercentDocs - The maximum fraction (between 0.0 and 1.0) of index documents that may contain a term; terms found in a larger fraction of documents are treated as stopwords
      Throws:
      IOException - Can be thrown while reading from the IndexReader
    • QueryAutoStopWordAnalyzer

      public QueryAutoStopWordAnalyzer(Analyzer delegate, IndexReader indexReader, Collection<String> fields, int maxDocFreq) throws IOException
      Creates a new QueryAutoStopWordAnalyzer with stopwords calculated for the given selection of fields from terms with a document frequency greater than the given maxDocFreq.
      Parameters:
      delegate - Analyzer whose TokenStream will be filtered
      indexReader - IndexReader to identify the stopwords from
      fields - Selection of fields to calculate stopwords for
      maxDocFreq - Document frequency a term must exceed to be treated as a stopword
      Throws:
      IOException - Can be thrown while reading from the IndexReader
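The field-restricted constructors compute stopwords only for the fields you name. The model below assumes the index statistics are available as a nested `Map` of field to term to document frequency, standing in for the `IndexReader`; `PerFieldStopWords` and `build` are hypothetical names. An indexed field that is not in the selection gets no entry, so its query tokens would pass through unfiltered.

```java
import java.util.Collection;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

// Hypothetical, Lucene-free model of the field-restricted constructors.
final class PerFieldStopWords {

    /** Builds a stopword set for each requested field only, using a strict "greater than" cutoff. */
    static Map<String, Set<String>> build(
            Map<String, Map<String, Integer>> docFreqByFieldAndTerm,
            Collection<String> fields,
            int maxDocFreq) {
        Map<String, Set<String>> perField = new HashMap<>();
        for (String field : fields) {
            Set<String> stopWords = new TreeSet<>();
            // A requested field with no indexed terms simply gets an empty set.
            Map<String, Integer> terms = docFreqByFieldAndTerm.getOrDefault(field, Map.of());
            terms.forEach((term, df) -> {
                if (df > maxDocFreq) {
                    stopWords.add(term);
                }
            });
            perField.put(field, stopWords);
        }
        return perField;
    }

    public static void main(String[] args) {
        Map<String, Map<String, Integer>> stats = Map.of(
                "body", Map.of("the", 900, "search", 450, "lucene", 100),
                "title", Map.of("the", 800, "guide", 50));
        Map<String, Set<String>> perField = build(stats, List.of("body"), 300);
        System.out.println(perField.get("body"));           // [search, the]
        System.out.println(perField.containsKey("title"));  // false: not in the selection
    }
}
```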
  • Method Details