QueryAutoStopWordAnalyzer (Lucene 6.4.0 API)

java.lang.Object
- org.apache.lucene.analysis.Analyzer
  - org.apache.lucene.analysis.AnalyzerWrapper
    - org.apache.lucene.analysis.query.QueryAutoStopWordAnalyzer

All Implemented Interfaces:

Closeable, AutoCloseable
```
public final class QueryAutoStopWordAnalyzer
extends AnalyzerWrapper
```
An Analyzer used primarily at query time to wrap another analyzer and provide a layer of protection which prevents very common words from being passed into queries.
For very large indexes, the cost of reading TermDocs for a very common word can be high. This analyzer was created after experience with a 38-million-document index in which one term appeared in around 50% of the documents, causing TermQueries for that term to take 2 seconds.
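
The idea of dropping very common words before they reach a query can be sketched in plain Java. This is only a model of the behavior described above: it does not call Lucene, and the class name `QueryTimeStopFilterSketch`, the method `filter`, and the sample stop word set are assumptions made for illustration.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Plain-Java model of query-time stop word filtering; it does not call Lucene.
class QueryTimeStopFilterSketch {

    // Returns the query tokens that are not stop words, in their original order.
    static List<String> filter(List<String> queryTokens, Set<String> stopWords) {
        List<String> kept = new ArrayList<>();
        for (String token : queryTokens) {
            if (!stopWords.contains(token)) {
                kept.add(token);
            }
        }
        return kept;
    }

    public static void main(String[] args) {
        // Hypothetical stop word set, as if it had been derived from index statistics.
        Set<String> stopWords = new HashSet<>(Arrays.asList("the"));
        System.out.println(filter(Arrays.asList("the", "quick", "fox"), stopWords));
    }
}
```

In the real class the stop word set comes from the IndexReader passed to the constructor, and the filtering is applied to the TokenStream of the wrapped analyzer.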

Nested Class Summary
- Nested classes/interfaces inherited from class org.apache.lucene.analysis.Analyzer
  Analyzer.ReuseStrategy, Analyzer.TokenStreamComponents

Field Summary

Fields
Modifier and Type	Field and Description
`static float`	`defaultMaxDocFreqPercent`
- Fields inherited from class org.apache.lucene.analysis.Analyzer
  GLOBAL_REUSE_STRATEGY, PER_FIELD_REUSE_STRATEGY

Constructor Summary

Constructors
Constructor and Description
`QueryAutoStopWordAnalyzer(Analyzer delegate, IndexReader indexReader)` Creates a new QueryAutoStopWordAnalyzer with stopwords calculated for all indexed fields from terms with a document frequency percentage greater than `defaultMaxDocFreqPercent`
`QueryAutoStopWordAnalyzer(Analyzer delegate, IndexReader indexReader, Collection<String> fields, float maxPercentDocs)` Creates a new QueryAutoStopWordAnalyzer with stopwords calculated for the given selection of fields from terms with a document frequency percentage greater than the given maxPercentDocs
`QueryAutoStopWordAnalyzer(Analyzer delegate, IndexReader indexReader, Collection<String> fields, int maxDocFreq)` Creates a new QueryAutoStopWordAnalyzer with stopwords calculated for the given selection of fields from terms with a document frequency greater than the given maxDocFreq
`QueryAutoStopWordAnalyzer(Analyzer delegate, IndexReader indexReader, float maxPercentDocs)` Creates a new QueryAutoStopWordAnalyzer with stopwords calculated for all indexed fields from terms with a document frequency percentage greater than the given maxPercentDocs
`QueryAutoStopWordAnalyzer(Analyzer delegate, IndexReader indexReader, int maxDocFreq)` Creates a new QueryAutoStopWordAnalyzer with stopwords calculated for all indexed fields from terms with a document frequency greater than the given maxDocFreq

Method Summary

Modifier and Type	Method and Description
`Term[]`	`getStopWords()` Provides information on which stop words have been identified for all fields
`String[]`	`getStopWords(String fieldName)` Provides information on which stop words have been identified for a field
`protected Analyzer`	`getWrappedAnalyzer(String fieldName)`
`protected Analyzer.TokenStreamComponents`	`wrapComponents(String fieldName, Analyzer.TokenStreamComponents components)`

Methods inherited from class org.apache.lucene.analysis.AnalyzerWrapper
attributeFactory, createComponents, getOffsetGap, getPositionIncrementGap, initReader, initReaderForNormalization, normalize, wrapReader, wrapReaderForNormalization, wrapTokenStreamForNormalization

Methods inherited from class org.apache.lucene.analysis.Analyzer
close, getReuseStrategy, getVersion, normalize, setVersion, tokenStream, tokenStream

Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait

- Field Detail
  - defaultMaxDocFreqPercent
```
public static final float defaultMaxDocFreqPercent
```
    See Also:
    
    Constant Field Values
- Constructor Detail
  - QueryAutoStopWordAnalyzer
```
public QueryAutoStopWordAnalyzer(Analyzer delegate,
                                 IndexReader indexReader)
                          throws IOException
```
    Creates a new QueryAutoStopWordAnalyzer with stopwords calculated for all indexed fields from terms with a document frequency percentage greater than defaultMaxDocFreqPercent
    
    Parameters:
    
    delegate - Analyzer whose TokenStream will be filtered
    
    indexReader - IndexReader to identify the stopwords from
    
    Throws:
    
    IOException - Can be thrown while reading from the IndexReader
  - QueryAutoStopWordAnalyzer
```
public QueryAutoStopWordAnalyzer(Analyzer delegate,
                                 IndexReader indexReader,
                                 int maxDocFreq)
                          throws IOException
```
    Creates a new QueryAutoStopWordAnalyzer with stopwords calculated for all indexed fields from terms with a document frequency greater than the given maxDocFreq
    
    Parameters:
    
    delegate - Analyzer whose TokenStream will be filtered
    
    indexReader - IndexReader to identify the stopwords from
    
    maxDocFreq - Document frequency that a term must exceed to be treated as a stopword
    
    Throws:
    
    IOException - Can be thrown while reading from the IndexReader
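
    The absolute threshold described above can be modeled without Lucene. The sketch below assumes the document frequencies are already available as a map (a hypothetical stand-in for what the IndexReader provides) and applies the strict "greater than" comparison from the description. The names `AbsoluteThresholdSketch` and `stopWords` are illustrative only.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.SortedSet;
import java.util.TreeSet;

// Plain-Java model of the maxDocFreq rule; it does not call Lucene.
class AbsoluteThresholdSketch {

    // A term becomes a stop word when its document frequency is strictly greater than maxDocFreq.
    static SortedSet<String> stopWords(Map<String, Integer> docFreqByTerm, int maxDocFreq) {
        SortedSet<String> result = new TreeSet<>();
        for (Map.Entry<String, Integer> entry : docFreqByTerm.entrySet()) {
            if (entry.getValue() > maxDocFreq) {
                result.add(entry.getKey());
            }
        }
        return result;
    }

    public static void main(String[] args) {
        // Hypothetical document frequencies for one field.
        Map<String, Integer> docFreqByTerm = new HashMap<>();
        docFreqByTerm.put("the", 90);
        docFreqByTerm.put("index", 40);
        docFreqByTerm.put("lucene", 5);
        // "index" sits exactly on the threshold, so it is not treated as a stop word here.
        System.out.println(stopWords(docFreqByTerm, 40));
    }
}
```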
  - QueryAutoStopWordAnalyzer
```
public QueryAutoStopWordAnalyzer(Analyzer delegate,
                                 IndexReader indexReader,
                                 float maxPercentDocs)
                          throws IOException
```
    Creates a new QueryAutoStopWordAnalyzer with stopwords calculated for all indexed fields from terms with a document frequency percentage greater than the given maxPercentDocs
    
    Parameters:
    
    delegate - Analyzer whose TokenStream will be filtered
    
    indexReader - IndexReader to identify the stopwords from
    
    maxPercentDocs - The maximum fraction (between 0.0 and 1.0) of index documents that may contain a term; a term found in more than this fraction of documents is considered a stop word
    
    Throws:
    
    IOException - Can be thrown while reading from the IndexReader
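
    As a worked example of the fractional threshold, take the index from the class description: a term present in about half of 38 million documents has a document frequency of roughly 19 million. The sketch below compares that ratio against a threshold. The value 0.4 is a hypothetical choice, and comparing the exact ratio is an assumption, since this page does not say how the fraction is converted or rounded internally.

```java
// Plain-Java model of the maxPercentDocs rule; it does not call Lucene.
class PercentThresholdSketch {

    // True when the term appears in a larger fraction of documents than maxPercentDocs.
    static boolean isStopWord(long docFreq, long numDocs, float maxPercentDocs) {
        if (numDocs <= 0) {
            return false; // an empty index has no common terms
        }
        double fraction = (double) docFreq / (double) numDocs;
        return fraction > maxPercentDocs;
    }

    public static void main(String[] args) {
        // 19 million of 38 million documents, checked against a hypothetical 0.4 threshold.
        System.out.println(isStopWord(19_000_000L, 38_000_000L, 0.4f));
        // A term in 10 of 100 documents, checked against the same threshold.
        System.out.println(isStopWord(10L, 100L, 0.4f));
    }
}
```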
  - QueryAutoStopWordAnalyzer
```
public QueryAutoStopWordAnalyzer(Analyzer delegate,
                                 IndexReader indexReader,
                                 Collection<String> fields,
                                 float maxPercentDocs)
                          throws IOException
```
    Creates a new QueryAutoStopWordAnalyzer with stopwords calculated for the given selection of fields from terms with a document frequency percentage greater than the given maxPercentDocs
    
    Parameters:
    
    delegate - Analyzer whose TokenStream will be filtered
    
    indexReader - IndexReader to identify the stopwords from
    
    fields - Selection of fields to calculate stopwords for
    
    maxPercentDocs - The maximum fraction (between 0.0 and 1.0) of index documents that may contain a term; a term found in more than this fraction of documents is considered a stop word
    
    Throws:
    
    IOException - Can be thrown while reading from the IndexReader
  - QueryAutoStopWordAnalyzer
```
public QueryAutoStopWordAnalyzer(Analyzer delegate,
                                 IndexReader indexReader,
                                 Collection<String> fields,
                                 int maxDocFreq)
                          throws IOException
```
    Creates a new QueryAutoStopWordAnalyzer with stopwords calculated for the given selection of fields from terms with a document frequency greater than the given maxDocFreq
    
    Parameters:
    
    delegate - Analyzer whose TokenStream will be filtered
    
    indexReader - IndexReader to identify the stopwords from
    
    fields - Selection of fields to calculate stopwords for
    
    maxDocFreq - Document frequency that a term must exceed to be treated as a stopword
    
    Throws:
    
    IOException - Can be thrown while reading from the IndexReader
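
    Restricting the calculation to a selection of fields means that only those fields receive a stop word set. A minimal plain-Java model of that selection follows. The nested map is a hypothetical stand-in for per-field index statistics, and `FieldSelectionSketch` and `stopWordsPerField` are illustrative names, not part of the Lucene API.

```java
import java.util.Arrays;
import java.util.Collection;
import java.util.HashMap;
import java.util.Map;
import java.util.SortedSet;
import java.util.TreeMap;
import java.util.TreeSet;

// Plain-Java model of per-field stop word calculation; it does not call Lucene.
class FieldSelectionSketch {

    // Computes stop words only for the selected fields; other fields get no entry.
    static Map<String, SortedSet<String>> stopWordsPerField(
            Map<String, Map<String, Integer>> docFreqByFieldAndTerm,
            Collection<String> fields,
            int maxDocFreq) {
        Map<String, SortedSet<String>> result = new TreeMap<>();
        for (String field : fields) {
            SortedSet<String> stopWords = new TreeSet<>();
            Map<String, Integer> terms = docFreqByFieldAndTerm.get(field);
            if (terms != null) {
                for (Map.Entry<String, Integer> entry : terms.entrySet()) {
                    if (entry.getValue() > maxDocFreq) {
                        stopWords.add(entry.getKey());
                    }
                }
            }
            result.put(field, stopWords);
        }
        return result;
    }

    public static void main(String[] args) {
        // Hypothetical statistics for two fields.
        Map<String, Integer> body = new HashMap<>();
        body.put("the", 90);
        body.put("lucene", 5);
        Map<String, Integer> title = new HashMap<>();
        title.put("the", 70);
        Map<String, Map<String, Integer>> stats = new HashMap<>();
        stats.put("body", body);
        stats.put("title", title);
        // Only "body" is selected, so "title" does not appear in the result.
        System.out.println(stopWordsPerField(stats, Arrays.asList("body"), 50));
    }
}
```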
- Method Detail
  - getWrappedAnalyzer
```
protected Analyzer getWrappedAnalyzer(String fieldName)
```
    Specified by:
    
    getWrappedAnalyzer in class AnalyzerWrapper
  - wrapComponents
```
protected Analyzer.TokenStreamComponents wrapComponents(String fieldName,
                                                        Analyzer.TokenStreamComponents components)
```
    Overrides:
    
    wrapComponents in class AnalyzerWrapper
  - getStopWords
```
public String[] getStopWords(String fieldName)
```
    Provides information on which stop words have been identified for a field
    
    Parameters:
    
    fieldName - The field for which the identified stop words will be returned
    
    Returns:
    
    the stop words identified for a field
  - getStopWords
```
public Term[] getStopWords()
```
    Provides information on which stop words have been identified for all fields
    
    Returns:
    
    the stop words (as terms)
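
    The two accessors above expose the same data in two shapes: one array of words for a single field, and one flat list of field and word pairs across all fields. The plain-Java sketch below models that relationship. It does not use Lucene's `Term` class; the `"field:text"` strings, the empty array for an unknown field, and the names `StopWordViewsSketch`, `forField` and `forAllFields` are all assumptions made for illustration.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.SortedSet;
import java.util.TreeMap;
import java.util.TreeSet;

// Plain-Java model of the per-field and all-fields views; it does not call Lucene.
class StopWordViewsSketch {

    // Per-field view: the stop words of one field, or an empty array if the field is unknown.
    static String[] forField(Map<String, SortedSet<String>> perField, String fieldName) {
        SortedSet<String> words = perField.get(fieldName);
        return words == null ? new String[0] : words.toArray(new String[0]);
    }

    // All-fields view: every (field, word) pair, rendered here as "field:word".
    static List<String> forAllFields(Map<String, SortedSet<String>> perField) {
        Map<String, SortedSet<String>> sorted = new TreeMap<>(perField);
        List<String> all = new ArrayList<>();
        for (Map.Entry<String, SortedSet<String>> entry : sorted.entrySet()) {
            for (String word : entry.getValue()) {
                all.add(entry.getKey() + ":" + word);
            }
        }
        return all;
    }

    public static void main(String[] args) {
        // Hypothetical stop words already identified for two fields.
        Map<String, SortedSet<String>> perField = new TreeMap<>();
        perField.put("body", new TreeSet<>(Arrays.asList("the", "a")));
        perField.put("title", new TreeSet<>(Arrays.asList("the")));
        System.out.println(Arrays.toString(forField(perField, "body")));
        System.out.println(forAllFields(perField));
    }
}
```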
