QueryAutoStopWordAnalyzer (Lucene 4.0.0 API)

java.lang.Object
- org.apache.lucene.analysis.Analyzer
  - org.apache.lucene.analysis.AnalyzerWrapper
    - org.apache.lucene.analysis.query.QueryAutoStopWordAnalyzer

All Implemented Interfaces:

Closeable
```
public final class QueryAutoStopWordAnalyzer
extends AnalyzerWrapper
```
An Analyzer used primarily at query time to wrap another analyzer and provide a layer of protection that prevents very common words from being passed into queries.
For very large indexes, the cost of reading TermDocs for a very common word can be high. This analyzer was created after experience with a 38-million-document index in which one term appeared in around 50% of documents, causing TermQueries for that term to take 2 seconds.
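
The selection rule described above can be illustrated without Lucene. The following is a minimal plain-Java sketch, not the library's implementation: the class `StopWordSelectionSketch`, the method `selectStopWords`, and the sample frequencies are all hypothetical. It assumes only what the constructor descriptions state, namely that a term becomes a stop word when its document frequency is greater than the threshold.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

// Illustrative only: a plain-Java model of document-frequency-based stop word
// selection. None of these names belong to the Lucene API.
class StopWordSelectionSketch {

    // Returns every term whose document frequency is strictly greater than maxDocFreq.
    static Set<String> selectStopWords(Map<String, Integer> docFreqByTerm, int maxDocFreq) {
        Set<String> stopWords = new TreeSet<>();
        for (Map.Entry<String, Integer> entry : docFreqByTerm.entrySet()) {
            if (entry.getValue() > maxDocFreq) {
                stopWords.add(entry.getKey());
            }
        }
        return stopWords;
    }

    public static void main(String[] args) {
        // Hypothetical document frequencies for one field of a 1000-document index.
        Map<String, Integer> docFreqByTerm = new LinkedHashMap<>();
        docFreqByTerm.put("the", 950);
        docFreqByTerm.put("lucene", 120);
        docFreqByTerm.put("analyzer", 40);

        // Only "the" appears in more than 400 documents.
        System.out.println(selectStopWords(docFreqByTerm, 400)); // prints [the]
    }
}
```

In the real class, the frequencies come from the `IndexReader` passed to the constructor, and the selected terms are filtered out of the delegate analyzer's token stream.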

Nested Class Summary
- Nested classes/interfaces inherited from class org.apache.lucene.analysis.Analyzer
  Analyzer.GlobalReuseStrategy, Analyzer.PerFieldReuseStrategy, Analyzer.ReuseStrategy, Analyzer.TokenStreamComponents

Field Summary

Fields
Modifier and Type	Field and Description
`static float`	`defaultMaxDocFreqPercent`

Constructor Summary

Constructors
Constructor and Description
`QueryAutoStopWordAnalyzer(Version matchVersion, Analyzer delegate, IndexReader indexReader)` Creates a new QueryAutoStopWordAnalyzer with stopwords calculated for all indexed fields from terms with a document frequency percentage greater than `defaultMaxDocFreqPercent`
`QueryAutoStopWordAnalyzer(Version matchVersion, Analyzer delegate, IndexReader indexReader, Collection<String> fields, float maxPercentDocs)` Creates a new QueryAutoStopWordAnalyzer with stopwords calculated for the given selection of fields from terms with a document frequency percentage greater than the given maxPercentDocs
`QueryAutoStopWordAnalyzer(Version matchVersion, Analyzer delegate, IndexReader indexReader, Collection<String> fields, int maxDocFreq)` Creates a new QueryAutoStopWordAnalyzer with stopwords calculated for the given selection of fields from terms with a document frequency greater than the given maxDocFreq
`QueryAutoStopWordAnalyzer(Version matchVersion, Analyzer delegate, IndexReader indexReader, float maxPercentDocs)` Creates a new QueryAutoStopWordAnalyzer with stopwords calculated for all indexed fields from terms with a document frequency percentage greater than the given maxPercentDocs
`QueryAutoStopWordAnalyzer(Version matchVersion, Analyzer delegate, IndexReader indexReader, int maxDocFreq)` Creates a new QueryAutoStopWordAnalyzer with stopwords calculated for all indexed fields from terms with a document frequency greater than the given maxDocFreq

Method Summary

Methods
Modifier and Type	Method and Description
`Term[]`	`getStopWords()` Provides information on which stop words have been identified for all fields
`String[]`	`getStopWords(String fieldName)` Provides information on which stop words have been identified for a field
`protected Analyzer`	`getWrappedAnalyzer(String fieldName)`
`protected Analyzer.TokenStreamComponents`	`wrapComponents(String fieldName, Analyzer.TokenStreamComponents components)`

Methods inherited from class org.apache.lucene.analysis.AnalyzerWrapper
createComponents, getOffsetGap, getPositionIncrementGap, initReader

Methods inherited from class org.apache.lucene.analysis.Analyzer
close, tokenStream

Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait

- Field Detail
  - defaultMaxDocFreqPercent
```
public static final float defaultMaxDocFreqPercent
```
    See Also:
    Constant Field Values
- Constructor Detail
  - QueryAutoStopWordAnalyzer
```
public QueryAutoStopWordAnalyzer(Version matchVersion,
                         Analyzer delegate,
                         IndexReader indexReader)
                          throws IOException
```
    Creates a new QueryAutoStopWordAnalyzer with stopwords calculated for all indexed fields from terms with a document frequency percentage greater than defaultMaxDocFreqPercent
    
    Parameters:
    matchVersion - Version to be used in StopFilter
    delegate - Analyzer whose TokenStream will be filtered
    indexReader - IndexReader to identify the stopwords from
    
    Throws:
    
    IOException - Can be thrown while reading from the IndexReader
  - QueryAutoStopWordAnalyzer
```
public QueryAutoStopWordAnalyzer(Version matchVersion,
                         Analyzer delegate,
                         IndexReader indexReader,
                         int maxDocFreq)
                          throws IOException
```
    Creates a new QueryAutoStopWordAnalyzer with stopwords calculated for all indexed fields from terms with a document frequency greater than the given maxDocFreq
    
    Parameters:
    matchVersion - Version to be used in StopFilter
    delegate - Analyzer whose TokenStream will be filtered
    indexReader - IndexReader to identify the stopwords from
    maxDocFreq - Document frequency threshold; terms whose document frequency is greater than this value are treated as stopwords
    
    Throws:
    
    IOException - Can be thrown while reading from the IndexReader
  - QueryAutoStopWordAnalyzer
```
public QueryAutoStopWordAnalyzer(Version matchVersion,
                         Analyzer delegate,
                         IndexReader indexReader,
                         float maxPercentDocs)
                          throws IOException
```
    Creates a new QueryAutoStopWordAnalyzer with stopwords calculated for all indexed fields from terms with a document frequency percentage greater than the given maxPercentDocs
    
    Parameters:
    matchVersion - Version to be used in StopFilter
    delegate - Analyzer whose TokenStream will be filtered
    indexReader - IndexReader to identify the stopwords from
    maxPercentDocs - The maximum percentage (between 0.0 and 1.0) of index documents which contain a term, after which the word is considered to be a stop word
    
    Throws:
    
    IOException - Can be thrown while reading from the IndexReader
  - QueryAutoStopWordAnalyzer
```
public QueryAutoStopWordAnalyzer(Version matchVersion,
                         Analyzer delegate,
                         IndexReader indexReader,
                         Collection<String> fields,
                         float maxPercentDocs)
                          throws IOException
```
    Creates a new QueryAutoStopWordAnalyzer with stopwords calculated for the given selection of fields from terms with a document frequency percentage greater than the given maxPercentDocs
    
    Parameters:
    matchVersion - Version to be used in StopFilter
    delegate - Analyzer whose TokenStream will be filtered
    indexReader - IndexReader to identify the stopwords from
    fields - Selection of fields to calculate stopwords for
    maxPercentDocs - The maximum percentage (between 0.0 and 1.0) of index documents which contain a term, after which the word is considered to be a stop word
    
    Throws:
    
    IOException - Can be thrown while reading from the IndexReader
  - QueryAutoStopWordAnalyzer
```
public QueryAutoStopWordAnalyzer(Version matchVersion,
                         Analyzer delegate,
                         IndexReader indexReader,
                         Collection<String> fields,
                         int maxDocFreq)
                          throws IOException
```
    Creates a new QueryAutoStopWordAnalyzer with stopwords calculated for the given selection of fields from terms with a document frequency greater than the given maxDocFreq
    
    Parameters:
    matchVersion - Version to be used in StopFilter
    delegate - Analyzer whose TokenStream will be filtered
    indexReader - IndexReader to identify the stopwords from
    fields - Selection of fields to calculate stopwords for
    maxDocFreq - Document frequency threshold; terms whose document frequency is greater than this value are treated as stopwords
    
    Throws:
    
    IOException - Can be thrown while reading from the IndexReader
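
The constructors accept the threshold either as an absolute count (`maxDocFreq`) or as a fraction of the index (`maxPercentDocs`). The sketch below shows one plausible way to relate the two. It is an assumption, not documented behavior: the fraction is multiplied by the number of documents and truncated to an `int`. The class `DocFreqThresholdSketch` and the method `toMaxDocFreq` are hypothetical names.

```java
// Illustrative only: relates a fractional threshold to an absolute document count.
// The truncating conversion is an assumption, not taken from the Lucene documentation.
class DocFreqThresholdSketch {

    // Converts a fraction of the index (0.0 to 1.0) into an absolute document frequency.
    static int toMaxDocFreq(int numDocs, float maxPercentDocs) {
        if (maxPercentDocs < 0.0f || maxPercentDocs > 1.0f) {
            throw new IllegalArgumentException("maxPercentDocs must be between 0.0 and 1.0");
        }
        return (int) (numDocs * maxPercentDocs);
    }

    public static void main(String[] args) {
        // For a 38 million document index and a 50% threshold:
        System.out.println(toMaxDocFreq(38_000_000, 0.5f)); // prints 19000000
    }
}
```

Under this assumption, a term in the 38-million-document index from the class description would need to appear in more than 19,000,000 documents to be treated as a stop word at a 0.5 threshold.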
- Method Detail
  - getWrappedAnalyzer
```
protected Analyzer getWrappedAnalyzer(String fieldName)
```
    Specified by:
    
    getWrappedAnalyzer in class AnalyzerWrapper
  - wrapComponents
```
protected Analyzer.TokenStreamComponents wrapComponents(String fieldName,
                                            Analyzer.TokenStreamComponents components)
```
    Specified by:
    
    wrapComponents in class AnalyzerWrapper
  - getStopWords
```
public String[] getStopWords(String fieldName)
```
    Provides information on which stop words have been identified for a field
    
    Parameters:
    fieldName - The field for which the identified stop words will be returned
    
    Returns:
    the stop words identified for a field
  - getStopWords
```
public Term[] getStopWords()
```
    Provides information on which stop words have been identified for all fields
    
    Returns:
    the stop words (as terms)
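
The two `getStopWords` overloads expose the same data at different granularity: one field's words as strings, or every field's words as field/text pairs. The plain-Java sketch below models that relationship. It is not Lucene code: `PerFieldStopWordsSketch` and `put` are hypothetical, `"field:text"` strings stand in for `Term` objects, and returning an empty array for an unknown field is an assumption.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

// Illustrative only: models per-field stop word storage and the two lookup styles.
class PerFieldStopWordsSketch {

    private final Map<String, Set<String>> stopWordsPerField = new LinkedHashMap<>();

    // Hypothetical helper that registers stop words for a field.
    void put(String field, String... words) {
        Set<String> set = stopWordsPerField.computeIfAbsent(field, k -> new TreeSet<>());
        for (String word : words) {
            set.add(word);
        }
    }

    // Per-field view: the stop words of a single field, in sorted order.
    // Assumption: an unknown field yields an empty array rather than null.
    String[] getStopWords(String fieldName) {
        Set<String> words = stopWordsPerField.get(fieldName);
        return words == null ? new String[0] : words.toArray(new String[0]);
    }

    // All-fields view: "field:text" strings stand in for Term objects.
    List<String> getStopWords() {
        List<String> all = new ArrayList<>();
        for (Map.Entry<String, Set<String>> entry : stopWordsPerField.entrySet()) {
            for (String word : entry.getValue()) {
                all.add(entry.getKey() + ":" + word);
            }
        }
        return all;
    }

    public static void main(String[] args) {
        PerFieldStopWordsSketch sketch = new PerFieldStopWordsSketch();
        sketch.put("body", "the", "a");
        sketch.put("title", "the");
        System.out.println(String.join(",", sketch.getStopWords("body"))); // prints a,the
        System.out.println(sketch.getStopWords()); // prints [body:a, body:the, title:the]
    }
}
```

The same word can be a stop word in one field and not in another, which is why the all-fields view pairs each word with its field.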
