QueryAutoStopWordAnalyzer (Lucene 3.6.0 API)

java.lang.Object
- org.apache.lucene.analysis.Analyzer
- - org.apache.lucene.analysis.query.QueryAutoStopWordAnalyzer

All Implemented Interfaces:

Closeable
```
public final class QueryAutoStopWordAnalyzer
extends Analyzer
```
An Analyzer used primarily at query time to wrap another analyzer and provide a layer of protection which prevents very common words from being passed into queries.
For very large indexes the cost of reading TermDocs for a very common word can be high. This analyzer was created after experience with a 38 million doc index which had a term in around 50% of docs and was causing TermQueries for this term to take 2 seconds.

Use the various "addStopWords" methods in this class to automate the identification and addition of stop words found in an already existing index.
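The wrapping idea described above can be illustrated with a small stdlib-only model. This is a sketch and not Lucene code: `StopWordWrapSketch`, `delegateTokens`, and `analyze` are hypothetical names, the whitespace tokenizer merely stands in for a delegate analyzer, and passing fields without stop words through unchanged is an assumption of the model.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.Set;

// Hypothetical stand-in for the wrapping idea; does not use Lucene.
final class StopWordWrapSketch {
    // Per-field stop words, as if they had been derived from an existing index.
    private final Map<String, Set<String>> stopWordsPerField = new HashMap<>();

    void setStopWords(String field, String... words) {
        stopWordsPerField.put(field, new HashSet<>(Arrays.asList(words)));
    }

    // Stand-in for the delegate analyzer: lowercase, then split on whitespace.
    static List<String> delegateTokens(String text) {
        List<String> tokens = new ArrayList<>();
        for (String t : text.toLowerCase(Locale.ROOT).trim().split("\\s+")) {
            if (!t.isEmpty()) {
                tokens.add(t);
            }
        }
        return tokens;
    }

    // Drops tokens that are stop words for the given field.
    // Assumption: a field with no recorded stop words is passed through as-is.
    List<String> analyze(String field, String text) {
        List<String> tokens = delegateTokens(text);
        Set<String> stops = stopWordsPerField.get(field);
        if (stops == null) {
            return tokens;
        }
        List<String> kept = new ArrayList<>();
        for (String t : tokens) {
            if (!stops.contains(t)) {
                kept.add(t);
            }
        }
        return kept;
    }
}
```

In this model, a query on a field whose stop set contains "the" never carries that term, which is the protection the class description refers to.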

Field Summary

Fields
Modifier and Type	Field and Description
`static float`	`defaultMaxDocFreqPercent`

Constructor Summary

Constructors
Constructor and Description
`QueryAutoStopWordAnalyzer(Version matchVersion, Analyzer delegate)` Deprecated. Stopwords should be calculated at instantiation using one of the other constructors
`QueryAutoStopWordAnalyzer(Version matchVersion, Analyzer delegate, IndexReader indexReader)` Creates a new QueryAutoStopWordAnalyzer with stopwords calculated for all indexed fields from terms with a document frequency percentage greater than `defaultMaxDocFreqPercent`
`QueryAutoStopWordAnalyzer(Version matchVersion, Analyzer delegate, IndexReader indexReader, Collection<String> fields, float maxPercentDocs)` Creates a new QueryAutoStopWordAnalyzer with stopwords calculated for the given selection of fields from terms with a document frequency percentage greater than the given maxPercentDocs
`QueryAutoStopWordAnalyzer(Version matchVersion, Analyzer delegate, IndexReader indexReader, Collection<String> fields, int maxDocFreq)` Creates a new QueryAutoStopWordAnalyzer with stopwords calculated for the given selection of fields from terms with a document frequency greater than the given maxDocFreq
`QueryAutoStopWordAnalyzer(Version matchVersion, Analyzer delegate, IndexReader indexReader, float maxPercentDocs)` Creates a new QueryAutoStopWordAnalyzer with stopwords calculated for all indexed fields from terms with a document frequency percentage greater than the given maxPercentDocs
`QueryAutoStopWordAnalyzer(Version matchVersion, Analyzer delegate, IndexReader indexReader, int maxDocFreq)` Creates a new QueryAutoStopWordAnalyzer with stopwords calculated for all indexed fields from terms with a document frequency greater than the given maxDocFreq

Method Summary

Methods
Modifier and Type	Method and Description
`int`	`addStopWords(IndexReader reader)` Deprecated. Stopwords should be calculated at instantiation using `QueryAutoStopWordAnalyzer(Version, Analyzer, IndexReader)`
`int`	`addStopWords(IndexReader reader, float maxPercentDocs)` Deprecated. Stopwords should be calculated at instantiation using `QueryAutoStopWordAnalyzer(Version, Analyzer, IndexReader, float)`
`int`	`addStopWords(IndexReader reader, int maxDocFreq)` Deprecated. Stopwords should be calculated at instantiation using `QueryAutoStopWordAnalyzer(Version, Analyzer, IndexReader, int)`
`int`	`addStopWords(IndexReader reader, String fieldName, float maxPercentDocs)` Deprecated. Stopwords should be calculated at instantiation using `QueryAutoStopWordAnalyzer(Version, Analyzer, IndexReader, Collection, float)`
`int`	`addStopWords(IndexReader reader, String fieldName, int maxDocFreq)` Deprecated. Stopwords should be calculated at instantiation using `QueryAutoStopWordAnalyzer(Version, Analyzer, IndexReader, Collection, int)`
`Term[]`	`getStopWords()` Provides information on which stop words have been identified for all fields
`String[]`	`getStopWords(String fieldName)` Provides information on which stop words have been identified for a field
`TokenStream`	`reusableTokenStream(String fieldName, Reader reader)` Creates a TokenStream that is allowed to be re-used from the previous time that the same thread called this method.
`TokenStream`	`tokenStream(String fieldName, Reader reader)` Creates a TokenStream which tokenizes all the text in the provided Reader.

Methods inherited from class org.apache.lucene.analysis.Analyzer
close, getOffsetGap, getPositionIncrementGap, getPreviousTokenStream, setPreviousTokenStream

Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait

- Field Detail
  - defaultMaxDocFreqPercent
```
public static final float defaultMaxDocFreqPercent
```
    See Also:
    Constant Field Values
- Constructor Detail
  - QueryAutoStopWordAnalyzer
```
@Deprecated
public QueryAutoStopWordAnalyzer(Version matchVersion,
                                    Analyzer delegate)
```
    Deprecated. Stopwords should be calculated at instantiation using one of the other constructors
    
    Initializes this analyzer with the Analyzer object that actually produces the tokens
    
    Parameters:
    delegate - The choice of Analyzer that is used to produce the token stream which needs filtering
  - QueryAutoStopWordAnalyzer
```
public QueryAutoStopWordAnalyzer(Version matchVersion,
                         Analyzer delegate,
                         IndexReader indexReader)
                          throws IOException
```
    Creates a new QueryAutoStopWordAnalyzer with stopwords calculated for all indexed fields from terms with a document frequency percentage greater than defaultMaxDocFreqPercent
    
    Parameters:
    matchVersion - Version to be used in StopFilter
    delegate - Analyzer whose TokenStream will be filtered
    indexReader - IndexReader to identify the stopwords from
    
    Throws:
    
    IOException - Can be thrown while reading from the IndexReader
  - QueryAutoStopWordAnalyzer
```
public QueryAutoStopWordAnalyzer(Version matchVersion,
                         Analyzer delegate,
                         IndexReader indexReader,
                         int maxDocFreq)
                          throws IOException
```
    Creates a new QueryAutoStopWordAnalyzer with stopwords calculated for all indexed fields from terms with a document frequency greater than the given maxDocFreq
    
    Parameters:
    matchVersion - Version to be used in StopFilter
    delegate - Analyzer whose TokenStream will be filtered
    indexReader - IndexReader to identify the stopwords from
    maxDocFreq - Document frequency that a term must exceed in order to be treated as a stopword
    
    Throws:
    
    IOException - Can be thrown while reading from the IndexReader
  - QueryAutoStopWordAnalyzer
```
public QueryAutoStopWordAnalyzer(Version matchVersion,
                         Analyzer delegate,
                         IndexReader indexReader,
                         float maxPercentDocs)
                          throws IOException
```
    Creates a new QueryAutoStopWordAnalyzer with stopwords calculated for all indexed fields from terms with a document frequency percentage greater than the given maxPercentDocs
    
    Parameters:
    matchVersion - Version to be used in StopFilter
    delegate - Analyzer whose TokenStream will be filtered
    indexReader - IndexReader to identify the stopwords from
    maxPercentDocs - The maximum percentage (between 0.0 and 1.0) of index documents which contain a term, after which the word is considered to be a stop word
    
    Throws:
    
    IOException - Can be thrown while reading from the IndexReader
  - QueryAutoStopWordAnalyzer
```
public QueryAutoStopWordAnalyzer(Version matchVersion,
                         Analyzer delegate,
                         IndexReader indexReader,
                         Collection<String> fields,
                         float maxPercentDocs)
                          throws IOException
```
    Creates a new QueryAutoStopWordAnalyzer with stopwords calculated for the given selection of fields from terms with a document frequency percentage greater than the given maxPercentDocs
    
    Parameters:
    matchVersion - Version to be used in StopFilter
    delegate - Analyzer whose TokenStream will be filtered
    indexReader - IndexReader to identify the stopwords from
    fields - Selection of fields to calculate stopwords for
    maxPercentDocs - The maximum percentage (between 0.0 and 1.0) of index documents which contain a term, after which the word is considered to be a stop word
    
    Throws:
    
    IOException - Can be thrown while reading from the IndexReader
  - QueryAutoStopWordAnalyzer
```
public QueryAutoStopWordAnalyzer(Version matchVersion,
                         Analyzer delegate,
                         IndexReader indexReader,
                         Collection<String> fields,
                         int maxDocFreq)
                          throws IOException
```
    Creates a new QueryAutoStopWordAnalyzer with stopwords calculated for the given selection of fields from terms with a document frequency greater than the given maxDocFreq
    
    Parameters:
    matchVersion - Version to be used in StopFilter
    delegate - Analyzer whose TokenStream will be filtered
    indexReader - IndexReader to identify the stopwords from
    fields - Selection of fields to calculate stopwords for
    maxDocFreq - Document frequency that a term must exceed in order to be treated as a stopword
    
    Throws:
    
    IOException - Can be thrown while reading from the IndexReader
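The two kinds of threshold accepted by the constructors above (an absolute `maxDocFreq` and a fractional `maxPercentDocs`) can be related with a short stdlib-only sketch. It models the documented rule that a term becomes a stopword when its document frequency is greater than the threshold. The conversion from a fraction to an absolute count (multiply by the document count and truncate) is an assumption made for illustration, and `DocFreqThresholdSketch`, `toMaxDocFreq`, and `stopWords` are hypothetical names.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

// Hypothetical model of the thresholding rule; does not read a Lucene index.
final class DocFreqThresholdSketch {
    // Assumed conversion: fraction of documents (0.0 to 1.0) to an absolute count.
    static int toMaxDocFreq(int numDocs, float maxPercentDocs) {
        return (int) (numDocs * maxPercentDocs);
    }

    // A term qualifies only when its document frequency is strictly greater
    // than maxDocFreq, matching the "greater than" wording of the constructors.
    static Set<String> stopWords(Map<String, Integer> docFreqByTerm, int maxDocFreq) {
        Set<String> result = new TreeSet<>();
        for (Map.Entry<String, Integer> e : docFreqByTerm.entrySet()) {
            if (e.getValue() > maxDocFreq) {
                result.add(e.getKey());
            }
        }
        return result;
    }

    public static void main(String[] args) {
        // Made-up document frequencies for a 1000-document index.
        Map<String, Integer> df = new LinkedHashMap<>();
        df.put("the", 950);
        df.put("lucene", 500);
        df.put("analyzer", 120);

        int maxDocFreq = toMaxDocFreq(1000, 0.5f);
        // "lucene" sits exactly on the threshold, so it is not a stopword.
        System.out.println(maxDocFreq + " " + stopWords(df, maxDocFreq)); // prints "500 [the]"
    }
}
```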
- Method Detail
  - addStopWords
```
@Deprecated
public int addStopWords(IndexReader reader)
                 throws IOException
```
    Deprecated. Stopwords should be calculated at instantiation using QueryAutoStopWordAnalyzer(Version, Analyzer, IndexReader)
    
    Automatically adds stop words for all fields with terms exceeding the defaultMaxDocFreqPercent
    
    Parameters:
    reader - The IndexReader which will be consulted to identify potential stop words that exceed the required document frequency
    
    Returns:
    The number of stop words identified.
    
    Throws:
    
    IOException
  - addStopWords
```
@Deprecated
public int addStopWords(IndexReader reader,
                          int maxDocFreq)
                 throws IOException
```
    Deprecated. Stopwords should be calculated at instantiation using QueryAutoStopWordAnalyzer(Version, Analyzer, IndexReader, int)
    
    Automatically adds stop words for all fields with terms exceeding the maxDocFreq
    
    Parameters:
    reader - The IndexReader which will be consulted to identify potential stop words that exceed the required document frequency
    maxDocFreq - The maximum number of index documents which can contain a term, after which the term is considered to be a stop word
    
    Returns:
    The number of stop words identified.
    
    Throws:
    
    IOException
  - addStopWords
```
@Deprecated
public int addStopWords(IndexReader reader,
                          float maxPercentDocs)
                 throws IOException
```
    Deprecated. Stopwords should be calculated at instantiation using QueryAutoStopWordAnalyzer(Version, Analyzer, IndexReader, float)
    
    Automatically adds stop words for all fields with terms exceeding the maxPercentDocs
    
    Parameters:
    reader - The IndexReader which will be consulted to identify potential stop words that exceed the required document frequency
    maxPercentDocs - The maximum percentage (between 0.0 and 1.0) of index documents which contain a term, after which the word is considered to be a stop word.
    
    Returns:
    The number of stop words identified.
    
    Throws:
    
    IOException
  - addStopWords
```
@Deprecated
public int addStopWords(IndexReader reader,
                          String fieldName,
                          float maxPercentDocs)
                 throws IOException
```
    Deprecated. Stopwords should be calculated at instantiation using QueryAutoStopWordAnalyzer(Version, Analyzer, IndexReader, Collection, float)
    
    Automatically adds stop words for the given field with terms exceeding the maxPercentDocs
    
    Parameters:
    reader - The IndexReader which will be consulted to identify potential stop words that exceed the required document frequency
    fieldName - The field for which stopwords will be added
    maxPercentDocs - The maximum percentage (between 0.0 and 1.0) of index documents which contain a term, after which the word is considered to be a stop word.
    
    Returns:
    The number of stop words identified.
    
    Throws:
    
    IOException
  - addStopWords
```
@Deprecated
public int addStopWords(IndexReader reader,
                          String fieldName,
                          int maxDocFreq)
                 throws IOException
```
    Deprecated. Stopwords should be calculated at instantiation using QueryAutoStopWordAnalyzer(Version, Analyzer, IndexReader, Collection, int)
    
    Automatically adds stop words for the given field with terms exceeding the maxDocFreq
    
    Parameters:
    reader - The IndexReader which will be consulted to identify potential stop words that exceed the required document frequency
    fieldName - The field for which stopwords will be added
    maxDocFreq - The maximum number of index documents which can contain a term, after which the term is considered to be a stop word.
    
    Returns:
    The number of stop words identified.
    
    Throws:
    
    IOException
  - tokenStream
```
public TokenStream tokenStream(String fieldName,
                      Reader reader)
```
    Description copied from class: Analyzer
    
    Creates a TokenStream which tokenizes all the text in the provided Reader. Must be able to handle null field name for backward compatibility.
    
    Specified by:
    
    tokenStream in class Analyzer
  - reusableTokenStream
```
public TokenStream reusableTokenStream(String fieldName,
                              Reader reader)
                                throws IOException
```
    Description copied from class: Analyzer
    
    Creates a TokenStream that is allowed to be re-used from the previous time that the same thread called this method. Callers that do not need to use more than one TokenStream at the same time from this analyzer should use this method for better performance.
    
    Overrides:
    
    reusableTokenStream in class Analyzer
    
    Throws:
    
    IOException
  - getStopWords
```
public String[] getStopWords(String fieldName)
```
    Provides information on which stop words have been identified for a field
    
    Parameters:
    fieldName - The field for which stop words identified in "addStopWords" method calls will be returned
    
    Returns:
    the stop words identified for a field
  - getStopWords
```
public Term[] getStopWords()
```
    Provides information on which stop words have been identified for all fields
    
    Returns:
    the stop words (as terms)
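The two `getStopWords` variants expose the same data in different shapes: one field's words as strings, or every field's words as terms, where a term pairs a field name with text. The stdlib-only sketch below models that relationship. `StopWordViewsSketch` and its methods are hypothetical names, the `field:text` string is only a stand-in for a `Term`, and returning an empty array for an unknown field is an assumption.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

// Hypothetical model of the two stop word views; does not use Lucene types.
final class StopWordViewsSketch {
    private final Map<String, Set<String>> stopWordsPerField = new LinkedHashMap<>();

    void add(String field, String word) {
        stopWordsPerField.computeIfAbsent(field, f -> new TreeSet<>()).add(word);
    }

    // Per-field view, analogous in shape to getStopWords(String fieldName).
    // Assumption: an unknown field yields an empty array rather than null.
    String[] forField(String field) {
        Set<String> words = stopWordsPerField.get(field);
        return words == null ? new String[0] : words.toArray(new String[0]);
    }

    // All-fields view, analogous in shape to getStopWords(): each entry
    // carries both the field name and the word, written here as "field:text".
    List<String> allAsFieldTextPairs() {
        List<String> all = new ArrayList<>();
        for (Map.Entry<String, Set<String>> e : stopWordsPerField.entrySet()) {
            for (String word : e.getValue()) {
                all.add(e.getKey() + ":" + word);
            }
        }
        return all;
    }
}
```

The all-fields view keeps the field name with each word because the same word may be a stopword in one field and an ordinary term in another.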
