Package org.apache.lucene.analysis.query
Class QueryAutoStopWordAnalyzer
- java.lang.Object
  - org.apache.lucene.analysis.Analyzer
    - org.apache.lucene.analysis.AnalyzerWrapper
      - org.apache.lucene.analysis.query.QueryAutoStopWordAnalyzer
- All Implemented Interfaces:
Closeable, AutoCloseable
public final class QueryAutoStopWordAnalyzer extends AnalyzerWrapper
An Analyzer used primarily at query time to wrap another analyzer and provide a layer of protection which prevents very common words from being passed into queries.

For very large indexes the cost of reading TermDocs for a very common word can be high. This analyzer was created after experience with a 38 million document index in which one term appeared in around 50% of documents, causing TermQueries for that term to take 2 seconds.
- Since:
- 3.1
-
-
Nested Class Summary
-
Nested classes/interfaces inherited from class org.apache.lucene.analysis.Analyzer
Analyzer.ReuseStrategy, Analyzer.TokenStreamComponents
-
-
Field Summary
Modifier and Type    Field
static float         defaultMaxDocFreqPercent
-
Fields inherited from class org.apache.lucene.analysis.Analyzer
GLOBAL_REUSE_STRATEGY, PER_FIELD_REUSE_STRATEGY
-
-
Constructor Summary
Constructor and Description

QueryAutoStopWordAnalyzer(Analyzer delegate, IndexReader indexReader)
Creates a new QueryAutoStopWordAnalyzer with stopwords calculated for all indexed fields from terms with a document frequency percentage greater than defaultMaxDocFreqPercent.

QueryAutoStopWordAnalyzer(Analyzer delegate, IndexReader indexReader, float maxPercentDocs)
Creates a new QueryAutoStopWordAnalyzer with stopwords calculated for all indexed fields from terms with a document frequency percentage greater than the given maxPercentDocs.

QueryAutoStopWordAnalyzer(Analyzer delegate, IndexReader indexReader, int maxDocFreq)
Creates a new QueryAutoStopWordAnalyzer with stopwords calculated for all indexed fields from terms with a document frequency greater than the given maxDocFreq.

QueryAutoStopWordAnalyzer(Analyzer delegate, IndexReader indexReader, Collection<String> fields, float maxPercentDocs)
Creates a new QueryAutoStopWordAnalyzer with stopwords calculated for the given selection of fields from terms with a document frequency percentage greater than the given maxPercentDocs.

QueryAutoStopWordAnalyzer(Analyzer delegate, IndexReader indexReader, Collection<String> fields, int maxDocFreq)
Creates a new QueryAutoStopWordAnalyzer with stopwords calculated for the given selection of fields from terms with a document frequency greater than the given maxDocFreq.
-
Method Summary
Modifier and Type, Method, and Description

Term[] getStopWords()
Provides information on which stop words have been identified for all fields.

String[] getStopWords(String fieldName)
Provides information on which stop words have been identified for a field.

protected Analyzer getWrappedAnalyzer(String fieldName)

protected Analyzer.TokenStreamComponents wrapComponents(String fieldName, Analyzer.TokenStreamComponents components)
-
Methods inherited from class org.apache.lucene.analysis.AnalyzerWrapper
attributeFactory, createComponents, getOffsetGap, getPositionIncrementGap, initReader, initReaderForNormalization, normalize, wrapReader, wrapReaderForNormalization, wrapTokenStreamForNormalization
-
Methods inherited from class org.apache.lucene.analysis.Analyzer
close, getReuseStrategy, normalize, tokenStream, tokenStream
-
-
-
-
Field Detail
-
defaultMaxDocFreqPercent
public static final float defaultMaxDocFreqPercent
- See Also:
- Constant Field Values
-
-
Constructor Detail
-
QueryAutoStopWordAnalyzer
public QueryAutoStopWordAnalyzer(Analyzer delegate, IndexReader indexReader) throws IOException
Creates a new QueryAutoStopWordAnalyzer with stopwords calculated for all indexed fields from terms with a document frequency percentage greater than defaultMaxDocFreqPercent.
- Parameters:
delegate - Analyzer whose TokenStream will be filtered
indexReader - IndexReader to identify the stopwords from
- Throws:
IOException - Can be thrown while reading from the IndexReader
-
QueryAutoStopWordAnalyzer
public QueryAutoStopWordAnalyzer(Analyzer delegate, IndexReader indexReader, int maxDocFreq) throws IOException
Creates a new QueryAutoStopWordAnalyzer with stopwords calculated for all indexed fields from terms with a document frequency greater than the given maxDocFreq.
- Parameters:
delegate - Analyzer whose TokenStream will be filtered
indexReader - IndexReader to identify the stopwords from
maxDocFreq - Document frequency that a term must exceed in order to be treated as a stopword
- Throws:
IOException - Can be thrown while reading from the IndexReader
-
QueryAutoStopWordAnalyzer
public QueryAutoStopWordAnalyzer(Analyzer delegate, IndexReader indexReader, float maxPercentDocs) throws IOException
Creates a new QueryAutoStopWordAnalyzer with stopwords calculated for all indexed fields from terms with a document frequency percentage greater than the given maxPercentDocs.
- Parameters:
delegate - Analyzer whose TokenStream will be filtered
indexReader - IndexReader to identify the stopwords from
maxPercentDocs - The maximum percentage of index documents, expressed as a fraction between 0.0 and 1.0, that may contain a term; a term found in more documents than this is considered to be a stop word
- Throws:
IOException - Can be thrown while reading from the IndexReader
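
The relationship between the percentage-based and the absolute document-frequency thresholds can be illustrated with a small standalone sketch. It does not use Lucene and is not the library's implementation: the class name StopWordThresholdSketch, the helpers toMaxDocFreq and stopWords, and the in-memory map of document frequencies are all hypothetical, and the truncating conversion from a fraction to a document count is an assumption made for illustration. What it shows is the rule described above: a term becomes a stop word only when its document frequency is strictly greater than the threshold.

```java
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

public class StopWordThresholdSketch {

    // Hypothetical helper: turns a fraction of documents (0.0 to 1.0) into an
    // absolute document-frequency cutoff. Truncation is an assumption here.
    public static int toMaxDocFreq(int numDocs, float maxPercentDocs) {
        return (int) (numDocs * maxPercentDocs);
    }

    // Returns every term whose document frequency is strictly greater than
    // the cutoff derived from maxPercentDocs.
    public static Set<String> stopWords(Map<String, Integer> docFreqs,
                                        int numDocs,
                                        float maxPercentDocs) {
        int maxDocFreq = toMaxDocFreq(numDocs, maxPercentDocs);
        Set<String> result = new TreeSet<>();
        for (Map.Entry<String, Integer> entry : docFreqs.entrySet()) {
            if (entry.getValue() > maxDocFreq) {
                result.add(entry.getKey());
            }
        }
        return result;
    }

    public static void main(String[] args) {
        // Made-up document frequencies for an index of 10 documents.
        Map<String, Integer> docFreqs = Map.of("the", 9, "lucene", 5, "analyzer", 2);
        // With a 0.5 fraction the cutoff is 5, so "lucene" (exactly 5) is kept
        // and only "the" (9) is reported.
        System.out.println(stopWords(docFreqs, 10, 0.5f)); // prints [the]
    }
}
```

In this sketch a term sitting exactly on the threshold is not filtered, which matches the "greater than" wording used throughout the constructor descriptions.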
-
QueryAutoStopWordAnalyzer
public QueryAutoStopWordAnalyzer(Analyzer delegate, IndexReader indexReader, Collection<String> fields, float maxPercentDocs) throws IOException
Creates a new QueryAutoStopWordAnalyzer with stopwords calculated for the given selection of fields from terms with a document frequency percentage greater than the given maxPercentDocs.
- Parameters:
delegate - Analyzer whose TokenStream will be filtered
indexReader - IndexReader to identify the stopwords from
fields - Selection of fields to calculate stopwords for
maxPercentDocs - The maximum percentage of index documents, expressed as a fraction between 0.0 and 1.0, that may contain a term; a term found in more documents than this is considered to be a stop word
- Throws:
IOException - Can be thrown while reading from the IndexReader
-
QueryAutoStopWordAnalyzer
public QueryAutoStopWordAnalyzer(Analyzer delegate, IndexReader indexReader, Collection<String> fields, int maxDocFreq) throws IOException
Creates a new QueryAutoStopWordAnalyzer with stopwords calculated for the given selection of fields from terms with a document frequency greater than the given maxDocFreq.
- Parameters:
delegate - Analyzer whose TokenStream will be filtered
indexReader - IndexReader to identify the stopwords from
fields - Selection of fields to calculate stopwords for
maxDocFreq - Document frequency that a term must exceed in order to be treated as a stopword
- Throws:
IOException - Can be thrown while reading from the IndexReader
-
-
Method Detail
-
getWrappedAnalyzer
protected Analyzer getWrappedAnalyzer(String fieldName)
- Specified by:
getWrappedAnalyzer in class AnalyzerWrapper
-
wrapComponents
protected Analyzer.TokenStreamComponents wrapComponents(String fieldName, Analyzer.TokenStreamComponents components)
- Overrides:
wrapComponents in class AnalyzerWrapper
-
getStopWords
public String[] getStopWords(String fieldName)
Provides information on which stop words have been identified for a field.
- Parameters:
fieldName - The field for which stop words identified in "addStopWords" method calls will be returned
- Returns:
the stop words identified for a field
-
getStopWords
public Term[] getStopWords()
Provides information on which stop words have been identified for all fields.
- Returns:
the stop words (as terms)
-
-