QueryAutoStopWordAnalyzer (Lucene 3.1.0 API)

org.apache.lucene.analysis.query
Class QueryAutoStopWordAnalyzer

java.lang.Object
  org.apache.lucene.analysis.Analyzer
      org.apache.lucene.analysis.query.QueryAutoStopWordAnalyzer

All Implemented Interfaces: Closeable

public final class QueryAutoStopWordAnalyzer
extends org.apache.lucene.analysis.Analyzer

An Analyzer used primarily at query time to wrap another analyzer and provide a layer of protection which prevents very common words from being passed into queries.

For very large indexes the cost of reading TermDocs for a very common word can be high. This analyzer was created after experience with a 38 million doc index which had a term in around 50% of docs and was causing TermQueries for this term to take 2 seconds.

Use the various "addStopWords" methods in this class to automate the identification and addition of stop words found in an already existing index.
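The idea behind the "addStopWords" methods can be illustrated without Lucene. The following is a minimal standalone sketch, not the library's implementation: the class `DocFreqStopWordSketch` and its `findStopWords` methods are hypothetical names, and documents are modelled as plain lists of terms rather than an IndexReader. It assumes a percentage threshold is converted to an absolute document count by multiplying by the number of documents and truncating, and that a term becomes a stop word only when its document frequency strictly exceeds the threshold.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

// Hypothetical illustration only; this class is not part of Lucene.
public class DocFreqStopWordSketch {

    // Percentage variant: converts the fraction of documents into an
    // absolute document-frequency threshold (assumed conversion).
    public static Set<String> findStopWords(List<List<String>> docs, float maxPercentDocs) {
        int maxDocFreq = (int) (docs.size() * maxPercentDocs);
        return findStopWords(docs, maxDocFreq);
    }

    // Absolute variant: returns every term that occurs in more than
    // maxDocFreq documents.
    public static Set<String> findStopWords(List<List<String>> docs, int maxDocFreq) {
        Map<String, Integer> docFreq = new HashMap<String, Integer>();
        for (List<String> doc : docs) {
            // Deduplicate so each document counts a term at most once.
            for (String term : new HashSet<String>(doc)) {
                Integer previous = docFreq.get(term);
                docFreq.put(term, previous == null ? 1 : previous + 1);
            }
        }
        Set<String> stopWords = new TreeSet<String>();
        for (Map.Entry<String, Integer> entry : docFreq.entrySet()) {
            if (entry.getValue() > maxDocFreq) {
                stopWords.add(entry.getKey());
            }
        }
        return stopWords;
    }

    public static void main(String[] args) {
        List<List<String>> docs = Arrays.asList(
            Arrays.asList("the", "quick", "fox"),
            Arrays.asList("the", "lazy", "dog"),
            Arrays.asList("the", "fox", "sleeps"),
            Arrays.asList("a", "dog", "barks"));
        // Threshold is 50% of 4 documents = 2; only "the" (3 documents) exceeds it.
        System.out.println(findStopWords(docs, 0.5f)); // prints [the]
    }
}
```

In this toy corpus "fox" and "dog" each appear in exactly two of four documents, so they sit at the threshold rather than above it and are kept. With the real class, the same thresholds would be passed to the "addStopWords" overloads together with an IndexReader.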

Field Summary
`static float`	`defaultMaxDocFreqPercent`

Constructor Summary
`QueryAutoStopWordAnalyzer(org.apache.lucene.util.Version matchVersion, org.apache.lucene.analysis.Analyzer delegate)` Initializes this analyzer with the Analyzer object that actually produces the tokens

Method Summary
`int`	`addStopWords(org.apache.lucene.index.IndexReader reader)` Automatically adds stop words for all fields with terms exceeding the defaultMaxDocFreqPercent
`int`	`addStopWords(org.apache.lucene.index.IndexReader reader, float maxPercentDocs)` Automatically adds stop words for all fields with terms exceeding the maxPercentDocs
`int`	`addStopWords(org.apache.lucene.index.IndexReader reader, int maxDocFreq)` Automatically adds stop words for all fields with terms exceeding the maxDocFreq
`int`	`addStopWords(org.apache.lucene.index.IndexReader reader, String fieldName, float maxPercentDocs)` Automatically adds stop words for the given field with terms exceeding the maxPercentDocs
`int`	`addStopWords(org.apache.lucene.index.IndexReader reader, String fieldName, int maxDocFreq)` Automatically adds stop words for the given field with terms exceeding the maxDocFreq
`org.apache.lucene.index.Term[]`	`getStopWords()` Provides information on which stop words have been identified for all fields
`String[]`	`getStopWords(String fieldName)` Provides information on which stop words have been identified for a field
`org.apache.lucene.analysis.TokenStream`	`reusableTokenStream(String fieldName, Reader reader)`
`org.apache.lucene.analysis.TokenStream`	`tokenStream(String fieldName, Reader reader)`

Methods inherited from class org.apache.lucene.analysis.Analyzer
`close, getOffsetGap, getPositionIncrementGap, getPreviousTokenStream, setPreviousTokenStream`

Methods inherited from class java.lang.Object
`clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait`

Field Detail

defaultMaxDocFreqPercent

public static final float defaultMaxDocFreqPercent

See Also: Constant Field Values

Constructor Detail