StopAnalyzer (Lucene 3.4.0 API)

org.apache.lucene.analysis
Class StopAnalyzer

java.lang.Object
  org.apache.lucene.analysis.Analyzer
      org.apache.lucene.analysis.ReusableAnalyzerBase
          org.apache.lucene.analysis.StopwordAnalyzerBase
              org.apache.lucene.analysis.StopAnalyzer

All Implemented Interfaces: Closeable

public final class StopAnalyzer
extends StopwordAnalyzerBase

Filters LetterTokenizer with LowerCaseFilter and StopFilter.

You must specify the required Version compatibility when creating StopAnalyzer:

- As of 3.1, StopFilter correctly handles Unicode 4.0 supplementary characters in stopwords
- As of 2.9, position increments are preserved
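
The filtering described above can be illustrated without Lucene on the classpath. The following is a minimal plain-Java sketch, not the Lucene implementation: `StopPipelineSketch`, `Term`, and `analyze` are hypothetical names, and the code only approximates the behaviour of LetterTokenizer, LowerCaseFilter, and StopFilter. It splits on non-letters by code point (so supplementary characters are kept whole), lowercases each term, drops stop words, and records the position gap that each removed stop word leaves behind.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical stand-in for the analysis chain; it does not use any Lucene classes.
final class StopPipelineSketch {

    /** A term plus its distance (in positions) from the previously emitted term. */
    static final class Term {
        final String text;
        final int positionIncrement;

        Term(String text, int positionIncrement) {
            this.text = text;
            this.positionIncrement = positionIncrement;
        }

        @Override
        public String toString() {
            return text + "(+" + positionIncrement + ")";
        }
    }

    /** Splits on non-letters, lowercases, and removes stop words while preserving position gaps. */
    static List<Term> analyze(String input, Set<String> stopWords) {
        List<Term> out = new ArrayList<Term>();
        StringBuilder current = new StringBuilder();
        int skipped = 0; // stop words removed since the last emitted term
        int i = 0;
        while (i <= input.length()) {
            // -1 marks end of input so that the last term is flushed.
            int cp = i < input.length() ? input.codePointAt(i) : -1;
            if (cp >= 0 && Character.isLetter(cp)) {
                current.appendCodePoint(Character.toLowerCase(cp));
            } else if (current.length() > 0) {
                String term = current.toString();
                current.setLength(0);
                if (stopWords.contains(term)) {
                    skipped++;
                } else {
                    out.add(new Term(term, skipped + 1));
                    skipped = 0;
                }
            }
            i += cp >= 0 ? Character.charCount(cp) : 1;
        }
        return out;
    }

    public static void main(String[] args) {
        Set<String> stop = new HashSet<String>(Arrays.asList("the", "and"));
        System.out.println(analyze("The Quick brown fox, and THE dog", stop));
    }
}
```

In this sketch, "dog" is reported with an increment of 3 because two stop words ("and", "THE") were removed directly before it, which is what preserving position increments means for phrase and proximity matching.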

Nested Class Summary

Nested classes/interfaces inherited from class org.apache.lucene.analysis.ReusableAnalyzerBase
`ReusableAnalyzerBase.TokenStreamComponents`

Field Summary
`static Set<?>`	`ENGLISH_STOP_WORDS_SET` An unmodifiable set containing some common English words that are not usually useful for searching.

Fields inherited from class org.apache.lucene.analysis.StopwordAnalyzerBase
`matchVersion, stopwords`

Constructor Summary
`StopAnalyzer(Version matchVersion)` Builds an analyzer which removes words in `ENGLISH_STOP_WORDS_SET`.
`StopAnalyzer(Version matchVersion, File stopwordsFile)` Builds an analyzer with the stop words from the given file.
`StopAnalyzer(Version matchVersion, Reader stopwords)` Builds an analyzer with the stop words from the given reader.
`StopAnalyzer(Version matchVersion, Set<?> stopWords)` Builds an analyzer with the stop words from the given set.

Method Summary
`protected ReusableAnalyzerBase.TokenStreamComponents`	`createComponents(String fieldName, Reader reader)` Creates `ReusableAnalyzerBase.TokenStreamComponents` used to tokenize all the text in the provided `Reader`.

Methods inherited from class org.apache.lucene.analysis.StopwordAnalyzerBase
`getStopwordSet, loadStopwordSet`

Methods inherited from class org.apache.lucene.analysis.ReusableAnalyzerBase
`initReader, reusableTokenStream, tokenStream`

Methods inherited from class org.apache.lucene.analysis.Analyzer
`close, getOffsetGap, getPositionIncrementGap, getPreviousTokenStream, setPreviousTokenStream`

Methods inherited from class java.lang.Object
`clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait`

Field Detail

ENGLISH_STOP_WORDS_SET

public static final Set<?> ENGLISH_STOP_WORDS_SET

An unmodifiable set containing some common English words that are not usually useful for searching.
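
Because the set is unmodifiable, a custom stop list is normally built by copying it rather than by adding to it. The sketch below shows that pattern with `java.util` collections only. `SAMPLE_STOP_WORDS` is a hypothetical stand-in holding a small illustrative subset, not the real field, and the sketch assumes that the real set rejects modification in the same way a `Collections.unmodifiableSet` wrapper does.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.HashSet;
import java.util.Set;

// Hypothetical illustration; SAMPLE_STOP_WORDS is NOT the Lucene field.
final class UnmodifiableStopSetSketch {

    // Illustrative subset only, wrapped so that callers cannot change it.
    static final Set<String> SAMPLE_STOP_WORDS = Collections.unmodifiableSet(
            new HashSet<String>(Arrays.asList("a", "an", "and", "the", "of")));

    /** Returns a new, modifiable set containing the sample words plus the extras. */
    static Set<String> extendedWith(String... extra) {
        Set<String> copy = new HashSet<String>(SAMPLE_STOP_WORDS);
        copy.addAll(Arrays.asList(extra));
        return copy;
    }

    /** Returns true if a direct add on the shared set is rejected. */
    static boolean rejectsModification() {
        try {
            SAMPLE_STOP_WORDS.add("foo");
            return false;
        } catch (UnsupportedOperationException e) {
            return true;
        }
    }

    public static void main(String[] args) {
        System.out.println("direct add rejected: " + rejectsModification());
        System.out.println("extended size: " + extendedWith("lucene", "index").size());
    }
}
```

A copy built this way could then be passed to a constructor that accepts a `Set<?>` of stop words.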

Constructor Detail

StopAnalyzer

public StopAnalyzer(Version matchVersion)

Builds an analyzer which removes words in ENGLISH_STOP_WORDS_SET.

Parameters: matchVersion - the Lucene Version to match; see the class description above

StopAnalyzer

public StopAnalyzer(Version matchVersion,
                    Set<?> stopWords)

Builds an analyzer with the stop words from the given set.

Parameters: matchVersion - the Lucene Version to match; see the class description above; stopWords - Set of stop words

StopAnalyzer

public StopAnalyzer(Version matchVersion,
                    File stopwordsFile)
             throws IOException

Builds an analyzer with the stop words from the given file.

Parameters: matchVersion - the Lucene Version to match; see the class description above; stopwordsFile - File to load stop words from
Throws: IOException
See Also: WordlistLoader.getWordSet(File)

StopAnalyzer

public StopAnalyzer(Version matchVersion,
                    Reader stopwords)
             throws IOException

Builds an analyzer with the stop words from the given reader.

Parameters: matchVersion - the Lucene Version to match; see the class description above; stopwords - Reader to load stop words from
Throws: IOException
See Also: WordlistLoader.getWordSet(Reader)
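
The file-based and reader-based constructors both turn a stream of text into a stop word set. The sketch below approximates that loading step in plain Java, assuming the input holds one stop word per line with surrounding whitespace ignored. `WordListSketch` and `readWords` are hypothetical names, and skipping blank lines is a choice made in this sketch rather than documented behaviour of WordlistLoader.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.util.HashSet;
import java.util.Set;

// Hypothetical approximation of word-list loading; it does not call WordlistLoader.
final class WordListSketch {

    /** Reads one word per line, trimming whitespace and skipping blank lines. */
    static Set<String> readWords(Reader reader) throws IOException {
        Set<String> words = new HashSet<String>();
        BufferedReader lines = new BufferedReader(reader);
        try {
            String line;
            while ((line = lines.readLine()) != null) {
                String word = line.trim();
                if (word.length() > 0) {
                    words.add(word);
                }
            }
        } finally {
            lines.close(); // also closes the wrapped reader
        }
        return words;
    }

    public static void main(String[] args) throws IOException {
        Set<String> words = readWords(new StringReader("the\n  and \n\nof\n"));
        System.out.println(words.size() + " stop words loaded");
    }
}
```

The `throws IOException` clause mirrors the constructors above: a read failure while loading the list surfaces to the caller instead of producing a partially filled set.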

Method Detail

createComponents

protected ReusableAnalyzerBase.TokenStreamComponents createComponents(String fieldName,
                                                                      Reader reader)

Creates ReusableAnalyzerBase.TokenStreamComponents used to tokenize all the text in the provided Reader.

Specified by: createComponents in class ReusableAnalyzerBase

Parameters: fieldName - the name of the field whose content is passed to the ReusableAnalyzerBase.TokenStreamComponents sink as a reader; reader - the reader passed to the Tokenizer constructor
Returns: ReusableAnalyzerBase.TokenStreamComponents built from a LowerCaseTokenizer filtered with StopFilter
