StandardAnalyzer (Lucene 4.7.2 API)

org.apache.lucene.analysis.standard
Class StandardAnalyzer

java.lang.Object
  org.apache.lucene.analysis.Analyzer
      org.apache.lucene.analysis.util.StopwordAnalyzerBase
          org.apache.lucene.analysis.standard.StandardAnalyzer

All Implemented Interfaces: Closeable

public final class StandardAnalyzer
extends StopwordAnalyzerBase

Filters StandardTokenizer with StandardFilter, LowerCaseFilter and StopFilter, using a list of English stop words.
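The following is a minimal usage sketch. It assumes Lucene 4.7.x is on the classpath and constructs the analyzer with `Version.LUCENE_47`; use the constant that matches your own index. It analyzes a string for a hypothetical field named `body` and consumes the stream with the usual `reset()`, `incrementToken()`, `end()`, `close()` sequence. The class name `StandardAnalyzerBasics` is illustrative.

```java
import java.io.IOException;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;

import org.apache.lucene.analysis.TokenStream;
import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.analysis.tokenattributes.CharTermAttribute;
import org.apache.lucene.util.Version;

public class StandardAnalyzerBasics {

  /** Returns the terms StandardAnalyzer emits for the given text, in order. */
  public static List<String> analyze(String text) {
    // Analyzer implements Closeable, so try-with-resources closes the stream, then the analyzer.
    try (StandardAnalyzer analyzer = new StandardAnalyzer(Version.LUCENE_47);
         TokenStream ts = analyzer.tokenStream("body", new StringReader(text))) {
      CharTermAttribute term = ts.addAttribute(CharTermAttribute.class);
      List<String> terms = new ArrayList<String>();
      ts.reset();                 // must be called before the first incrementToken()
      while (ts.incrementToken()) {
        terms.add(term.toString());
      }
      ts.end();                   // records end-of-stream state before close()
      return terms;
    } catch (IOException e) {
      // A StringReader does not fail in practice; rethrow unchecked to keep the sketch short.
      throw new IllegalStateException(e);
    }
  }

  public static void main(String[] args) {
    // Terms are lowercased, and "the" and "and" are removed as English stop words.
    System.out.println(analyze("The Quick Brown Fox and the lazy dog")); // [quick, brown, fox, lazy, dog]
  }
}
```

For brevity, each call here builds and closes its own analyzer. An application would normally keep one analyzer and reuse it, because the analyzer caches its token stream components between calls.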

You must specify the required Version compatibility when creating StandardAnalyzer:

As of 3.4, Hiragana and Han characters are no longer wrongly split from their combining characters. If you use a previous version number, you get the exact broken behavior for backwards compatibility.
As of 3.1, StandardTokenizer implements Unicode text segmentation, and StopFilter correctly handles Unicode 4.0 supplementary characters in stopwords. ClassicTokenizer and ClassicAnalyzer are the pre-3.1 implementations of StandardTokenizer and StandardAnalyzer.
As of 2.9, StopFilter preserves position increments.
As of 2.4, tokens incorrectly identified as acronyms are corrected (see LUCENE-1068).
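The behaviors listed above depend on the `Version` argument. The sketch below makes the 2.9 change visible, assuming `Version.LUCENE_47` and a hypothetical `body` field. It reads `PositionIncrementAttribute` alongside each term, so the gap left where `StopFilter` removed a stop word shows up as an increment greater than 1.

```java
import java.io.IOException;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;

import org.apache.lucene.analysis.TokenStream;
import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.analysis.tokenattributes.CharTermAttribute;
import org.apache.lucene.analysis.tokenattributes.PositionIncrementAttribute;
import org.apache.lucene.util.Version;

public class PositionIncrementDemo {

  /** Returns "term/increment" pairs so the gap left by a removed stop word is visible. */
  public static List<String> termsWithIncrements(String text) {
    try (StandardAnalyzer analyzer = new StandardAnalyzer(Version.LUCENE_47);
         TokenStream ts = analyzer.tokenStream("body", new StringReader(text))) {
      CharTermAttribute term = ts.addAttribute(CharTermAttribute.class);
      PositionIncrementAttribute posInc = ts.addAttribute(PositionIncrementAttribute.class);
      List<String> out = new ArrayList<String>();
      ts.reset();
      while (ts.incrementToken()) {
        out.add(term.toString() + "/" + posInc.getPositionIncrement());
      }
      ts.end();
      return out;
    } catch (IOException e) {
      throw new IllegalStateException(e);
    }
  }

  public static void main(String[] args) {
    // "the" is removed, so "fox" is reported two positions after "quick".
    System.out.println(termsWithIncrements("quick the fox")); // [quick/1, fox/2]
  }
}
```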

Nested Class Summary

Nested classes/interfaces inherited from class org.apache.lucene.analysis.Analyzer
`Analyzer.GlobalReuseStrategy, Analyzer.PerFieldReuseStrategy, Analyzer.ReuseStrategy, Analyzer.TokenStreamComponents`

Field Summary
`static int`	`DEFAULT_MAX_TOKEN_LENGTH` Default maximum allowed token length.
`static CharArraySet`	`STOP_WORDS_SET` An unmodifiable set containing some common English words that are usually not useful for searching.

Fields inherited from class org.apache.lucene.analysis.util.StopwordAnalyzerBase
`matchVersion, stopwords`

Fields inherited from class org.apache.lucene.analysis.Analyzer
`GLOBAL_REUSE_STRATEGY, PER_FIELD_REUSE_STRATEGY`

Constructor Summary
`StandardAnalyzer(Version matchVersion)` Builds an analyzer with the default stop words (`STOP_WORDS_SET`).
`StandardAnalyzer(Version matchVersion, CharArraySet stopWords)` Builds an analyzer with the given stop words.
`StandardAnalyzer(Version matchVersion, Reader stopwords)` Builds an analyzer with the stop words from the given reader.

Method Summary
`protected Analyzer.TokenStreamComponents`	`createComponents(String fieldName, Reader reader)`
`int`	`getMaxTokenLength()`
`void`	`setMaxTokenLength(int length)` Set maximum allowed token length.

Methods inherited from class org.apache.lucene.analysis.util.StopwordAnalyzerBase
`getStopwordSet, loadStopwordSet, loadStopwordSet, loadStopwordSet`

Methods inherited from class org.apache.lucene.analysis.Analyzer
`close, getOffsetGap, getPositionIncrementGap, getReuseStrategy, initReader, tokenStream, tokenStream`

Methods inherited from class java.lang.Object
`clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait`

Field Detail

DEFAULT_MAX_TOKEN_LENGTH

public static final int DEFAULT_MAX_TOKEN_LENGTH

Default maximum allowed token length.

See Also: Constant Field Values

STOP_WORDS_SET

public static final CharArraySet STOP_WORDS_SET

An unmodifiable set containing some common English words that are usually not useful for searching.
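This short sketch queries the default set. It uses only the field documented on this page, and the class and method names are hypothetical. Because the set is unmodifiable, any attempt to add to it throws `UnsupportedOperationException`, so a custom list should start from a copy.

```java
import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.analysis.util.CharArraySet;

public class StopWordsSetDemo {

  /** True if the word is in the default English stop word set. */
  public static boolean isDefaultStopWord(String word) {
    return StandardAnalyzer.STOP_WORDS_SET.contains(word);
  }

  /** Tries to add to the shared default set; returns false because the set is read-only. */
  public static boolean isModifiable() {
    CharArraySet defaults = StandardAnalyzer.STOP_WORDS_SET;
    try {
      defaults.add("fox");
      return true;
    } catch (UnsupportedOperationException e) {
      return false;
    }
  }

  public static void main(String[] args) {
    System.out.println(isDefaultStopWord("the")); // true
    System.out.println(isDefaultStopWord("fox")); // false
    System.out.println(isModifiable());           // false
  }
}
```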

Constructor Detail

StandardAnalyzer

public StandardAnalyzer(Version matchVersion,
                        CharArraySet stopWords)

Builds an analyzer with the given stop words.

Parameters: matchVersion - Lucene version to match (see above); stopWords - stop words
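One way to add words to the default list is to copy `STOP_WORDS_SET` with `CharArraySet.copy` and pass the modifiable copy to this constructor. This is a sketch, not the only approach. `Version.LUCENE_47`, the field name `body`, and the extra stop word `fox` are illustrative assumptions.

```java
import java.io.IOException;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;

import org.apache.lucene.analysis.TokenStream;
import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.analysis.tokenattributes.CharTermAttribute;
import org.apache.lucene.analysis.util.CharArraySet;
import org.apache.lucene.util.Version;

public class CustomStopWordsDemo {

  /** Builds a modifiable copy of the defaults plus one illustrative extra stop word. */
  static CharArraySet extendedStopWords() {
    CharArraySet stops = CharArraySet.copy(Version.LUCENE_47, StandardAnalyzer.STOP_WORDS_SET);
    stops.add("fox"); // StopFilter runs after LowerCaseFilter, so lowercase entries suffice
    return stops;
  }

  public static List<String> analyze(String text) {
    try (StandardAnalyzer analyzer = new StandardAnalyzer(Version.LUCENE_47, extendedStopWords());
         TokenStream ts = analyzer.tokenStream("body", new StringReader(text))) {
      CharTermAttribute term = ts.addAttribute(CharTermAttribute.class);
      List<String> terms = new ArrayList<String>();
      ts.reset();
      while (ts.incrementToken()) {
        terms.add(term.toString());
      }
      ts.end();
      return terms;
    } catch (IOException e) {
      throw new IllegalStateException(e);
    }
  }

  public static void main(String[] args) {
    // "The" is a default stop word; "Fox" is removed by the added entry.
    System.out.println(analyze("The Quick Brown Fox")); // [quick, brown]
  }
}
```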

StandardAnalyzer

public StandardAnalyzer(Version matchVersion)

Builds an analyzer with the default stop words (STOP_WORDS_SET).

Parameters: matchVersion - Lucene version to match (see above)

StandardAnalyzer

public StandardAnalyzer(Version matchVersion,
                        Reader stopwords)
                 throws IOException

Builds an analyzer with the stop words from the given reader.

Parameters: matchVersion - Lucene version to match (see above); stopwords - Reader to read stop words from
Throws: IOException - if an error occurs while reading the stop words
See Also: WordlistLoader.getWordSet(Reader, Version)
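This sketch shows the `Reader` constructor, assuming `Version.LUCENE_47`. The word list is read one word per line, which is the format `WordlistLoader.getWordSet` expects. The resulting set replaces `STOP_WORDS_SET` rather than extending it, which is why "the" survives below. A `StringReader` stands in for what would usually be a file reader, such as one from `java.nio.file.Files.newBufferedReader(path, StandardCharsets.UTF_8)`. The class name and list contents are hypothetical.

```java
import java.io.IOException;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;

import org.apache.lucene.analysis.TokenStream;
import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.analysis.tokenattributes.CharTermAttribute;
import org.apache.lucene.util.Version;

public class ReaderStopWordsDemo {

  /** Analyzes text using only the stop words listed (one per line) in stopWordList. */
  public static List<String> analyze(String stopWordList, String text) {
    // The constructor reads the list eagerly and declares IOException, handled below.
    try (StandardAnalyzer analyzer =
             new StandardAnalyzer(Version.LUCENE_47, new StringReader(stopWordList));
         TokenStream ts = analyzer.tokenStream("body", new StringReader(text))) {
      CharTermAttribute term = ts.addAttribute(CharTermAttribute.class);
      List<String> terms = new ArrayList<String>();
      ts.reset();
      while (ts.incrementToken()) {
        terms.add(term.toString());
      }
      ts.end();
      return terms;
    } catch (IOException e) {
      throw new IllegalStateException(e);
    }
  }

  public static void main(String[] args) {
    // "the" is kept because the custom list replaced the default English set.
    System.out.println(analyze("quick\nlazy\n", "The quick brown fox")); // [the, brown, fox]
  }
}
```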

Method Detail

setMaxTokenLength

public void setMaxTokenLength(int length)

Sets the maximum allowed token length. Tokens longer than this length are discarded. The setting takes effect the next time tokenStream(String, Reader) or tokenStream(String, String) is called.
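The sketch below reuses a single analyzer across two calls, assuming `Version.LUCENE_47` and a hypothetical `body` field. It shows two things: a limit set with `setMaxTokenLength` applies to the next `tokenStream` call, and an over-long token is dropped rather than truncated.

```java
import java.io.IOException;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;

import org.apache.lucene.analysis.Analyzer;
import org.apache.lucene.analysis.TokenStream;
import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.analysis.tokenattributes.CharTermAttribute;
import org.apache.lucene.util.Version;

public class MaxTokenLengthDemo {

  /** Collects the terms the analyzer emits; leaves the analyzer open for reuse. */
  public static List<String> analyze(Analyzer analyzer, String text) {
    try (TokenStream ts = analyzer.tokenStream("body", new StringReader(text))) {
      CharTermAttribute term = ts.addAttribute(CharTermAttribute.class);
      List<String> terms = new ArrayList<String>();
      ts.reset();
      while (ts.incrementToken()) {
        terms.add(term.toString());
      }
      ts.end();
      return terms;
    } catch (IOException e) {
      throw new IllegalStateException(e);
    }
  }

  public static void main(String[] args) {
    try (StandardAnalyzer analyzer = new StandardAnalyzer(Version.LUCENE_47)) {
      // A new analyzer starts at DEFAULT_MAX_TOKEN_LENGTH.
      System.out.println(analyzer.getMaxTokenLength() == StandardAnalyzer.DEFAULT_MAX_TOKEN_LENGTH); // true
      System.out.println(analyze(analyzer, "short extraordinarily brief")); // [short, extraordinarily, brief]

      // The reused components pick up the new limit on the next tokenStream call.
      analyzer.setMaxTokenLength(5);
      System.out.println(analyze(analyzer, "short extraordinarily brief")); // [short, brief]
    }
  }
}
```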

getMaxTokenLength

public int getMaxTokenLength()

Returns the maximum allowed token length.

See Also: setMaxTokenLength(int)

createComponents

protected Analyzer.TokenStreamComponents createComponents(String fieldName,
                                                          Reader reader)

Specified by: createComponents in class Analyzer
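`createComponents` is where an analyzer assembles its tokenizer and filter chain. The sketch below wires up the chain named in the class description by hand in an `Analyzer` subclass. It is only a sketch: the class `StandardLikeAnalyzer` is hypothetical, this is not the library's own source, and unlike `StandardAnalyzer` it does not support `setMaxTokenLength`.

```java
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;

import org.apache.lucene.analysis.Analyzer;
import org.apache.lucene.analysis.TokenStream;
import org.apache.lucene.analysis.core.LowerCaseFilter;
import org.apache.lucene.analysis.core.StopFilter;
import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.analysis.standard.StandardFilter;
import org.apache.lucene.analysis.standard.StandardTokenizer;
import org.apache.lucene.analysis.tokenattributes.CharTermAttribute;
import org.apache.lucene.util.Version;

/** Hypothetical analyzer that wires the tokenizer and filters named in the class description. */
public final class StandardLikeAnalyzer extends Analyzer {

  private final Version matchVersion;

  public StandardLikeAnalyzer(Version matchVersion) {
    this.matchVersion = matchVersion;
  }

  @Override
  protected TokenStreamComponents createComponents(String fieldName, Reader reader) {
    // The tokenizer is the source; each filter wraps the stream from the previous stage.
    StandardTokenizer source = new StandardTokenizer(matchVersion, reader);
    TokenStream result = new StandardFilter(matchVersion, source);
    result = new LowerCaseFilter(matchVersion, result);
    result = new StopFilter(matchVersion, result, StandardAnalyzer.STOP_WORDS_SET);
    return new TokenStreamComponents(source, result);
  }

  /** Collects the terms an analyzer emits for the given text. */
  public static List<String> terms(Analyzer analyzer, String text) {
    try (TokenStream ts = analyzer.tokenStream("body", new StringReader(text))) {
      CharTermAttribute term = ts.addAttribute(CharTermAttribute.class);
      List<String> out = new ArrayList<String>();
      ts.reset();
      while (ts.incrementToken()) {
        out.add(term.toString());
      }
      ts.end();
      return out;
    } catch (IOException e) {
      throw new IllegalStateException(e);
    }
  }

  public static void main(String[] args) {
    try (Analyzer a = new StandardLikeAnalyzer(Version.LUCENE_47)) {
      System.out.println(terms(a, "The Quick Brown Fox and the lazy dog")); // [quick, brown, fox, lazy, dog]
    }
  }
}
```

`StandardAnalyzer` is declared `final`, so subclassing `Analyzer` this way is the usual route when you need a variation of the chain, for example a different stop word set or an extra filter.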
