StandardAnalyzer (Lucene 4.7.0 API)

java.lang.Object
- org.apache.lucene.analysis.Analyzer
  - org.apache.lucene.analysis.util.StopwordAnalyzerBase
    - org.apache.lucene.analysis.standard.StandardAnalyzer

All Implemented Interfaces:

Closeable
```
public final class StandardAnalyzer
extends StopwordAnalyzerBase
```
Filters StandardTokenizer with StandardFilter, LowerCaseFilter and StopFilter, using a list of English stop words.
You must specify the required Version compatibility when creating StandardAnalyzer:
- As of 3.4, Hiragana and Han characters are no longer wrongly split from their combining characters. If you pass an earlier version number, you get the old, broken behavior for backwards compatibility.
- As of 3.1, StandardTokenizer implements Unicode text segmentation, and StopFilter correctly handles Unicode 4.0 supplementary characters in stopwords. ClassicTokenizer and ClassicAnalyzer are the pre-3.1 implementations of StandardTokenizer and StandardAnalyzer.
- As of 2.9, StopFilter preserves position increments.
- As of 2.4, tokens incorrectly identified as acronyms are corrected (see LUCENE-1068).
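
The filter chain described above (tokenize, lowercase, drop stop words) can be modelled in plain Java. This is a rough stdlib-only sketch, not Lucene code: the regex split is a crude stand-in for StandardTokenizer's Unicode text segmentation, the class and method names are hypothetical, and the stop list is a small illustrative subset rather than the contents of `STOP_WORDS_SET`.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Locale;
import java.util.Set;

class AnalyzerChainSketch {
    // Hypothetical stop list for illustration only; not the real STOP_WORDS_SET.
    static final Set<String> STOP_WORDS =
            new HashSet<String>(Arrays.asList("a", "an", "the", "of", "and"));

    static List<String> analyze(String text) {
        List<String> out = new ArrayList<String>();
        // Stand-in for the tokenizer: split on runs of non-letter, non-digit characters.
        for (String raw : text.split("[^\\p{L}\\p{N}]+")) {
            if (raw.isEmpty()) {
                continue; // split() yields an empty first element when text starts with a delimiter
            }
            // Lowercase first, then test against the stop list, mirroring the order of the chain.
            String lower = raw.toLowerCase(Locale.ROOT);
            if (!STOP_WORDS.contains(lower)) {
                out.add(lower);
            }
        }
        return out;
    }
}
```

Because lowercasing happens before stop-word removal in this ordering, a capitalized "The" is still matched by the lowercase entry "the".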

Nested Class Summary
- Nested classes/interfaces inherited from class org.apache.lucene.analysis.Analyzer
  Analyzer.GlobalReuseStrategy, Analyzer.PerFieldReuseStrategy, Analyzer.ReuseStrategy, Analyzer.TokenStreamComponents

Field Summary

Fields
Modifier and Type	Field and Description
`static int`	`DEFAULT_MAX_TOKEN_LENGTH` Default maximum allowed token length
`static CharArraySet`	`STOP_WORDS_SET` An unmodifiable set containing some common English words that are usually not useful for searching.

Fields inherited from class org.apache.lucene.analysis.util.StopwordAnalyzerBase
matchVersion, stopwords

Fields inherited from class org.apache.lucene.analysis.Analyzer
GLOBAL_REUSE_STRATEGY, PER_FIELD_REUSE_STRATEGY

Constructor Summary

Constructors
Constructor and Description
`StandardAnalyzer(Version matchVersion)` Builds an analyzer with the default stop words (`STOP_WORDS_SET`).
`StandardAnalyzer(Version matchVersion, CharArraySet stopWords)` Builds an analyzer with the given stop words.
`StandardAnalyzer(Version matchVersion, Reader stopwords)` Builds an analyzer with the stop words from the given reader.
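
To illustrate what supplying stop words through a `Reader` might involve, here is a stdlib-only sketch that reads a word list into a set. It assumes a one-word-per-line format; the class and method names are hypothetical, skipping blank lines is a choice of this sketch, and it is not the `WordlistLoader` implementation.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.LinkedHashSet;
import java.util.Set;

class StopWordFileSketch {
    // Reads one stop word per line (assumed format), trimming surrounding whitespace.
    static Set<String> readStopWords(Reader reader) throws IOException {
        Set<String> words = new LinkedHashSet<String>();
        try (BufferedReader br = new BufferedReader(reader)) {
            String line;
            while ((line = br.readLine()) != null) {
                String word = line.trim();
                if (!word.isEmpty()) { // blank lines are skipped in this sketch
                    words.add(word);
                }
            }
        }
        return words;
    }

    // Convenience wrapper for in-memory text, so callers need not handle IOException.
    static Set<String> readStopWords(String text) {
        try {
            return readStopWords(new StringReader(text));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

Since the analyzer applies LowerCaseFilter before StopFilter, entries in such a list presumably need to be lowercase to match any token.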

Method Summary

Methods
Modifier and Type	Method and Description
`protected Analyzer.TokenStreamComponents`	`createComponents(String fieldName, Reader reader)`
`int`	`getMaxTokenLength()`
`void`	`setMaxTokenLength(int length)` Set maximum allowed token length.

Methods inherited from class org.apache.lucene.analysis.util.StopwordAnalyzerBase
getStopwordSet, loadStopwordSet, loadStopwordSet, loadStopwordSet

Methods inherited from class org.apache.lucene.analysis.Analyzer
close, getOffsetGap, getPositionIncrementGap, getReuseStrategy, initReader, tokenStream, tokenStream

Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait

- Field Detail
  - DEFAULT_MAX_TOKEN_LENGTH
```
public static final int DEFAULT_MAX_TOKEN_LENGTH
```
    Default maximum allowed token length
    
    See Also:
    Constant Field Values
  - STOP_WORDS_SET
```
public static final CharArraySet STOP_WORDS_SET
```
    An unmodifiable set containing some common English words that are usually not useful for searching.
- Constructor Detail
  - StandardAnalyzer
```
public StandardAnalyzer(Version matchVersion,
                CharArraySet stopWords)
```
    Builds an analyzer with the given stop words.
    
    Parameters:
    matchVersion - Lucene version to match; see the version notes above
    stopWords - stop words
  - StandardAnalyzer
```
public StandardAnalyzer(Version matchVersion)
```
    Builds an analyzer with the default stop words (STOP_WORDS_SET).
    
    Parameters:
    matchVersion - Lucene version to match; see the version notes above
  - StandardAnalyzer
```
public StandardAnalyzer(Version matchVersion,
                Reader stopwords)
                 throws IOException
```
    Builds an analyzer with the stop words from the given reader.
    
    Parameters:
    matchVersion - Lucene version to match; see the version notes above
    stopwords - Reader to read stop words from
    
    Throws:
    IOException
    
    See Also:
    WordlistLoader.getWordSet(Reader, Version)
- Method Detail
  - setMaxTokenLength
```
public void setMaxTokenLength(int length)
```
    Set the maximum allowed token length. Any token that exceeds this length is discarded. The setting only takes effect the next time either overload of tokenStream is called.
  - getMaxTokenLength
```
public int getMaxTokenLength()
```
    See Also:
    setMaxTokenLength(int)
  - createComponents
```
protected Analyzer.TokenStreamComponents createComponents(String fieldName,
                                              Reader reader)
```
    Specified by:
    createComponents in class Analyzer
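
The behavior documented for setMaxTokenLength above (over-long tokens are discarded, and a new limit applies only to later tokenStream calls) can be sketched with a small stdlib-only model. The class below is hypothetical and only mimics that contract: it splits on whitespace instead of using StandardTokenizer, and its `tokens` method stands in for a tokenStream call.

```java
import java.util.ArrayList;
import java.util.List;

class MaxTokenLengthSketch {
    private int maxTokenLength;

    MaxTokenLengthSketch(int maxTokenLength) {
        this.maxTokenLength = maxTokenLength;
    }

    void setMaxTokenLength(int length) {
        this.maxTokenLength = length;
    }

    int getMaxTokenLength() {
        return maxTokenLength;
    }

    // Stand-in for a tokenStream call: the limit is read once at the start,
    // so a later setMaxTokenLength affects only subsequent calls.
    List<String> tokens(String text) {
        final int limit = maxTokenLength;
        List<String> out = new ArrayList<String>();
        for (String token : text.split("\\s+")) {
            if (token.isEmpty()) {
                continue;
            }
            // Tokens longer than the limit are dropped, not truncated.
            if (token.length() <= limit) {
                out.add(token);
            }
        }
        return out;
    }
}
```

With a limit of 5, the eight-character token "enormous" is dropped from "tiny enormous word"; after raising the limit to 8, the next call keeps it.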
