StandardAnalyzer (Lucene 3.6.2 API)

java.lang.Object
- org.apache.lucene.analysis.Analyzer
  - org.apache.lucene.analysis.ReusableAnalyzerBase
    - org.apache.lucene.analysis.StopwordAnalyzerBase
      - org.apache.lucene.analysis.standard.StandardAnalyzer

All Implemented Interfaces:

Closeable
```
public final class StandardAnalyzer
extends StopwordAnalyzerBase
```
Filters StandardTokenizer with StandardFilter, LowerCaseFilter and StopFilter, using a list of English stop words.
You must specify the required Version compatibility when creating StandardAnalyzer:
- As of 3.4, Hiragana and Han characters are no longer wrongly split from their combining characters. If you use a previous version number, you get the exact broken behavior for backwards compatibility.
- As of 3.1, StandardTokenizer implements Unicode text segmentation, and StopFilter correctly handles Unicode 4.0 supplementary characters in stopwords. ClassicTokenizer and ClassicAnalyzer are the pre-3.1 implementations of StandardTokenizer and StandardAnalyzer.
- As of 2.9, StopFilter preserves position increments.
- As of 2.4, tokens incorrectly identified as acronyms are corrected (see LUCENE-1068).
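
A minimal sketch of running text through the analyzer, assuming Lucene 3.6.2 is on the classpath. The class name `StandardAnalyzerDemo`, the field name `"body"`, and the sample text are illustrative only. `Version.LUCENE_36` selects the newest behavior in the list above; an older constant such as `Version.LUCENE_30` would select the corresponding older behavior.

```java
import java.io.IOException;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;

import org.apache.lucene.analysis.Analyzer;
import org.apache.lucene.analysis.TokenStream;
import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.analysis.tokenattributes.CharTermAttribute;
import org.apache.lucene.util.Version;

public class StandardAnalyzerDemo {

    /** Runs text through the analyzer and returns the emitted terms in order. */
    static List<String> tokens(Analyzer analyzer, String text) {
        List<String> terms = new ArrayList<String>();
        try {
            // "body" is an arbitrary field name; StandardAnalyzer does not use it.
            TokenStream ts = analyzer.tokenStream("body", new StringReader(text));
            CharTermAttribute term = ts.addAttribute(CharTermAttribute.class);
            try {
                ts.reset();
                while (ts.incrementToken()) {
                    terms.add(term.toString());
                }
                ts.end();
            } finally {
                ts.close();
            }
        } catch (IOException e) {
            throw new RuntimeException(e);
        }
        return terms;
    }

    public static void main(String[] args) {
        StandardAnalyzer analyzer = new StandardAnalyzer(Version.LUCENE_36);
        // Prints [quick, brown, fox, jumped, over, lazy, dogs]:
        // terms are lowercased and the English stop word "the" is dropped.
        System.out.println(tokens(analyzer, "The Quick Brown Fox jumped over the lazy dogs"));
        analyzer.close();
    }
}
```

The `reset()` / `incrementToken()` / `end()` / `close()` sequence is the usual consumer loop for a `TokenStream`; the attribute instance is reused, so each term is copied out with `toString()`.
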

Nested Class Summary
- Nested classes/interfaces inherited from class org.apache.lucene.analysis.ReusableAnalyzerBase
  ReusableAnalyzerBase.TokenStreamComponents

Field Summary

Fields
| Modifier and Type | Field and Description |
|---|---|
| `static int` | `DEFAULT_MAX_TOKEN_LENGTH`: Default maximum allowed token length. |
| `static Set<?>` | `STOP_WORDS_SET`: An unmodifiable set containing some common English words that are usually not useful for searching. |
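
A short sketch reading both constants, assuming Lucene 3.6.2 on the classpath; the class `StandardAnalyzerFields` and its helper `isDefaultStopWord` are hypothetical names used only for illustration. Because `STOP_WORDS_SET` is declared as `Set<?>`, lookups go through `contains(Object)`.

```java
import org.apache.lucene.analysis.standard.StandardAnalyzer;

public class StandardAnalyzerFields {

    /** Hypothetical helper: true if the word is in the default English stop-word set. */
    static boolean isDefaultStopWord(String word) {
        // Set<?> still accepts contains(Object), so a String can be passed directly.
        return StandardAnalyzer.STOP_WORDS_SET.contains(word);
    }

    public static void main(String[] args) {
        System.out.println(StandardAnalyzer.DEFAULT_MAX_TOKEN_LENGTH); // 255
        System.out.println(isDefaultStopWord("the"));    // true
        System.out.println(isDefaultStopWord("search")); // false
    }
}
```
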

Fields inherited from class org.apache.lucene.analysis.StopwordAnalyzerBase
matchVersion, stopwords

Constructor Summary

Constructors
| Constructor and Description |
|---|
| `StandardAnalyzer(Version matchVersion)`: Builds an analyzer with the default stop words (`STOP_WORDS_SET`). |
| `StandardAnalyzer(Version matchVersion, File stopwords)`: Deprecated. Use `StandardAnalyzer(Version, Reader)` instead. |
| `StandardAnalyzer(Version matchVersion, Reader stopwords)`: Builds an analyzer with the stop words from the given reader. |
| `StandardAnalyzer(Version matchVersion, Set<?> stopWords)`: Builds an analyzer with the given stop words. |

Method Summary

Methods
| Modifier and Type | Method and Description |
|---|---|
| `protected ReusableAnalyzerBase.TokenStreamComponents` | `createComponents(String fieldName, Reader reader)`: Creates a new `ReusableAnalyzerBase.TokenStreamComponents` instance for this analyzer. |
| `int` | `getMaxTokenLength()`: Returns the current maximum allowed token length. |
| `void` | `setMaxTokenLength(int length)`: Sets the maximum allowed token length. |

Methods inherited from class org.apache.lucene.analysis.StopwordAnalyzerBase
getStopwordSet, loadStopwordSet, loadStopwordSet, loadStopwordSet

Methods inherited from class org.apache.lucene.analysis.ReusableAnalyzerBase
initReader, reusableTokenStream, tokenStream

Methods inherited from class org.apache.lucene.analysis.Analyzer
close, getOffsetGap, getPositionIncrementGap, getPreviousTokenStream, setPreviousTokenStream

Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait

- Field Detail
  - DEFAULT_MAX_TOKEN_LENGTH
```
public static final int DEFAULT_MAX_TOKEN_LENGTH
```
    Default maximum allowed token length
    
    See Also:
    Constant Field Values
  - STOP_WORDS_SET
```
public static final Set<?> STOP_WORDS_SET
```
    An unmodifiable set containing some common English words that are usually not useful for searching.
- Constructor Detail
  - StandardAnalyzer
```
public StandardAnalyzer(Version matchVersion,
                Set<?> stopWords)
```
    Builds an analyzer with the given stop words.
    
    Parameters:
    matchVersion - Lucene version to match; see the version notes above
    stopWords - stop words
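
A sketch of supplying a custom stop-word set, assuming Lucene 3.6.2 on the classpath; the class `CustomStopWordsDemo` and the words `foo` and `bar` are hypothetical. The given set replaces the default English list rather than extending it, and because `LowerCaseFilter` runs before `StopFilter`, entries should be lowercase.

```java
import java.io.IOException;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

import org.apache.lucene.analysis.Analyzer;
import org.apache.lucene.analysis.TokenStream;
import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.analysis.tokenattributes.CharTermAttribute;
import org.apache.lucene.util.Version;

public class CustomStopWordsDemo {

    /** Runs text through the analyzer and returns the emitted terms in order. */
    static List<String> tokens(Analyzer analyzer, String text) {
        List<String> terms = new ArrayList<String>();
        try {
            TokenStream ts = analyzer.tokenStream("body", new StringReader(text));
            CharTermAttribute term = ts.addAttribute(CharTermAttribute.class);
            try {
                ts.reset();
                while (ts.incrementToken()) {
                    terms.add(term.toString());
                }
                ts.end();
            } finally {
                ts.close();
            }
        } catch (IOException e) {
            throw new RuntimeException(e);
        }
        return terms;
    }

    public static void main(String[] args) {
        // Hypothetical domain stop words; they replace the default list, so "the" survives.
        Set<String> stop = new HashSet<String>(Arrays.asList("foo", "bar"));
        StandardAnalyzer custom = new StandardAnalyzer(Version.LUCENE_36, stop);
        System.out.println(tokens(custom, "foo The bar baz")); // [the, baz]
        custom.close();

        // An empty set turns stop-word removal off entirely.
        StandardAnalyzer noStops =
            new StandardAnalyzer(Version.LUCENE_36, Collections.<String>emptySet());
        System.out.println(tokens(noStops, "The quick fox")); // [the, quick, fox]
        noStops.close();
    }
}
```
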
  - StandardAnalyzer
```
public StandardAnalyzer(Version matchVersion)
```
    Builds an analyzer with the default stop words (STOP_WORDS_SET).
    
    Parameters:
    matchVersion - Lucene version to match; see the version notes above
  - StandardAnalyzer
```
@Deprecated
public StandardAnalyzer(Version matchVersion,
                           File stopwords)
                 throws IOException
```
    Deprecated. Use StandardAnalyzer(Version, Reader) instead.
    
    Builds an analyzer with the stop words from the given file.
    
    Parameters:
    matchVersion - Lucene version to match; see the version notes above
    stopwords - File to read stop words from
    
    Throws:
    IOException - if the stop-word file cannot be read
    See Also:
    WordlistLoader.getWordSet(Reader, Version)
  - StandardAnalyzer
```
public StandardAnalyzer(Version matchVersion,
                Reader stopwords)
                 throws IOException
```
    Builds an analyzer with the stop words from the given reader.
    
    Parameters:
    matchVersion - Lucene version to match; see the version notes above
    stopwords - Reader to read stop words from
    
    Throws:
    IOException - if the stop words cannot be read from the reader
    See Also:
    WordlistLoader.getWordSet(Reader, Version)
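
A sketch of loading stop words from a `Reader` (one word per line, as read by `WordlistLoader.getWordSet(Reader, Version)`), assuming Lucene 3.6.2 on the classpath; `ReaderStopWordsDemo` and `fromList` are hypothetical names. To replace the deprecated `File` constructor, the same call can take a file reader such as `new InputStreamReader(new FileInputStream(file), "UTF-8")`; the UTF-8 charset there is an assumption about the file's encoding.

```java
import java.io.IOException;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;

import org.apache.lucene.analysis.Analyzer;
import org.apache.lucene.analysis.TokenStream;
import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.analysis.tokenattributes.CharTermAttribute;
import org.apache.lucene.util.Version;

public class ReaderStopWordsDemo {

    /** Hypothetical helper: builds an analyzer from a newline-separated stop-word list. */
    static StandardAnalyzer fromList(String oneWordPerLine) {
        try {
            return new StandardAnalyzer(Version.LUCENE_36, new StringReader(oneWordPerLine));
        } catch (IOException e) {
            throw new RuntimeException(e);
        }
    }

    /** Runs text through the analyzer and returns the emitted terms in order. */
    static List<String> tokens(Analyzer analyzer, String text) {
        List<String> terms = new ArrayList<String>();
        try {
            TokenStream ts = analyzer.tokenStream("body", new StringReader(text));
            CharTermAttribute term = ts.addAttribute(CharTermAttribute.class);
            try {
                ts.reset();
                while (ts.incrementToken()) {
                    terms.add(term.toString());
                }
                ts.end();
            } finally {
                ts.close();
            }
        } catch (IOException e) {
            throw new RuntimeException(e);
        }
        return terms;
    }

    public static void main(String[] args) {
        StandardAnalyzer analyzer = fromList("foo\nbar\n");
        // Only "foo" and "bar" are stop words here, so "the" is kept.
        System.out.println(tokens(analyzer, "foo the bar baz")); // [the, baz]
        analyzer.close();
    }
}
```
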
- Method Detail
  - setMaxTokenLength
```
public void setMaxTokenLength(int length)
```
    Sets the maximum allowed token length. Tokens longer than this length are discarded. The setting takes effect only the next time tokenStream or reusableTokenStream is called.
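
A sketch of the length limit, assuming Lucene 3.6.2 on the classpath; `MaxTokenLengthDemo` and the limit of 5 are illustrative. The limit is set before `tokenStream` is called so that it applies to that stream; a token of exactly the maximum length is kept, while a longer one is skipped.

```java
import java.io.IOException;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;

import org.apache.lucene.analysis.Analyzer;
import org.apache.lucene.analysis.TokenStream;
import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.analysis.tokenattributes.CharTermAttribute;
import org.apache.lucene.util.Version;

public class MaxTokenLengthDemo {

    /** Runs text through the analyzer and returns the emitted terms in order. */
    static List<String> tokens(Analyzer analyzer, String text) {
        List<String> terms = new ArrayList<String>();
        try {
            TokenStream ts = analyzer.tokenStream("body", new StringReader(text));
            CharTermAttribute term = ts.addAttribute(CharTermAttribute.class);
            try {
                ts.reset();
                while (ts.incrementToken()) {
                    terms.add(term.toString());
                }
                ts.end();
            } finally {
                ts.close();
            }
        } catch (IOException e) {
            throw new RuntimeException(e);
        }
        return terms;
    }

    public static void main(String[] args) {
        StandardAnalyzer analyzer = new StandardAnalyzer(Version.LUCENE_36);
        // Set the limit before requesting a token stream so it takes effect.
        analyzer.setMaxTokenLength(5);
        System.out.println(analyzer.getMaxTokenLength()); // 5
        // "short" (5 chars) is kept; "extraordinarily" (15 chars) is discarded.
        System.out.println(tokens(analyzer, "short extraordinarily long")); // [short, long]
        analyzer.close();
    }
}
```
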
  - getMaxTokenLength
```
public int getMaxTokenLength()
```
    Returns:
    the current maximum allowed token length
    
    See Also:
    setMaxTokenLength(int)
  - createComponents
```
protected ReusableAnalyzerBase.TokenStreamComponents createComponents(String fieldName,
                                                          Reader reader)
```
    Description copied from class: ReusableAnalyzerBase
    
    Creates a new ReusableAnalyzerBase.TokenStreamComponents instance for this analyzer.
    
    Specified by:
    createComponents in class ReusableAnalyzerBase
    
    Parameters:
    fieldName - the name of the field whose content is passed to the ReusableAnalyzerBase.TokenStreamComponents sink as a reader
    reader - the reader passed to the Tokenizer constructor
    
    Returns:
    the ReusableAnalyzerBase.TokenStreamComponents for this analyzer.
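
A sketch of a hypothetical `StandardLikeAnalyzer` that builds the same chain described at the top of this page (StandardTokenizer, then StandardFilter, LowerCaseFilter and StopFilter) in its own `createComponents`, assuming Lucene 3.6.2 on the classpath. It is an illustration, not StandardAnalyzer's actual source: for brevity it does not wire up a configurable maximum token length, so the tokenizer keeps its default limit.

```java
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;

import org.apache.lucene.analysis.LowerCaseFilter;
import org.apache.lucene.analysis.ReusableAnalyzerBase;
import org.apache.lucene.analysis.StopAnalyzer;
import org.apache.lucene.analysis.StopFilter;
import org.apache.lucene.analysis.TokenStream;
import org.apache.lucene.analysis.standard.StandardFilter;
import org.apache.lucene.analysis.standard.StandardTokenizer;
import org.apache.lucene.analysis.tokenattributes.CharTermAttribute;
import org.apache.lucene.util.Version;

/** Hypothetical analyzer that assembles a StandardAnalyzer-style chain by hand. */
public final class StandardLikeAnalyzer extends ReusableAnalyzerBase {

    private final Version matchVersion;

    public StandardLikeAnalyzer(Version matchVersion) {
        this.matchVersion = matchVersion;
    }

    @Override
    protected ReusableAnalyzerBase.TokenStreamComponents createComponents(String fieldName, Reader reader) {
        // Source: the tokenizer reads directly from the field's Reader.
        StandardTokenizer source = new StandardTokenizer(matchVersion, reader);
        // Sink: each filter wraps the stage before it.
        TokenStream sink = new StandardFilter(matchVersion, source);
        sink = new LowerCaseFilter(matchVersion, sink);
        sink = new StopFilter(matchVersion, sink, StopAnalyzer.ENGLISH_STOP_WORDS_SET);
        return new ReusableAnalyzerBase.TokenStreamComponents(source, sink);
    }

    /** Collects the terms this analyzer produces for the given text. */
    static List<String> tokens(String text) {
        StandardLikeAnalyzer analyzer = new StandardLikeAnalyzer(Version.LUCENE_36);
        List<String> terms = new ArrayList<String>();
        try {
            TokenStream ts = analyzer.tokenStream("body", new StringReader(text));
            CharTermAttribute term = ts.addAttribute(CharTermAttribute.class);
            try {
                ts.reset();
                while (ts.incrementToken()) {
                    terms.add(term.toString());
                }
                ts.end();
            } finally {
                ts.close();
            }
        } catch (IOException e) {
            throw new RuntimeException(e);
        } finally {
            analyzer.close();
        }
        return terms;
    }

    public static void main(String[] args) {
        System.out.println(tokens("The Quick Brown Fox")); // [quick, brown, fox]
    }
}
```
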
