org.apache.lucene.analysis.standard.StandardAnalyzer

All Implemented Interfaces: Closeable, AutoCloseable

public final class StandardAnalyzer extends StopwordAnalyzerBase

Filters StandardTokenizer with LowerCaseFilter and StopFilter, using a configurable list of stop words.

Since: 3.1
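
The order of the chain described above matters: terms are lowercased before they are compared against the stop word list. The following is a minimal, dependency-free sketch of that ordering, not Lucene's implementation. The class `AnalysisChainSketch` and the method `analyze` are hypothetical names, and splitting on whitespace is only a stand-in for StandardTokenizer, which applies more elaborate word-break rules.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import java.util.Set;

public class AnalysisChainSketch {

    // Hypothetical helper: tokenize, lowercase, then drop stop words, in that order.
    static List<String> analyze(String text, Set<String> stopWords) {
        List<String> terms = new ArrayList<>();
        // Assumption: whitespace splitting stands in for StandardTokenizer.
        for (String raw : text.trim().split("\\s+")) {
            if (raw.isEmpty()) {
                continue; // blank input produces no terms
            }
            String term = raw.toLowerCase(Locale.ROOT); // LowerCaseFilter step
            if (!stopWords.contains(term)) {            // StopFilter step
                terms.add(term);
            }
        }
        return terms;
    }

    public static void main(String[] args) {
        // "The" is removed because it is lowercased before the stop word check.
        System.out.println(analyze("The Quick fox", Set.of("the"))); // [quick, fox]
    }
}
```

Because lowercasing happens first, a stop word list written in lowercase is enough to match capitalized input.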

Nested Class Summary

Nested classes/interfaces inherited from class org.apache.lucene.analysis.Analyzer
Analyzer.ReuseStrategy, Analyzer.TokenStreamComponents
Field Summary

Fields

| Modifier and Type | Field | Description |
| --- | --- | --- |
| static final int | DEFAULT_MAX_TOKEN_LENGTH | Default maximum allowed token length |

Fields inherited from class org.apache.lucene.analysis.StopwordAnalyzerBase
stopwords

Fields inherited from class org.apache.lucene.analysis.Analyzer
GLOBAL_REUSE_STRATEGY, PER_FIELD_REUSE_STRATEGY
Constructor Summary

Constructors

| Constructor | Description |
| --- | --- |
| StandardAnalyzer() | Builds an analyzer with no stop words. |
| StandardAnalyzer(Reader stopwords) | Builds an analyzer with the stop words from the given reader. |
| StandardAnalyzer(CharArraySet stopWords) | Builds an analyzer with the given stop words. |

Method Summary

| Modifier and Type | Method | Description |
| --- | --- | --- |
| protected Analyzer.TokenStreamComponents | createComponents(String fieldName) | Creates a new Analyzer.TokenStreamComponents instance for this analyzer. |
| int | getMaxTokenLength() | Returns the current maximum token length. |
| protected TokenStream | normalize(String fieldName, TokenStream in) | Wrap the given TokenStream in order to apply normalization filters. |
| void | setMaxTokenLength(int length) | Set the max allowed token length. |

Methods inherited from class org.apache.lucene.analysis.StopwordAnalyzerBase
getStopwordSet, loadStopwordSet, loadStopwordSet

Methods inherited from class org.apache.lucene.analysis.Analyzer
attributeFactory, close, getOffsetGap, getPositionIncrementGap, getReuseStrategy, initReader, initReaderForNormalization, normalize, tokenStream, tokenStream

Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait

Field Details
- DEFAULT_MAX_TOKEN_LENGTH
  
  public static final int DEFAULT_MAX_TOKEN_LENGTH
  
  Default maximum allowed token length
  See Also:
  
  Constant Field Values
Constructor Details
- StandardAnalyzer
  
  public StandardAnalyzer(CharArraySet stopWords)
  
  Builds an analyzer with the given stop words.
  
  Parameters:
  
  stopWords - stop words
- StandardAnalyzer
  
  public StandardAnalyzer()
  
  Builds an analyzer with no stop words.
- StandardAnalyzer
  
  public StandardAnalyzer(Reader stopwords) throws IOException
  
  Builds an analyzer with the stop words from the given reader.
  Parameters:
  
  stopwords - Reader to read stop words from
  
  Throws:
  
  IOException
  
  See Also:
  
  WordlistLoader.getWordSet(Reader)
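
The Reader-based constructor expects a word list with one stop word per line, with surrounding whitespace ignored. The sketch below imitates that format in plain Java so it can be tried without Lucene on the classpath. It is an illustration only: `StopwordFileSketch` and `readWords` are hypothetical names, skipping blank lines is an assumption of this sketch, and the checked IOException is wrapped so the helper is easier to call.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.LinkedHashSet;
import java.util.Set;

public class StopwordFileSketch {

    // Hypothetical helper: reads one word per line, trimming surrounding whitespace.
    static Set<String> readWords(Reader reader) {
        Set<String> words = new LinkedHashSet<>();
        try (BufferedReader br = new BufferedReader(reader)) {
            String line;
            while ((line = br.readLine()) != null) {
                String word = line.trim();
                if (!word.isEmpty()) { // assumption: blank lines are skipped
                    words.add(word);
                }
            }
        } catch (IOException e) {
            // The real constructor declares IOException; this sketch rethrows unchecked.
            throw new UncheckedIOException(e);
        }
        return words;
    }

    public static void main(String[] args) {
        Reader in = new StringReader("the\n  and \n\nof\n");
        System.out.println(readWords(in)); // [the, and, of]
    }
}
```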
Method Details
- setMaxTokenLength
  
  public void setMaxTokenLength(int length)
  
  Sets the maximum allowed token length. Tokens longer than this are split at this length and emitted as multiple tokens. To skip such long tokens instead, increase the maximum length and then use LengthFilter to remove them. The default is DEFAULT_MAX_TOKEN_LENGTH.
- getMaxTokenLength
  
  public int getMaxTokenLength()
  
  Returns the current maximum token length
  See Also:
  
  setMaxTokenLength(int)
- createComponents
  
  protected Analyzer.TokenStreamComponents createComponents(String fieldName)
  
  Description copied from class: Analyzer
  
  Creates a new Analyzer.TokenStreamComponents instance for this analyzer.
  
  Specified by:
  
  createComponents in class Analyzer
  
  Parameters:
  
  fieldName - the name of the field whose content is passed to the Analyzer.TokenStreamComponents sink as a reader
  
  Returns:
  
  the Analyzer.TokenStreamComponents for this analyzer.
- normalize
  
  protected TokenStream normalize(String fieldName, TokenStream in)
  
  Description copied from class: Analyzer
  
  Wrap the given TokenStream in order to apply normalization filters. The default implementation returns the TokenStream as-is. This is used by Analyzer.normalize(String, String).
  
  Overrides:
  
  normalize in class Analyzer
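
The splitting rule documented for setMaxTokenLength can be pictured with a small, dependency-free sketch. It is not Lucene's tokenizer code: `MaxTokenLengthSketch`, `chop`, and `dropLongerThan` are hypothetical names chosen for illustration. `chop` mirrors "split at the maximum length and emit multiple tokens", and `dropLongerThan` mirrors the alternative of raising the limit and then removing long tokens, which is the role LengthFilter plays in a real analysis chain.

```java
import java.util.ArrayList;
import java.util.List;

public class MaxTokenLengthSketch {

    // Hypothetical helper: splits one token into consecutive pieces of at most maxLen chars.
    static List<String> chop(String token, int maxLen) {
        if (maxLen < 1) {
            throw new IllegalArgumentException("maxLen must be >= 1");
        }
        List<String> pieces = new ArrayList<>();
        for (int start = 0; start < token.length(); start += maxLen) {
            int end = Math.min(token.length(), start + maxLen);
            pieces.add(token.substring(start, end));
        }
        return pieces;
    }

    // Hypothetical helper: keeps only tokens whose length does not exceed maxLen.
    static List<String> dropLongerThan(List<String> tokens, int maxLen) {
        List<String> kept = new ArrayList<>();
        for (String token : tokens) {
            if (token.length() <= maxLen) {
                kept.add(token);
            }
        }
        return kept;
    }

    public static void main(String[] args) {
        System.out.println(chop("abcdefgh", 3));                        // [abc, def, gh]
        System.out.println(dropLongerThan(List.of("abcdefgh", "ab"), 3)); // [ab]
    }
}
```

The two helpers show the trade-off: chopping keeps every character but produces fragments that were never words, while dropping discards the long token entirely.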
