org.apache.lucene.analysis.classic.ClassicAnalyzer

All Implemented Interfaces: Closeable, AutoCloseable

public final class ClassicAnalyzer extends StopwordAnalyzerBase

Filters ClassicTokenizer with ClassicFilter, LowerCaseFilter and StopFilter, using a list of English stop words.

ClassicAnalyzer was named StandardAnalyzer in Lucene versions prior to 3.1. As of 3.1, StandardAnalyzer implements Unicode text segmentation, as specified by UAX#29.

Since: 3.1
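
The order of the chain described above (tokenize, lowercase, then drop stop words) can be illustrated with a small library-free sketch. It is only a rough model under stated assumptions: the class name, the sample stop-word set, and the split-on-non-alphanumerics rule are all hypothetical, and it does not reproduce the grammar of ClassicTokenizer or the work done by ClassicFilter.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import java.util.Set;

// Library-free sketch of the tokenize -> lowercase -> stop-filter order.
// It does NOT reproduce ClassicTokenizer's grammar or ClassicFilter.
final class ClassicChainSketch {

    // Hypothetical stand-in for STOP_WORDS_SET; the real set's contents
    // are not listed on this page.
    static final Set<String> SAMPLE_STOP_WORDS = Set.of("a", "and", "the", "of");

    static List<String> analyze(String text, Set<String> stopWords) {
        List<String> out = new ArrayList<>();
        // Assumption: split on runs of characters that are neither letters nor digits.
        for (String raw : text.split("[^\\p{L}\\p{N}]+")) {
            if (raw.isEmpty()) {
                continue; // split() can yield an empty leading element
            }
            // Lowercase first, so the stop-word lookup sees normalized terms.
            String lower = raw.toLowerCase(Locale.ROOT);
            if (!stopWords.contains(lower)) {
                out.add(lower);
            }
        }
        return out;
    }

    public static void main(String[] args) {
        // prints [quick, brown, fox, lazy, dog]
        System.out.println(analyze("The Quick Brown Fox and the Lazy Dog", SAMPLE_STOP_WORDS));
    }
}
```

Because lowercasing happens before the stop-word check in this sketch, a stop list used this way would need to contain lowercase entries.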

Nested Class Summary

Nested classes/interfaces inherited from class org.apache.lucene.analysis.Analyzer
Analyzer.ReuseStrategy, Analyzer.TokenStreamComponents
Field Summary

Fields
- static final int DEFAULT_MAX_TOKEN_LENGTH
  Default maximum allowed token length
- static final CharArraySet STOP_WORDS_SET
  An unmodifiable set containing some common English words that are usually not useful for searching.

Fields inherited from class org.apache.lucene.analysis.StopwordAnalyzerBase
stopwords

Fields inherited from class org.apache.lucene.analysis.Analyzer
GLOBAL_REUSE_STRATEGY, PER_FIELD_REUSE_STRATEGY
Constructor Summary

Constructors
- ClassicAnalyzer()
  Builds an analyzer with the default stop words (STOP_WORDS_SET).
- ClassicAnalyzer(Reader stopwords)
  Builds an analyzer with the stop words from the given reader.
- ClassicAnalyzer(CharArraySet stopWords)
  Builds an analyzer with the given stop words.
Method Summary

- protected Analyzer.TokenStreamComponents createComponents(String fieldName)
- int getMaxTokenLength()
- protected TokenStream normalize(String fieldName, TokenStream in)
- void setMaxTokenLength(int length)
  Set maximum allowed token length.

Methods inherited from class org.apache.lucene.analysis.StopwordAnalyzerBase
getStopwordSet, loadStopwordSet, loadStopwordSet, loadStopwordSet

Methods inherited from class org.apache.lucene.analysis.Analyzer
attributeFactory, close, getOffsetGap, getPositionIncrementGap, getReuseStrategy, initReader, initReaderForNormalization, normalize, tokenStream, tokenStream

Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait

Field Details
- DEFAULT_MAX_TOKEN_LENGTH
  
  public static final int DEFAULT_MAX_TOKEN_LENGTH
  
  Default maximum allowed token length
  See Also:
  
  Constant Field Values
- STOP_WORDS_SET
  
  public static final CharArraySet STOP_WORDS_SET
  
  An unmodifiable set containing some common English words that are usually not useful for searching.
Constructor Details
- ClassicAnalyzer
  
  public ClassicAnalyzer(CharArraySet stopWords)
  
  Builds an analyzer with the given stop words.
  
  Parameters:
  
  stopWords - stop words
- ClassicAnalyzer
  
  public ClassicAnalyzer()
  
  Builds an analyzer with the default stop words (STOP_WORDS_SET).
- ClassicAnalyzer
  
  public ClassicAnalyzer(Reader stopwords) throws IOException
  
  Builds an analyzer with the stop words from the given reader.
  Parameters:
  
  stopwords - Reader to read stop words from
  
  Throws:
  
  IOException
  
  See Also:
  
  WordlistLoader.getWordSet(Reader)
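
For the Reader-based constructor, a plain-JDK sketch of turning a Reader into a set of stop words may help when preparing input. The one-word-per-line format, the trimming, and the skipping of blank lines are assumptions of this sketch, not something this page states; the class and method names are hypothetical, and the real constructor works with a CharArraySet rather than a Set<String>.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.util.LinkedHashSet;
import java.util.Set;

// Library-free sketch: read stop words from a Reader, one word per line.
// The line format is an assumption, not taken from this page.
final class StopWordReaderSketch {

    static Set<String> readStopWords(Reader reader) throws IOException {
        Set<String> words = new LinkedHashSet<>();
        try (BufferedReader br = new BufferedReader(reader)) {
            String line;
            while ((line = br.readLine()) != null) {
                String word = line.trim(); // drop surrounding whitespace
                if (!word.isEmpty()) {     // assumption: blank lines are ignored
                    words.add(word);
                }
            }
        }
        return words;
    }

    public static void main(String[] args) throws IOException {
        // prints [the, and, of]
        System.out.println(readStopWords(new StringReader("the\n  and \n\nof\n")));
    }
}
```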
Method Details
- setMaxTokenLength
  
  public void setMaxTokenLength(int length)
  
  Sets the maximum allowed token length. Any token that exceeds this length is discarded. The setting takes effect only the next time one of the tokenStream methods is called.
- getMaxTokenLength
  
  public int getMaxTokenLength()
  See Also:
  
  setMaxTokenLength(int)
- createComponents
  
  protected Analyzer.TokenStreamComponents createComponents(String fieldName)
  
  Specified by:
  
  createComponents in class Analyzer
- normalize
  
  protected TokenStream normalize(String fieldName, TokenStream in)
  
  Overrides:
  
  normalize in class Analyzer
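
The discard rule documented for setMaxTokenLength can be pictured with a short library-free sketch. The class name, method name, and the limit of 5 are hypothetical and chosen only for illustration; the sketch models just the "longer than the limit is dropped" rule, not how the tokenizer applies it internally.

```java
import java.util.ArrayList;
import java.util.List;

// Library-free sketch of the "discard tokens over the limit" rule.
final class MaxTokenLengthSketch {

    static List<String> dropLongTokens(List<String> tokens, int maxTokenLength) {
        List<String> kept = new ArrayList<>();
        for (String token : tokens) {
            // A token is kept only if it does not exceed the limit.
            if (token.length() <= maxTokenLength) {
                kept.add(token);
            }
        }
        return kept;
    }

    public static void main(String[] args) {
        // prints [index, term]
        System.out.println(dropLongTokens(List.of("index", "supercalifragilistic", "term"), 5));
    }
}
```

Note that over-long tokens are dropped entirely in this model rather than truncated, which matches the wording "discarded" above.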
