org.apache.lucene.analysis.standard
Class ClassicAnalyzer

java.lang.Object
  extended by org.apache.lucene.analysis.Analyzer
      extended by org.apache.lucene.analysis.ReusableAnalyzerBase
          extended by org.apache.lucene.analysis.StopwordAnalyzerBase
              extended by org.apache.lucene.analysis.standard.ClassicAnalyzer
All Implemented Interfaces:
Closeable

public final class ClassicAnalyzer
extends StopwordAnalyzerBase

Filters ClassicTokenizer with ClassicFilter, LowerCaseFilter and StopFilter, using a list of English stop words.

You must specify the required Version compatibility when creating a ClassicAnalyzer by passing a matchVersion argument to its constructor.

ClassicAnalyzer was named StandardAnalyzer in Lucene versions prior to 3.1. As of 3.1, StandardAnalyzer implements Unicode text segmentation, as specified by UAX#29.
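
The chain described above (tokenize, lowercase, remove stop words) can be illustrated with a minimal plain-Java sketch. It does not use any Lucene classes. The class name ClassicChainSketch and the method analyze are hypothetical, the tokenizer stage is replaced by a simple split on non-alphanumeric characters rather than ClassicTokenizer's grammar, and the ClassicFilter stage is omitted entirely.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import java.util.Set;

// Plain-Java model of the filter chain; this is NOT Lucene's implementation.
final class ClassicChainSketch {

    // Stage 1 (stand-in for the tokenizer): split on runs of characters
    //          that are neither letters nor digits.
    // Stage 2 (stand-in for LowerCaseFilter): lowercase each token.
    // Stage 3 (stand-in for StopFilter): drop tokens found in the stop set.
    static List<String> analyze(String text, Set<String> stopWords) {
        List<String> out = new ArrayList<>();
        for (String raw : text.split("[^\\p{L}\\p{N}]+")) {
            if (raw.isEmpty()) {
                continue; // a leading delimiter produces one empty piece
            }
            String lower = raw.toLowerCase(Locale.ROOT);
            if (!stopWords.contains(lower)) {
                out.add(lower);
            }
        }
        return out;
    }

    public static void main(String[] args) {
        // prints [quick, brown, fox]
        System.out.println(analyze("The Quick Brown Fox", Set.of("the", "a", "an")));
    }
}
```

Note that lowercasing happens before the stop-word check, which is why the stop set in this sketch holds lowercase entries.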


Nested Class Summary
 
Nested classes/interfaces inherited from class org.apache.lucene.analysis.ReusableAnalyzerBase
ReusableAnalyzerBase.TokenStreamComponents
 
Field Summary
static int DEFAULT_MAX_TOKEN_LENGTH
          Default maximum allowed token length
static Set<?> STOP_WORDS_SET
          An unmodifiable set containing some common English words that are usually not useful for searching.
 
Fields inherited from class org.apache.lucene.analysis.StopwordAnalyzerBase
matchVersion, stopwords
 
Constructor Summary
ClassicAnalyzer(Version matchVersion)
          Builds an analyzer with the default stop words (STOP_WORDS_SET).
ClassicAnalyzer(Version matchVersion, File stopwords)
          Builds an analyzer with the stop words from the given file.
ClassicAnalyzer(Version matchVersion, Reader stopwords)
          Builds an analyzer with the stop words from the given reader.
ClassicAnalyzer(Version matchVersion, Set<?> stopWords)
          Builds an analyzer with the given stop words.
 
Method Summary
protected  ReusableAnalyzerBase.TokenStreamComponents createComponents(String fieldName, Reader reader)
          Creates a new ReusableAnalyzerBase.TokenStreamComponents instance for this analyzer.
 int getMaxTokenLength()
          Returns the maximum allowed token length.
 void setMaxTokenLength(int length)
          Sets the maximum allowed token length.
 
Methods inherited from class org.apache.lucene.analysis.StopwordAnalyzerBase
getStopwordSet, loadStopwordSet
 
Methods inherited from class org.apache.lucene.analysis.ReusableAnalyzerBase
initReader, reusableTokenStream, tokenStream
 
Methods inherited from class org.apache.lucene.analysis.Analyzer
close, getOffsetGap, getPositionIncrementGap, getPreviousTokenStream, setPreviousTokenStream
 
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
 

Field Detail

DEFAULT_MAX_TOKEN_LENGTH

public static final int DEFAULT_MAX_TOKEN_LENGTH
Default maximum allowed token length

See Also:
Constant Field Values

STOP_WORDS_SET

public static final Set<?> STOP_WORDS_SET
An unmodifiable set containing some common English words that are usually not useful for searching.
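
Because the default set is unmodifiable, extending it means copying it first. The sketch below shows that pattern with java.util collections only. StopSetSketch, DEFAULTS, rejectsAdd and extended are hypothetical names, and DEFAULTS is a three-word stand-in, not the actual contents of STOP_WORDS_SET.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.HashSet;
import java.util.Set;

final class StopSetSketch {

    // Hypothetical stand-in for an unmodifiable default stop-word set.
    static final Set<String> DEFAULTS =
        Collections.unmodifiableSet(new HashSet<>(Arrays.asList("a", "an", "the")));

    // Returns true if the given set refuses modification.
    // Note: on a modifiable set this call does add the probe word.
    static boolean rejectsAdd(Set<String> set) {
        try {
            set.add("extra");
            return false;
        } catch (UnsupportedOperationException e) {
            return true;
        }
    }

    // Copies the base set into a fresh HashSet, then adds the extra words.
    // The base set is left untouched.
    static Set<String> extended(Set<String> base, String... more) {
        Set<String> copy = new HashSet<>(base);
        copy.addAll(Arrays.asList(more));
        return copy;
    }
}
```

A set built this way could then be handed to the ClassicAnalyzer(Version, Set<?>) constructor in place of the defaults.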

Constructor Detail

ClassicAnalyzer

public ClassicAnalyzer(Version matchVersion,
                       Set<?> stopWords)
Builds an analyzer with the given stop words.

Parameters:
matchVersion - Lucene version to match; see the class description above
stopWords - stop words

ClassicAnalyzer

public ClassicAnalyzer(Version matchVersion)
Builds an analyzer with the default stop words (STOP_WORDS_SET).

Parameters:
matchVersion - Lucene version to match; see the class description above

ClassicAnalyzer

public ClassicAnalyzer(Version matchVersion,
                       File stopwords)
                throws IOException
Builds an analyzer with the stop words from the given file.

Parameters:
matchVersion - Lucene version to match; see the class description above
stopwords - File to read stop words from
Throws:
IOException
See Also:
WordlistLoader.getWordSet(File)

ClassicAnalyzer

public ClassicAnalyzer(Version matchVersion,
                       Reader stopwords)
                throws IOException
Builds an analyzer with the stop words from the given reader.

Parameters:
matchVersion - Lucene version to match; see the class description above
stopwords - Reader to read stop words from
Throws:
IOException
See Also:
WordlistLoader.getWordSet(Reader)
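
The File and Reader constructors load stop words from text. As a rough illustration, the sketch below reads one word per line from a Reader and trims surrounding whitespace. The one-word-per-line format is an assumption about the word list, not something stated on this page, and skipping blank lines is a choice of the sketch. WordListSketch, readWords and readWordsFromString are hypothetical names; no Lucene classes are used.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.LinkedHashSet;
import java.util.Set;

final class WordListSketch {

    // Reads one stop word per line (assumed format), trimming whitespace
    // and skipping blank lines. Closes the reader when done.
    static Set<String> readWords(Reader reader) throws IOException {
        Set<String> words = new LinkedHashSet<>();
        try (BufferedReader br = new BufferedReader(reader)) {
            String line;
            while ((line = br.readLine()) != null) {
                String word = line.trim();
                if (!word.isEmpty()) {
                    words.add(word);
                }
            }
        }
        return words;
    }

    // Convenience wrapper for in-memory text; converts the checked
    // IOException into an unchecked one.
    static Set<String> readWordsFromString(String text) {
        try {
            return readWords(new StringReader(text));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        // prints [the, and, of]
        System.out.println(readWordsFromString("the\n  and \n\nof\n"));
    }
}
```

Since the analyzer chain includes LowerCaseFilter, a word list intended for it would normally contain lowercase entries.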

Method Detail

setMaxTokenLength

public void setMaxTokenLength(int length)
Sets the maximum allowed token length. Any token longer than this length is discarded. The setting takes effect only the next time tokenStream or reusableTokenStream is called.
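
The discard rule can be pictured with a small plain-Java sketch: tokens whose length exceeds the limit are dropped, while tokens exactly at the limit are kept. MaxTokenLengthSketch and dropLongTokens are hypothetical names, the limit of 6 is chosen only for illustration, and the code models the documented behavior rather than Lucene's tokenizer internals.

```java
import java.util.ArrayList;
import java.util.List;

final class MaxTokenLengthSketch {

    // Keeps tokens whose length does not exceed maxTokenLength;
    // longer tokens are discarded, not truncated.
    static List<String> dropLongTokens(List<String> tokens, int maxTokenLength) {
        List<String> kept = new ArrayList<>();
        for (String token : tokens) {
            if (token.length() <= maxTokenLength) {
                kept.add(token);
            }
        }
        return kept;
    }

    public static void main(String[] args) {
        // prints [index, search]
        System.out.println(
            dropLongTokens(List.of("index", "supercalifragilistic", "search"), 6));
    }
}
```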


getMaxTokenLength

public int getMaxTokenLength()
Returns the maximum allowed token length.
See Also:
setMaxTokenLength(int)

createComponents

protected ReusableAnalyzerBase.TokenStreamComponents createComponents(String fieldName,
                                                                      Reader reader)
Description copied from class: ReusableAnalyzerBase
Creates a new ReusableAnalyzerBase.TokenStreamComponents instance for this analyzer.

Specified by:
createComponents in class ReusableAnalyzerBase
Parameters:
fieldName - the name of the field whose content is passed to the ReusableAnalyzerBase.TokenStreamComponents sink as a reader
reader - the reader passed to the Tokenizer constructor
Returns:
the ReusableAnalyzerBase.TokenStreamComponents for this analyzer.


Copyright © 2000-2011 Apache Software Foundation. All Rights Reserved.