org.apache.lucene.analysis.standard
Class ClassicAnalyzer
java.lang.Object
org.apache.lucene.analysis.Analyzer
org.apache.lucene.analysis.util.StopwordAnalyzerBase
org.apache.lucene.analysis.standard.ClassicAnalyzer
- All Implemented Interfaces:
- Closeable
public final class ClassicAnalyzer
- extends StopwordAnalyzerBase
Filters ClassicTokenizer with ClassicFilter, LowerCaseFilter and StopFilter, using a list of English stop words.
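The chain described above can be illustrated with a minimal plain-Java sketch. It does not use Lucene. The class and method names are hypothetical. The tokenizer is a simplified stand-in that splits on anything other than a letter or digit, which is much cruder than ClassicTokenizer. ClassicFilter's possessive and acronym handling is omitted. The sketch shows only the order of operations: tokenize, then lowercase, then drop stop words.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import java.util.Set;

public class ClassicChainSketch {
    // Hypothetical stand-in for ClassicTokenizer: splits on any run of characters
    // that are not letters or digits. The real tokenizer also recognizes e-mail
    // addresses, host names, acronyms, and similar patterns.
    public static List<String> tokenize(String text) {
        List<String> tokens = new ArrayList<>();
        for (String piece : text.split("[^\\p{L}\\p{N}]+")) {
            if (!piece.isEmpty()) {
                tokens.add(piece);
            }
        }
        return tokens;
    }

    // Mirrors the documented filter order: lowercase each token first, then
    // discard it if it appears in the stop word set.
    public static List<String> analyze(String text, Set<String> stopWords) {
        List<String> out = new ArrayList<>();
        for (String token : tokenize(text)) {
            String lower = token.toLowerCase(Locale.ROOT);
            if (!stopWords.contains(lower)) {
                out.add(lower);
            }
        }
        return out;
    }

    public static void main(String[] args) {
        // A few common English stop words; the real STOP_WORDS_SET is larger.
        Set<String> stop = Set.of("a", "an", "the", "of", "and", "is");
        System.out.println(analyze("The Quick Fox is a friend of the Dog", stop));
        // prints [quick, fox, friend, dog]
    }
}
```

Lowercasing before the stop filter means a lowercase stop word set also removes "The" and "THE".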
You must specify the required Version compatibility when creating ClassicAnalyzer:
- As of 3.1, StopFilter correctly handles Unicode 4.0 supplementary characters in stopwords.
- As of 2.9, StopFilter preserves position increments.
- As of 2.4, tokens incorrectly identified as acronyms are corrected (see LUCENE-1068).
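"Preserves position increments" means that removing a stop word leaves a gap in token positions rather than closing it up. The following plain-Java sketch shows this, assuming the input terms are already lowercased. It is not Lucene's StopFilter, and the names are hypothetical. Each removed stop word adds 1 to the position increment of the next kept token. Without this behavior, every kept token would simply have an increment of 1.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;

public class PositionIncrementSketch {
    // A kept term together with its position increment relative to the
    // previously emitted term (hypothetical type, not Lucene's attribute API).
    public record Token(String term, int posInc) {}

    // Drops stop words but remembers how many were skipped, so the next kept
    // term's increment records the hole they left behind.
    public static List<Token> removeStopWords(List<String> terms, Set<String> stopWords) {
        List<Token> out = new ArrayList<>();
        int skipped = 0;
        for (String term : terms) {
            if (stopWords.contains(term)) {
                skipped++;
            } else {
                out.add(new Token(term, 1 + skipped));
                skipped = 0;
            }
        }
        return out;
    }

    public static void main(String[] args) {
        List<Token> tokens = removeStopWords(
                List.of("hole", "in", "the", "wall"), Set.of("in", "the"));
        System.out.println(tokens);
        // prints [Token[term=hole, posInc=1], Token[term=wall, posInc=3]]
    }
}
```

Because "wall" keeps an increment of 3, a phrase query such as "hole in the wall" still sees the two stop words' positions between the kept terms.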
ClassicAnalyzer was named StandardAnalyzer in Lucene versions prior to 3.1. As of 3.1, StandardAnalyzer implements Unicode text segmentation, as specified by UAX#29.
Methods inherited from class java.lang.Object:
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
DEFAULT_MAX_TOKEN_LENGTH
public static final int DEFAULT_MAX_TOKEN_LENGTH
- Default maximum allowed token length.
- See Also:
- Constant Field Values
STOP_WORDS_SET
public static final CharArraySet STOP_WORDS_SET
- An unmodifiable set containing some common English words that are usually not
useful for searching.
ClassicAnalyzer
public ClassicAnalyzer(Version matchVersion,
CharArraySet stopWords)
- Builds an analyzer with the given stop words.
- Parameters:
matchVersion
- Lucene version to match; see above
stopWords
- stop words
ClassicAnalyzer
public ClassicAnalyzer(Version matchVersion)
- Builds an analyzer with the default stop words (STOP_WORDS_SET).
- Parameters:
matchVersion
- Lucene version to match; see above
ClassicAnalyzer
public ClassicAnalyzer(Version matchVersion,
Reader stopwords)
throws IOException
- Builds an analyzer with the stop words from the given reader.
- Parameters:
matchVersion
- Lucene version to match; see above
stopwords
- Reader to read stop words from
- Throws:
IOException
- See Also:
WordlistLoader.getWordSet(Reader, Version)
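The Reader-based constructor delegates to WordlistLoader, which builds a word set from the Reader's contents. The following plain-Java sketch illustrates that idea under an assumed format of one stop word per line. The class and method names are hypothetical and this is not Lucene's code. Trimming each line follows common practice, and skipping blank lines is a choice made in this sketch. Consult WordlistLoader for the exact rules.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.util.LinkedHashSet;
import java.util.Set;

public class StopWordReaderSketch {
    // Hypothetical loader: one stop word per line, surrounding whitespace
    // trimmed, blank lines skipped, duplicates collapsed by the Set.
    public static Set<String> readWordSet(Reader reader) throws IOException {
        Set<String> words = new LinkedHashSet<>();
        try (BufferedReader br = new BufferedReader(reader)) {
            String line;
            while ((line = br.readLine()) != null) {
                String word = line.trim();
                if (!word.isEmpty()) {
                    words.add(word);
                }
            }
        }
        return words;
    }

    public static void main(String[] args) throws IOException {
        Set<String> stop = readWordSet(new StringReader("the\n  a \n\nof\nthe\n"));
        System.out.println(stop); // prints [the, a, of]
    }
}
```

As with the real constructor, reading can fail with an IOException, so callers must handle or propagate it.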
setMaxTokenLength
public void setMaxTokenLength(int length)
- Sets the maximum allowed token length. If a token exceeds this length, it is discarded. This setting only takes effect the next time tokenStream(String, Reader) or tokenStream(String, String) is called.
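The discard rule above is sketched below in plain Java. The class is hypothetical and is not part of Lucene. The default of 255 mirrors the value of DEFAULT_MAX_TOKEN_LENGTH in Lucene's classic and standard analyzers; treat it as an assumption here. Over-long tokens are dropped entirely, not truncated.

```java
import java.util.ArrayList;
import java.util.List;

public class MaxTokenLengthSketch {
    // Assumed to match DEFAULT_MAX_TOKEN_LENGTH in ClassicAnalyzer.
    public static final int DEFAULT_MAX_TOKEN_LENGTH = 255;

    private int maxTokenLength = DEFAULT_MAX_TOKEN_LENGTH;

    public void setMaxTokenLength(int length) {
        this.maxTokenLength = length;
    }

    public int getMaxTokenLength() {
        return maxTokenLength;
    }

    // Keeps tokens whose length is at most the limit; longer tokens are
    // discarded outright rather than cut down to the limit.
    public List<String> apply(List<String> tokens) {
        List<String> kept = new ArrayList<>();
        for (String t : tokens) {
            if (t.length() <= maxTokenLength) {
                kept.add(t);
            }
        }
        return kept;
    }

    public static void main(String[] args) {
        MaxTokenLengthSketch sketch = new MaxTokenLengthSketch();
        sketch.setMaxTokenLength(5);
        System.out.println(sketch.apply(List.of("short", "toolongtoken", "ok")));
        // prints [short, ok]
    }
}
```

This sketch applies a new limit immediately. The real analyzer applies it only on the next tokenStream call, as documented above.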
getMaxTokenLength
public int getMaxTokenLength()
- Returns the maximum allowed token length.
- See Also:
setMaxTokenLength(int)
createComponents
protected Analyzer.TokenStreamComponents createComponents(String fieldName,
Reader reader)
- Specified by:
createComponents
in class Analyzer
Copyright © 2000-2013 Apache Software Foundation. All Rights Reserved.