Class ClassicAnalyzer

- java.lang.Object
  - org.apache.lucene.analysis.Analyzer
    - org.apache.lucene.analysis.StopwordAnalyzerBase
      - org.apache.lucene.analysis.classic.ClassicAnalyzer

All Implemented Interfaces:
- Closeable, AutoCloseable
public final class ClassicAnalyzer extends StopwordAnalyzerBase

Filters ClassicTokenizer with ClassicFilter, LowerCaseFilter and StopFilter, using a list of English stop words.

ClassicAnalyzer was named StandardAnalyzer in Lucene versions prior to 3.1. As of 3.1, StandardAnalyzer implements Unicode text segmentation, as specified by UAX#29.

Since:
- 3.1
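A minimal usage sketch (not part of the original Javadoc): it runs a sample sentence through ClassicAnalyzer and prints the surviving tokens. The field name "body" and the sample text are arbitrary placeholders.

    import java.io.IOException;

    import org.apache.lucene.analysis.TokenStream;
    import org.apache.lucene.analysis.classic.ClassicAnalyzer;
    import org.apache.lucene.analysis.tokenattributes.CharTermAttribute;

    public class ClassicAnalyzerDemo {
      public static void main(String[] args) throws IOException {
        // The no-arg constructor uses STOP_WORDS_SET (common English stop words).
        try (ClassicAnalyzer analyzer = new ClassicAnalyzer();
             TokenStream ts = analyzer.tokenStream("body", "The Quick Brown Fox jumped over the lazy dog")) {
          CharTermAttribute term = ts.addAttribute(CharTermAttribute.class);
          ts.reset();
          while (ts.incrementToken()) {
            // Prints lower-cased tokens with stop words removed, e.g. "quick", "brown", "fox", ...
            System.out.println(term.toString());
          }
          ts.end();
        }
      }
    }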
Nested Class Summary

Nested classes/interfaces inherited from class org.apache.lucene.analysis.Analyzer:
- Analyzer.ReuseStrategy, Analyzer.TokenStreamComponents
Field Summary

Fields:
- static int DEFAULT_MAX_TOKEN_LENGTH
  Default maximum allowed token length.
- static CharArraySet STOP_WORDS_SET
  An unmodifiable set containing some common English words that are usually not useful for searching.

Fields inherited from class org.apache.lucene.analysis.StopwordAnalyzerBase:
- stopwords

Fields inherited from class org.apache.lucene.analysis.Analyzer:
- GLOBAL_REUSE_STRATEGY, PER_FIELD_REUSE_STRATEGY
Constructor Summary

Constructors:
- ClassicAnalyzer()
  Builds an analyzer with the default stop words (STOP_WORDS_SET).
- ClassicAnalyzer(Reader stopwords)
  Builds an analyzer with the stop words from the given reader.
- ClassicAnalyzer(CharArraySet stopWords)
  Builds an analyzer with the given stop words.
Method Summary

Methods:
- protected Analyzer.TokenStreamComponents createComponents(String fieldName)
- int getMaxTokenLength()
- protected TokenStream normalize(String fieldName, TokenStream in)
- void setMaxTokenLength(int length)
  Set maximum allowed token length.

Methods inherited from class org.apache.lucene.analysis.StopwordAnalyzerBase:
- getStopwordSet, loadStopwordSet, loadStopwordSet, loadStopwordSet

Methods inherited from class org.apache.lucene.analysis.Analyzer:
- attributeFactory, close, getOffsetGap, getPositionIncrementGap, getReuseStrategy, initReader, initReaderForNormalization, normalize, tokenStream, tokenStream
Field Detail

DEFAULT_MAX_TOKEN_LENGTH
public static final int DEFAULT_MAX_TOKEN_LENGTH
Default maximum allowed token length.
See Also:
- Constant Field Values

STOP_WORDS_SET
public static final CharArraySet STOP_WORDS_SET
An unmodifiable set containing some common English words that are usually not useful for searching.
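As an illustration (not from the original page), the set can be queried directly; CharArraySet.contains accepts any CharSequence:

    // Illustrative checks; STOP_WORDS_SET holds common lower-case English words.
    boolean isStop  = ClassicAnalyzer.STOP_WORDS_SET.contains("the"); // true
    boolean notStop = ClassicAnalyzer.STOP_WORDS_SET.contains("fox"); // false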
Constructor Detail

ClassicAnalyzer
public ClassicAnalyzer(CharArraySet stopWords)
Builds an analyzer with the given stop words.
Parameters:
- stopWords - stop words
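A hedged sketch of supplying a custom stop word set; the chosen words and the case-insensitivity flag are illustrative only.

    import java.util.Arrays;

    import org.apache.lucene.analysis.CharArraySet;
    import org.apache.lucene.analysis.classic.ClassicAnalyzer;

    // Build a small custom stop list; the boolean makes matching case-insensitive.
    CharArraySet stopWords = new CharArraySet(Arrays.asList("the", "a", "lucene"), true);
    ClassicAnalyzer analyzer = new ClassicAnalyzer(stopWords);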
ClassicAnalyzer
public ClassicAnalyzer()
Builds an analyzer with the default stop words (STOP_WORDS_SET).
ClassicAnalyzer
public ClassicAnalyzer(Reader stopwords) throws IOException
Builds an analyzer with the stop words from the given reader.
Parameters:
- stopwords - Reader to read stop words from
Throws:
- IOException
See Also:
- WordlistLoader.getWordSet(Reader)
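A sketch of the Reader-based constructor, assuming (per WordlistLoader.getWordSet) one stop word per line; a StringReader stands in for a Reader over a real word-list file, and the helper name analyzerFromWordList is illustrative.

    import java.io.IOException;
    import java.io.Reader;
    import java.io.StringReader;

    import org.apache.lucene.analysis.classic.ClassicAnalyzer;

    static ClassicAnalyzer analyzerFromWordList() throws IOException {
      // One stop word per line; in practice the Reader would wrap a file or resource.
      Reader stopwords = new StringReader("the\na\nan\nand\n");
      return new ClassicAnalyzer(stopwords);
    }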
Method Detail

setMaxTokenLength
public void setMaxTokenLength(int length)
Set maximum allowed token length. If a token is seen that exceeds this length then it is discarded. This setting only takes effect the next time tokenStream(String, Reader) or tokenStream(String, String) is called.
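A hedged sketch of the discard behaviour described above; the length limit, sample text, and helper name printShortTokens are arbitrary.

    import java.io.IOException;

    import org.apache.lucene.analysis.TokenStream;
    import org.apache.lucene.analysis.classic.ClassicAnalyzer;
    import org.apache.lucene.analysis.tokenattributes.CharTermAttribute;

    static void printShortTokens() throws IOException {
      try (ClassicAnalyzer analyzer = new ClassicAnalyzer()) {
        analyzer.setMaxTokenLength(5); // tokens longer than 5 characters are discarded
        // The limit only affects token streams obtained after this call.
        try (TokenStream ts = analyzer.tokenStream("body", "short extraordinarily words")) {
          CharTermAttribute term = ts.addAttribute(CharTermAttribute.class);
          ts.reset();
          while (ts.incrementToken()) {
            System.out.println(term.toString()); // prints "short" and "words"; "extraordinarily" is dropped
          }
          ts.end();
        }
      }
    }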
getMaxTokenLength
public int getMaxTokenLength()
See Also:
- setMaxTokenLength(int)
createComponents
protected Analyzer.TokenStreamComponents createComponents(String fieldName)
Specified by:
- createComponents in class Analyzer
normalize
protected TokenStream normalize(String fieldName, TokenStream in)
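This protected hook is not usually called by applications; it backs the public Analyzer.normalize(String, String), which normalizes single terms that bypass tokenization (e.g. wildcard query parts). A sketch under the assumption that ClassicAnalyzer's normalization chain only lower-cases the input; the helper name normalizeTerm is illustrative.

    import java.io.IOException;

    import org.apache.lucene.analysis.classic.ClassicAnalyzer;
    import org.apache.lucene.util.BytesRef;

    static void normalizeTerm() throws IOException {
      try (ClassicAnalyzer analyzer = new ClassicAnalyzer()) {
        // Analyzer.normalize(String, String) funnels the text through this protected method.
        BytesRef normalized = analyzer.normalize("body", "QuIcK");
        System.out.println(normalized.utf8ToString()); // expected "quick", assuming lower-casing only
      }
    }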