Class StandardAnalyzer
- java.lang.Object
  - org.apache.lucene.analysis.Analyzer
    - org.apache.lucene.analysis.StopwordAnalyzerBase
      - org.apache.lucene.analysis.standard.StandardAnalyzer
- All Implemented Interfaces:
Closeable, AutoCloseable
public final class StandardAnalyzer extends StopwordAnalyzerBase
Filters StandardTokenizer with LowerCaseFilter and StopFilter, using a configurable list of stop words.
- Since:
- 3.1
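Example (a minimal sketch, not part of the official documentation): the snippet below runs text through a StandardAnalyzer and collects the emitted terms, once with no stop words and once with a small stop set. The class name StandardAnalyzerSketch, the helper terms, and the field name "body" are invented for illustration. The sketch assumes a Lucene release in which CharArraySet lives in org.apache.lucene.analysis.

```java
import java.io.IOException;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

import org.apache.lucene.analysis.Analyzer;
import org.apache.lucene.analysis.CharArraySet;
import org.apache.lucene.analysis.TokenStream;
import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.analysis.tokenattributes.CharTermAttribute;

// Hypothetical helper class, used only for this illustration.
final class StandardAnalyzerSketch {

    /** Runs the text through the analyzer and returns the emitted terms in order. */
    static List<String> terms(Analyzer analyzer, String text) throws IOException {
        List<String> out = new ArrayList<>();
        // "body" is an arbitrary field name chosen for the example.
        try (TokenStream ts = analyzer.tokenStream("body", text)) {
            CharTermAttribute term = ts.addAttribute(CharTermAttribute.class);
            ts.reset();                      // required before the first incrementToken()
            while (ts.incrementToken()) {
                out.add(term.toString());
            }
            ts.end();                        // finalizes end-of-stream state
        }
        return out;
    }

    public static void main(String[] args) throws IOException {
        // No-argument constructor: no stop words are removed.
        try (Analyzer plain = new StandardAnalyzer()) {
            System.out.println(terms(plain, "The Quick Brown Fox"));
        }
        // Explicit stop set; the second argument makes lookups case-insensitive.
        CharArraySet stopWords = new CharArraySet(Arrays.asList("the", "a"), true);
        try (Analyzer filtered = new StandardAnalyzer(stopWords)) {
            System.out.println(terms(filtered, "The Quick Brown Fox"));
        }
    }
}
```

If the filter chain behaves as described above, the first call should keep all four terms in lower case and the second should additionally drop "the".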
-
-
Nested Class Summary
-
Nested classes/interfaces inherited from class org.apache.lucene.analysis.Analyzer
Analyzer.ReuseStrategy, Analyzer.TokenStreamComponents
-
-
Field Summary
Fields
- static int DEFAULT_MAX_TOKEN_LENGTH: Default maximum allowed token length.
-
Fields inherited from class org.apache.lucene.analysis.StopwordAnalyzerBase
stopwords
-
Fields inherited from class org.apache.lucene.analysis.Analyzer
GLOBAL_REUSE_STRATEGY, PER_FIELD_REUSE_STRATEGY
-
-
Constructor Summary
Constructors
- StandardAnalyzer(): Builds an analyzer with no stop words.
- StandardAnalyzer(Reader stopwords): Builds an analyzer with the stop words from the given reader.
- StandardAnalyzer(CharArraySet stopWords): Builds an analyzer with the given stop words.
-
Method Summary
Methods
- protected Analyzer.TokenStreamComponents createComponents(String fieldName): Creates a new Analyzer.TokenStreamComponents instance for this analyzer.
- int getMaxTokenLength(): Returns the current maximum token length.
- protected TokenStream normalize(String fieldName, TokenStream in): Wrap the given TokenStream in order to apply normalization filters.
- void setMaxTokenLength(int length): Set the max allowed token length.
-
Methods inherited from class org.apache.lucene.analysis.StopwordAnalyzerBase
getStopwordSet, loadStopwordSet, loadStopwordSet, loadStopwordSet
-
Methods inherited from class org.apache.lucene.analysis.Analyzer
attributeFactory, close, getOffsetGap, getPositionIncrementGap, getReuseStrategy, initReader, initReaderForNormalization, normalize, tokenStream, tokenStream
-
-
Field Detail
-
DEFAULT_MAX_TOKEN_LENGTH
public static final int DEFAULT_MAX_TOKEN_LENGTH
Default maximum allowed token length.
- See Also:
- Constant Field Values
-
-
Constructor Detail
-
StandardAnalyzer
public StandardAnalyzer(CharArraySet stopWords)
Builds an analyzer with the given stop words.
- Parameters:
stopWords - stop words
-
StandardAnalyzer
public StandardAnalyzer()
Builds an analyzer with no stop words.
-
StandardAnalyzer
public StandardAnalyzer(Reader stopwords) throws IOException
Builds an analyzer with the stop words from the given reader.
- Parameters:
stopwords - Reader to read stop words from
- Throws:
IOException
- See Also:
WordlistLoader.getWordSet(Reader)
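Example (a hedged sketch): the Reader-based constructor can be exercised with an in-memory word list. The class ReaderStopWordsSketch and its helper are invented for illustration. The sketch assumes the reader supplies one stop word per line, which is how WordlistLoader.getWordSet(Reader) is understood to read word lists.

```java
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;

import org.apache.lucene.analysis.TokenStream;
import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.analysis.tokenattributes.CharTermAttribute;

// Hypothetical helper class, used only for this illustration.
final class ReaderStopWordsSketch {

    /**
     * Builds an analyzer from a one-word-per-line stop list (an assumption about
     * the expected reader format) and returns the terms that survive analysis.
     */
    static List<String> termsWithStopWords(String stopWordLines, String text) throws IOException {
        List<String> out = new ArrayList<>();
        Reader stopWordReader = new StringReader(stopWordLines);
        // Resources are closed in reverse order: the token stream first, then the analyzer.
        try (StandardAnalyzer analyzer = new StandardAnalyzer(stopWordReader);
             TokenStream ts = analyzer.tokenStream("body", text)) {
            CharTermAttribute term = ts.addAttribute(CharTermAttribute.class);
            ts.reset();
            while (ts.incrementToken()) {
                out.add(term.toString());
            }
            ts.end();
        }
        return out;
    }

    public static void main(String[] args) throws IOException {
        System.out.println(termsWithStopWords("the\nand\n", "The cat and the hat"));
    }
}
```

Because lower-casing runs before stop-word removal in the chain described for this class, lower-case entries in the list should also match capitalized input such as "The".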
-
-
Method Detail
-
setMaxTokenLength
public void setMaxTokenLength(int length)
Sets the maximum allowed token length. Tokens longer than this are chopped at this length and emitted as multiple tokens. To skip such long tokens instead, increase the maximum length and then use LengthFilter to remove them. The default is DEFAULT_MAX_TOKEN_LENGTH.
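Example (a plain-Java illustration, independent of Lucene): the sketch below contrasts the two behaviors described above, chopping over-long tokens into pieces versus dropping them whole. It is not Lucene's implementation: whitespace splitting stands in for StandardTokenizer's segmentation, and the names MaxTokenLengthSketch, chop, and dropLong are invented for illustration.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical class that only models the documented semantics; it does not call Lucene.
final class MaxTokenLengthSketch {

    /** Splits on whitespace, then chops any token longer than maxLen into maxLen-sized pieces. */
    static List<String> chop(String text, int maxLen) {
        if (maxLen < 1) {
            throw new IllegalArgumentException("maxLen must be >= 1");
        }
        List<String> out = new ArrayList<>();
        for (String token : text.trim().split("\\s+")) {
            if (token.isEmpty()) {
                continue; // empty input yields a single empty string from split()
            }
            for (int start = 0; start < token.length(); start += maxLen) {
                out.add(token.substring(start, Math.min(token.length(), start + maxLen)));
            }
        }
        return out;
    }

    /** Models "raise the limit, then filter by length": over-long tokens are dropped whole. */
    static List<String> dropLong(String text, int maxLen) {
        List<String> out = new ArrayList<>();
        for (String token : text.trim().split("\\s+")) {
            if (!token.isEmpty() && token.length() <= maxLen) {
                out.add(token);
            }
        }
        return out;
    }

    public static void main(String[] args) {
        System.out.println(chop("ab cd toolong xy z", 5));     // [ab, cd, toolo, ng, xy, z]
        System.out.println(dropLong("ab cd toolong xy z", 5)); // [ab, cd, xy, z]
    }
}
```

With the real analyzer, the first behavior corresponds to calling setMaxTokenLength with a small value, and the second to a large maximum combined with a LengthFilter in a custom analysis chain.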
-
getMaxTokenLength
public int getMaxTokenLength()
Returns the current maximum token length.
- See Also:
setMaxTokenLength(int)
-
createComponents
protected Analyzer.TokenStreamComponents createComponents(String fieldName)
Description copied from class: Analyzer
Creates a new Analyzer.TokenStreamComponents instance for this analyzer.
- Specified by:
createComponents in class Analyzer
- Parameters:
fieldName - the name of the field whose content is passed to the Analyzer.TokenStreamComponents sink as a reader
- Returns:
the Analyzer.TokenStreamComponents for this analyzer.
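Example (a sketch under stated assumptions, not the actual source of this class): a custom Analyzer can assemble a comparable chain in its own createComponents. StandardLikeAnalyzer and the terms helper are hypothetical names. The sketch assumes a Lucene release in which LowerCaseFilter, StopFilter, and CharArraySet live in org.apache.lucene.analysis, and it mirrors only the tokenizer and filter order named in the class description.

```java
import java.io.IOException;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

import org.apache.lucene.analysis.Analyzer;
import org.apache.lucene.analysis.CharArraySet;
import org.apache.lucene.analysis.LowerCaseFilter;
import org.apache.lucene.analysis.StopFilter;
import org.apache.lucene.analysis.TokenStream;
import org.apache.lucene.analysis.standard.StandardTokenizer;
import org.apache.lucene.analysis.tokenattributes.CharTermAttribute;

// Hypothetical analyzer that mirrors the chain named in the class description.
final class StandardLikeAnalyzer extends Analyzer {
    private final CharArraySet stopWords;

    StandardLikeAnalyzer(CharArraySet stopWords) {
        this.stopWords = stopWords;
    }

    @Override
    protected TokenStreamComponents createComponents(String fieldName) {
        // The tokenizer is the source; each filter wraps the previous stage.
        StandardTokenizer source = new StandardTokenizer();
        TokenStream result = new LowerCaseFilter(source);
        result = new StopFilter(result, stopWords);
        return new TokenStreamComponents(source, result);
    }

    /** Collects the terms the analyzer emits for the given text. */
    static List<String> terms(Analyzer analyzer, String text) throws IOException {
        List<String> out = new ArrayList<>();
        try (TokenStream ts = analyzer.tokenStream("body", text)) {
            CharTermAttribute term = ts.addAttribute(CharTermAttribute.class);
            ts.reset();
            while (ts.incrementToken()) {
                out.add(term.toString());
            }
            ts.end();
        }
        return out;
    }

    public static void main(String[] args) throws IOException {
        CharArraySet stops = new CharArraySet(Arrays.asList("the"), true);
        try (Analyzer analyzer = new StandardLikeAnalyzer(stops)) {
            System.out.println(terms(analyzer, "The Quick Fox"));
        }
    }
}
```

Callers never invoke createComponents directly; tokenStream calls it and, depending on the reuse strategy, may cache the returned components.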
-
normalize
protected TokenStream normalize(String fieldName, TokenStream in)
Description copied from class: Analyzer
Wrap the given TokenStream in order to apply normalization filters. The default implementation returns the TokenStream as-is. This is used by Analyzer.normalize(String, String).
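Example (a hedged sketch): client code reaches this hook through the public Analyzer.normalize(String, String) method, which returns the normalized text as a BytesRef. The class NormalizeSketch and the field name "title" are invented for illustration. The sketch assumes that StandardAnalyzer's override lower-cases its input, in line with the LowerCaseFilter named in the class description.

```java
import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.util.BytesRef;

// Hypothetical helper class, used only for this illustration.
final class NormalizeSketch {

    /** Normalizes a single term the way the analyzer would for query-time matching. */
    static String normalizeTerm(String fieldName, String text) {
        try (StandardAnalyzer analyzer = new StandardAnalyzer()) {
            // normalize(String, String) applies normalization filters only;
            // it does not tokenize the text or remove stop words.
            BytesRef normalized = analyzer.normalize(fieldName, text);
            return normalized.utf8ToString();
        }
    }

    public static void main(String[] args) {
        System.out.println(normalizeTerm("title", "LuCeNe"));
    }
}
```

This kind of call is typically useful for terms that bypass full analysis, such as wildcard or prefix query fragments.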
-
-