Package org.apache.lucene.analysis.de
Class GermanAnalyzer
java.lang.Object
org.apache.lucene.analysis.Analyzer
org.apache.lucene.analysis.StopwordAnalyzerBase
org.apache.lucene.analysis.de.GermanAnalyzer
- All Implemented Interfaces:
Closeable, AutoCloseable
Analyzer for the German language.
Supports an external list of stopwords (words that will not be indexed at all) and an external list of exclusions (words that will not be stemmed, but will be indexed). A default set of stopwords is used unless an alternative list is specified; the exclusion list is empty by default.
NOTE: This class uses the same Version-dependent settings as StandardAnalyzer.
NOTE: This class does not decompound nouns. Decompounding requires additional data files that are incompatible with the Apache 2.0 License. You can find these data files and example code for decompounding here.
- Since:
- 3.1
- See Also:
-
Nested Class Summary
Nested classes/interfaces inherited from class org.apache.lucene.analysis.Analyzer
Analyzer.ReuseStrategy, Analyzer.TokenStreamComponents
-
Field Summary
Fields inherited from class org.apache.lucene.analysis.StopwordAnalyzerBase
stopwords
Fields inherited from class org.apache.lucene.analysis.Analyzer
GLOBAL_REUSE_STRATEGY, PER_FIELD_REUSE_STRATEGY
-
Constructor Summary
Constructor, Description
GermanAnalyzer()
Builds an analyzer with the default stop words: getDefaultStopSet().
GermanAnalyzer(CharArraySet stopwords)
Builds an analyzer with the given stop words
GermanAnalyzer(CharArraySet stopwords, CharArraySet stemExclusionSet)
Builds an analyzer with the given stop words
-
Method Summary
Modifier and Type, Method, Description
protected Analyzer.TokenStreamComponents createComponents(String fieldName)
Creates Analyzer.TokenStreamComponents used to tokenize all the text in the provided Reader.
static final CharArraySet getDefaultStopSet()
Returns a set of default German stopwords
protected TokenStream normalize(String fieldName, TokenStream in)
Methods inherited from class org.apache.lucene.analysis.StopwordAnalyzerBase
getStopwordSet, loadStopwordSet, loadStopwordSet, loadStopwordSet
Methods inherited from class org.apache.lucene.analysis.Analyzer
attributeFactory, close, getOffsetGap, getPositionIncrementGap, getReuseStrategy, initReader, initReaderForNormalization, normalize, tokenStream, tokenStream
-
Field Details
-
DEFAULT_STOPWORD_FILE
File containing default German stopwords.
- See Also:
-
-
Constructor Details
-
GermanAnalyzer
public GermanAnalyzer()
Builds an analyzer with the default stop words: getDefaultStopSet().
-
GermanAnalyzer
public GermanAnalyzer(CharArraySet stopwords)
Builds an analyzer with the given stop words.
- Parameters:
stopwords - a stopword set
-
GermanAnalyzer
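public GermanAnalyzer(CharArraySet stopwords, CharArraySet stemExclusionSet)
Builds an analyzer with the given stop words and stemming exclusion set.
- Parameters:
stopwords - a stopword set
stemExclusionSet - a stemming exclusion set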
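Usage sketch for the constructors above, assuming the Lucene analysis modules that provide this class are on the classpath. The field name "body", the extra stopword "bzw", the exclusion term "lucene", and the helper class GermanAnalyzerUsage are illustrative assumptions, not part of this API. The default stop set is copied before a term is added, so the shared default set is never mutated.

```java
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;

import org.apache.lucene.analysis.CharArraySet;
import org.apache.lucene.analysis.TokenStream;
import org.apache.lucene.analysis.de.GermanAnalyzer;
import org.apache.lucene.analysis.tokenattributes.CharTermAttribute;

// Hypothetical helper class, for illustration only.
final class GermanAnalyzerUsage {

    // Collects the terms the analyzer emits for one piece of text.
    static List<String> terms(GermanAnalyzer analyzer, String text) throws IOException {
        List<String> out = new ArrayList<>();
        // "body" is an assumed field name; GermanAnalyzer treats all fields alike.
        try (TokenStream ts = analyzer.tokenStream("body", text)) {
            CharTermAttribute term = ts.addAttribute(CharTermAttribute.class);
            ts.reset();
            while (ts.incrementToken()) {
                out.add(term.toString());
            }
            ts.end();
        }
        return out;
    }

    public static void main(String[] args) throws IOException {
        // Start from a copy of the defaults and add one assumed domain-specific stopword.
        CharArraySet stopwords = CharArraySet.copy(GermanAnalyzer.getDefaultStopSet());
        stopwords.add("bzw");

        // Terms in this set are indexed but not stemmed (second argument: ignore case).
        CharArraySet stemExclusions = new CharArraySet(List.of("lucene"), true);

        try (GermanAnalyzer analyzer = new GermanAnalyzer(stopwords, stemExclusions)) {
            System.out.println(terms(analyzer, "Die Suche mit Lucene"));
        }
    }
}
```

Passing only the stopword set, or nothing at all, selects the one-argument or no-argument constructor; in both cases the exclusion set stays empty.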
-
-
Method Details
-
getDefaultStopSet
public static final CharArraySet getDefaultStopSet()
Returns a set of default German stopwords.
- Returns:
- a set of default German stopwords
-
createComponents
protected Analyzer.TokenStreamComponents createComponents(String fieldName)
Creates Analyzer.TokenStreamComponents used to tokenize all the text in the provided Reader.
- Specified by:
createComponents in class Analyzer
- Returns:
Analyzer.TokenStreamComponents built from a StandardTokenizer filtered with LowerCaseFilter, StopFilter, SetKeywordMarkerFilter if a stem exclusion set is provided, GermanNormalizationFilter and GermanLightStemFilter
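The order of the chain matters: lowercasing runs before stop filtering, and keyword marking runs before stemming. The following is a conceptual sketch of that order only, written against the Java standard library. It does not use or reproduce Lucene's filters; the class GermanChainSketch, the regex tokenizer, the character folding in normalize and the suffix list in stem are simplified, hypothetical stand-ins. Applying normalization to keyword terms as well is an assumption of this sketch.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import java.util.Set;

// Conceptual sketch: mimics the ORDER of the filter chain, not Lucene's real filters.
final class GermanChainSketch {

    // Stand-in for the normalization step: folds umlauts and sharp s
    // (written as Unicode escapes to keep the source ASCII-only).
    static String normalize(String term) {
        return term.replace("\u00e4", "a")   // a-umlaut
                   .replace("\u00f6", "o")   // o-umlaut
                   .replace("\u00fc", "u")   // u-umlaut
                   .replace("\u00df", "ss"); // sharp s
    }

    // Stand-in for the stemming step: strips the first matching ending from an
    // assumed suffix list, provided more than three characters would remain.
    static String stem(String term) {
        for (String suffix : new String[] {"ern", "en", "er", "e", "n", "s"}) {
            if (term.length() > suffix.length() + 3 && term.endsWith(suffix)) {
                return term.substring(0, term.length() - suffix.length());
            }
        }
        return term;
    }

    static List<String> analyze(String text, Set<String> stopwords, Set<String> stemExclusions) {
        List<String> out = new ArrayList<>();
        for (String raw : text.split("[^\\p{L}\\p{N}]+")) {  // crude tokenizer stand-in
            if (raw.isEmpty()) {
                continue;
            }
            String term = raw.toLowerCase(Locale.ROOT);        // lowercase step
            if (stopwords.contains(term)) {
                continue;                                      // stop filter step: term is dropped
            }
            boolean keyword = stemExclusions.contains(term);   // keyword marker step
            term = normalize(term);                            // normalization step
            if (!keyword) {
                term = stem(term);                             // stemming skipped for keywords
            }
            out.add(term);
        }
        return out;
    }

    public static void main(String[] args) {
        // "Die Haeuser der Staedte", with umlauts written as escapes.
        String text = "Die H\u00e4user der St\u00e4dte";
        System.out.println(analyze(text, Set.of("die", "der"), Set.of("st\u00e4dte")));
        // prints [haus, stadte]
    }
}
```

Because lowercasing comes first, the capitalized "Die" matches the lowercase stopword "die". The excluded term survives unstemmed, while the same input with an empty exclusion set would have it reduced by the stemming stand-in.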
-
normalize
-