org.apache.lucene.analysis.nl
Class DutchAnalyzer

java.lang.Object
  extended by org.apache.lucene.analysis.Analyzer
      extended by org.apache.lucene.analysis.nl.DutchAnalyzer
All Implemented Interfaces:
Closeable

public final class DutchAnalyzer
extends Analyzer

Analyzer for the Dutch language.

Supports an external list of stopwords (words that will not be indexed at all), an external list of exclusions (words that will not be stemmed but are still indexed), and an external list of word-stem pairs that overrule the stemming algorithm (dictionary stemming). A default set of stopwords is used unless an alternative list is specified, but the exclusion list is empty by default.
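How the three lists interact can be pictured one token at a time. The stdlib-only Java sketch below is a conceptual model, not Lucene's implementation: the class `DutchListsSketch`, the sample word lists, and `toyStem` (a placeholder that merely strips a trailing "en", standing in for the real Dutch stemmer) are all hypothetical. Stopwords are dropped, excluded words are indexed unchanged, dictionary entries overrule the stemmer, and all other words are stemmed.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Conceptual, stdlib-only model of how stopwords, stem exclusions and a
// stem-override dictionary affect each token. Not Lucene code.
class DutchListsSketch {

    // Hypothetical placeholder for a real stemmer: strips a trailing "en".
    static String toyStem(String term) {
        return term.length() > 4 && term.endsWith("en")
                ? term.substring(0, term.length() - 2)
                : term;
    }

    static List<String> analyze(List<String> tokens,
                                Set<String> stopwords,
                                Set<String> exclusions,
                                Map<String, String> overrides) {
        List<String> out = new ArrayList<>();
        for (String raw : tokens) {
            String term = raw.toLowerCase();          // lower-case before any lookup
            if (stopwords.contains(term)) {
                continue;                             // stopwords are not indexed at all
            }
            if (exclusions.contains(term)) {
                out.add(term);                        // excluded: indexed, but not stemmed
            } else if (overrides.containsKey(term)) {
                out.add(overrides.get(term));         // dictionary stem overrules the algorithm
            } else {
                out.add(toyStem(term));               // everything else is stemmed
            }
        }
        return out;
    }

    public static void main(String[] args) {
        Set<String> stop = new HashSet<>(Arrays.asList("de", "en"));
        Set<String> excl = new HashSet<>(Collections.singletonList("lopen"));
        Map<String, String> dict = new HashMap<>();
        dict.put("kinderen", "kind");
        List<String> tokens = Arrays.asList("De", "kinderen", "lopen", "en", "fietsen");
        System.out.println(analyze(tokens, stop, excl, dict));
    }
}
```

Running `main` prints `[kind, lopen, fiets]`: "De" and "en" are removed as stopwords, "kinderen" takes its stem from the dictionary, "lopen" is protected by the exclusion list, and "fietsen" goes through the placeholder stemmer.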

You must specify the required Version compatibility when creating a DutchAnalyzer.

NOTE: This class uses the same Version-dependent settings as StandardAnalyzer.


Nested Class Summary
 
Nested classes/interfaces inherited from class org.apache.lucene.analysis.Analyzer
Analyzer.GlobalReuseStrategy, Analyzer.PerFieldReuseStrategy, Analyzer.ReuseStrategy, Analyzer.TokenStreamComponents
 
Field Summary
static String DEFAULT_STOPWORD_FILE
          File containing default Dutch stopwords.
 
Constructor Summary
DutchAnalyzer(Version matchVersion)
          Builds an analyzer with the default stop words (getDefaultStopSet()) and a few default entries for the stem override dictionary.
DutchAnalyzer(Version matchVersion, CharArraySet stopwords)
           
DutchAnalyzer(Version matchVersion, CharArraySet stopwords, CharArraySet stemExclusionTable)
           
DutchAnalyzer(Version matchVersion, CharArraySet stopwords, CharArraySet stemExclusionTable, CharArrayMap<String> stemOverrideDict)
           
 
Method Summary
protected  Analyzer.TokenStreamComponents createComponents(String fieldName, Reader aReader)
          Creates the Analyzer.TokenStreamComponents (tokenizer plus filter chain) that tokenize all the text in the provided Reader; the components may be reused.
static CharArraySet getDefaultStopSet()
          Returns an unmodifiable instance of the default stop-words set.
 
Methods inherited from class org.apache.lucene.analysis.Analyzer
close, getOffsetGap, getPositionIncrementGap, initReader, tokenStream
 
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
 

Field Detail

DEFAULT_STOPWORD_FILE

public static final String DEFAULT_STOPWORD_FILE
File containing default Dutch stopwords.

See Also:
Constant Field Values
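The file's format is not described here. Assuming it follows the Snowball stopword-list convention (words separated by whitespace, with `|` starting a comment that runs to the end of the line), a list in that format could be loaded with the JDK alone, as in the hypothetical `StopwordFileSketch` below. Code that only needs the defaults should call getDefaultStopSet() instead of reading the file.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.util.Collections;
import java.util.LinkedHashSet;
import java.util.Set;

// Hypothetical parser for a Snowball-style stopword list (assumed format:
// '|' starts a comment, words are separated by whitespace).
class StopwordFileSketch {

    static Set<String> parseSnowballList(Reader in) throws IOException {
        Set<String> words = new LinkedHashSet<>();
        BufferedReader reader = new BufferedReader(in);
        String line;
        while ((line = reader.readLine()) != null) {
            int bar = line.indexOf('|');
            if (bar >= 0) {
                line = line.substring(0, bar);        // drop the comment part
            }
            for (String word : line.trim().split("\\s+")) {
                if (!word.isEmpty()) {                // blank lines split to [""]
                    words.add(word.toLowerCase());
                }
            }
        }
        return Collections.unmodifiableSet(words);
    }

    public static void main(String[] args) throws IOException {
        String sample = " | A Dutch stop word list\n"
                + "de   | the\n"
                + "en   | and\n"
                + "van  | of, from\n";
        System.out.println(parseSnowballList(new StringReader(sample)));
    }
}
```

For the sample text, `main` prints `[de, en, van]`.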
Constructor Detail

DutchAnalyzer

public DutchAnalyzer(Version matchVersion)
Builds an analyzer with the default stop words (getDefaultStopSet()) and a few default entries for the stem override dictionary.


DutchAnalyzer

public DutchAnalyzer(Version matchVersion,
                     CharArraySet stopwords)

DutchAnalyzer

public DutchAnalyzer(Version matchVersion,
                     CharArraySet stopwords,
                     CharArraySet stemExclusionTable)

DutchAnalyzer

public DutchAnalyzer(Version matchVersion,
                     CharArraySet stopwords,
                     CharArraySet stemExclusionTable,
                     CharArrayMap<String> stemOverrideDict)
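The four overloads differ only in how many of the three word lists the caller supplies. One common way to structure such overloads is to let each shorter constructor delegate to the longest one, filling in defaults. The stdlib-only sketch below does this for a hypothetical `DutchConfigSketch` class and omits the Version parameter. Following the class description, omitted stopwords fall back to a default set and omitted exclusions to an empty set; the tiny placeholder stop set and the empty default override dictionary are assumptions for illustration, and the real single-argument constructor also adds a few default entries (see its description above), which this sketch leaves out.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

// Hypothetical telescoping-constructor sketch; not the real DutchAnalyzer.
final class DutchConfigSketch {
    // Placeholder default stop set, invented for illustration.
    static final Set<String> DEFAULT_STOPWORDS =
            Collections.unmodifiableSet(new HashSet<>(Arrays.asList("de", "het", "een")));

    final Set<String> stopwords;
    final Set<String> stemExclusions;
    final Map<String, String> stemOverrides;

    DutchConfigSketch() {
        this(DEFAULT_STOPWORDS);                                  // default stop set
    }

    DutchConfigSketch(Set<String> stopwords) {
        this(stopwords, Collections.<String>emptySet());          // exclusions empty by default
    }

    DutchConfigSketch(Set<String> stopwords, Set<String> stemExclusions) {
        this(stopwords, stemExclusions,
             Collections.<String, String>emptyMap());             // assumed empty default
    }

    DutchConfigSketch(Set<String> stopwords, Set<String> stemExclusions,
                      Map<String, String> stemOverrides) {
        // Defensive read-only copies: later changes by the caller have no effect.
        this.stopwords = Collections.unmodifiableSet(new HashSet<>(stopwords));
        this.stemExclusions = Collections.unmodifiableSet(new HashSet<>(stemExclusions));
        this.stemOverrides = Collections.unmodifiableMap(new HashMap<>(stemOverrides));
    }
}
```

Delegating to a single full constructor keeps validation and copying in one place, so the shorter overloads cannot drift apart.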
Method Detail

getDefaultStopSet

public static CharArraySet getDefaultStopSet()
Returns an unmodifiable instance of the default stop-words set.

Returns:
an unmodifiable instance of the default stop-words set.
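A static accessor that always hands out the same unmodifiable set is often implemented with the initialization-on-demand holder idiom, which builds the set lazily and thread-safely the first time it is requested. The stdlib-only sketch below is hypothetical (`DefaultStopSetSketch`, `SetHolder`, and the three sample words are invented) and only mirrors the documented contract that callers receive an unmodifiable set.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.HashSet;
import java.util.Set;

// Hypothetical sketch of a lazily built, shared, read-only default set.
final class DefaultStopSetSketch {

    private DefaultStopSetSketch() {}

    // The JVM initializes SetHolder (and thus DEFAULT_STOP_SET) only on first
    // access, and class initialization is guaranteed to be thread-safe.
    private static final class SetHolder {
        static final Set<String> DEFAULT_STOP_SET = Collections.unmodifiableSet(
                new HashSet<>(Arrays.asList("de", "en", "van")));
    }

    static Set<String> getDefaultStopSet() {
        return SetHolder.DEFAULT_STOP_SET;
    }
}
```

In the sketch, calling `add` on the returned set throws `UnsupportedOperationException`; a caller that wants to extend the defaults should copy them first, for example with `new HashSet<>(getDefaultStopSet())`.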

createComponents

protected Analyzer.TokenStreamComponents createComponents(String fieldName,
                                                          Reader aReader)
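Creates the Analyzer.TokenStreamComponents (a tokenizer plus its chain of filters) that tokenize all the text in the provided Reader. The Analyzer base class may cache these components and reuse them across calls to tokenStream.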

Specified by:
createComponents in class Analyzer
Returns:
Analyzer.TokenStreamComponents built from a StandardTokenizer filtered with StandardFilter, LowerCaseFilter, StopFilter, KeywordMarkerFilter (only if a stem exclusion set is provided), StemmerOverrideFilter, and SnowballFilter.
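Each stage listed above wraps the one before it, so tokens are pulled lazily through the whole chain, and the keyword-marking stage is only inserted when a stem exclusion set is provided. The stdlib-only sketch below imitates that decorator structure with a hypothetical `TokenSource` interface. It is not Lucene's TokenStream API: the whitespace tokenizer stands in for StandardTokenizer, the StandardFilter and StemmerOverrideFilter stages are omitted, and `toyStem` (strip a trailing "en") is a placeholder for SnowballFilter.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.HashSet;
import java.util.Iterator;
import java.util.List;
import java.util.Set;

// Hypothetical decorator-style pipeline loosely mirroring the documented
// filter order. Not Lucene's API; all names are invented for illustration.
class FilterChainSketch {

    static final class Token {
        String term;
        boolean keyword;                              // true = do not stem
        Token(String term) { this.term = term; }
    }

    interface TokenSource { Token next(); }           // returns null when exhausted

    // Stand-in tokenizer: splits on whitespace; blank input yields no tokens.
    static TokenSource tokenizer(String text) {
        List<String> parts = text.trim().isEmpty()
                ? Collections.<String>emptyList()
                : Arrays.asList(text.trim().split("\\s+"));
        final Iterator<String> words = parts.iterator();
        return new TokenSource() {
            public Token next() { return words.hasNext() ? new Token(words.next()) : null; }
        };
    }

    static TokenSource lowerCase(final TokenSource in) {
        return new TokenSource() {
            public Token next() {
                Token t = in.next();
                if (t != null) t.term = t.term.toLowerCase();
                return t;
            }
        };
    }

    static TokenSource stop(final TokenSource in, final Set<String> stopwords) {
        return new TokenSource() {
            public Token next() {
                Token t;
                do { t = in.next(); } while (t != null && stopwords.contains(t.term));
                return t;                              // stopwords never reach later stages
            }
        };
    }

    static TokenSource keywordMarker(final TokenSource in, final Set<String> exclusions) {
        return new TokenSource() {
            public Token next() {
                Token t = in.next();
                if (t != null && exclusions.contains(t.term)) t.keyword = true;
                return t;
            }
        };
    }

    // Placeholder stemmer: strips a trailing "en" from non-keyword terms.
    static TokenSource toyStem(final TokenSource in) {
        return new TokenSource() {
            public Token next() {
                Token t = in.next();
                if (t != null && !t.keyword && t.term.length() > 4 && t.term.endsWith("en")) {
                    t.term = t.term.substring(0, t.term.length() - 2);
                }
                return t;
            }
        };
    }

    // Each step wraps the previous source; the marker stage is conditional.
    static TokenSource build(String text, Set<String> stopwords, Set<String> exclusions) {
        TokenSource result = lowerCase(tokenizer(text));
        result = stop(result, stopwords);
        if (!exclusions.isEmpty()) {
            result = keywordMarker(result, exclusions);
        }
        return toyStem(result);
    }

    static List<String> terms(TokenSource source) {
        List<String> out = new ArrayList<>();
        for (Token t = source.next(); t != null; t = source.next()) out.add(t.term);
        return out;
    }

    public static void main(String[] args) {
        Set<String> stop = new HashSet<>(Arrays.asList("de", "en"));
        Set<String> excl = Collections.singleton("lopen");
        System.out.println(terms(build("De kinderen lopen en fietsen", stop, excl)));
    }
}
```

`main` prints `[kinder, lopen, fiets]`. Building the same chain with an empty exclusion set leaves out the marker stage, so "lopen" is then stemmed to "lop" as well.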


Copyright © 2000-2013 Apache Software Foundation. All Rights Reserved.