org.apache.lucene.analysis.en.AbstractWordsFileFilterFactory

All Implemented Interfaces: ResourceLoaderAware

Direct Known Subclasses: CommonGramsFilterFactory, KeepWordFilterFactory, StopFilterFactory

public abstract class AbstractWordsFileFilterFactory extends TokenFilterFactory implements ResourceLoaderAware

Abstract parent class for analysis factories that accept a stopwords file as input.

Concrete implementations can use the following input attributes, all of which are optional:

- ignoreCase: whether words are matched case-insensitively; defaults to false.
- words: the name of the stopwords file to parse. If it is not specified, the factory uses the words returned by the concrete subclass's createDefaultWords() implementation.
- format: how the words file is parsed; defaults to wordset. If words is not specified, format must not be specified either.
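The attribute rules above can be modelled in plain Java. This is a hypothetical sketch: WordsFileArgs and parse are illustrative names, not Lucene API. It also checks the words/format rule eagerly while reading the arguments, which may not match the point at which the real factory performs the check.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical model of the optional attributes; not Lucene's implementation.
final class WordsFileArgs {
    final boolean ignoreCase; // defaults to false
    final String words;       // null when no words file is given
    final String format;      // null when no words file is given

    private WordsFileArgs(boolean ignoreCase, String words, String format) {
        this.ignoreCase = ignoreCase;
        this.words = words;
        this.format = format;
    }

    static WordsFileArgs parse(Map<String, String> args) {
        boolean ignoreCase = Boolean.parseBoolean(args.getOrDefault("ignoreCase", "false"));
        String words = args.get("words");
        String format = args.get("format");
        if (words == null) {
            // format is only meaningful together with an explicit words file
            if (format != null) {
                throw new IllegalArgumentException("'format' requires an explicit 'words' file: " + format);
            }
        } else if (format == null) {
            format = "wordset"; // documented default format
        }
        return new WordsFileArgs(ignoreCase, words, format);
    }

    public static void main(String[] a) {
        Map<String, String> args = new HashMap<>();
        args.put("words", "stopwords.txt");
        args.put("ignoreCase", "true");
        WordsFileArgs parsed = parse(args);
        // prints "true stopwords.txt wordset"
        System.out.println(parsed.ignoreCase + " " + parsed.words + " " + parsed.format);
    }
}
```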

The valid values for the format option are:

- wordset: the default format. It supports one word per line (an entry may contain intra-word whitespace) and whole-line comments beginning with the "#" character. Blank lines are ignored. See WordlistLoader.getLines for details.
- snowball: allows multiple words on each line, with trailing comments introduced by the vertical line ("|"). Blank lines are ignored. See WordlistLoader.getSnowballWordSet for details.
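To make the difference between the two formats concrete, here is a standalone re-implementation of the parsing rules as described above. It is an illustrative sketch, not WordlistLoader itself: WordsFormats, parseWordset and parseSnowball are hypothetical names, and details such as trimming leading and trailing whitespace are assumptions of this sketch.

```java
import java.util.LinkedHashSet;
import java.util.Set;

// Hypothetical sketch of the "wordset" and "snowball" formats; not Lucene's WordlistLoader.
final class WordsFormats {
    // wordset: one entry per line; lines starting with '#' are comments; blank lines skipped.
    static Set<String> parseWordset(String text) {
        Set<String> words = new LinkedHashSet<>();
        for (String line : text.split("\\R")) {
            if (line.startsWith("#")) continue; // whole-line comment
            String word = line.trim();          // assumption: outer whitespace trimmed, inner kept
            if (!word.isEmpty()) words.add(word);
        }
        return words;
    }

    // snowball: several whitespace-separated words per line; '|' starts a trailing comment.
    static Set<String> parseSnowball(String text) {
        Set<String> words = new LinkedHashSet<>();
        for (String line : text.split("\\R")) {
            int bar = line.indexOf('|');
            if (bar >= 0) line = line.substring(0, bar); // drop the trailing comment
            for (String w : line.trim().split("\\s+")) {
                if (!w.isEmpty()) words.add(w);          // skips blank lines
            }
        }
        return words;
    }

    public static void main(String[] args) {
        System.out.println(parseWordset("# English stopwords\nthe\n\nof course\n")); // [the, of course]
        System.out.println(parseSnowball("the a | articles\nof  to\n"));           // [the, a, of, to]
    }
}
```

Note that "of course" survives as a single wordset entry, while the snowball format would split the same line into two words.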

Field Summary

Fields

- static final String FORMAT_SNOWBALL
- static final String FORMAT_WORDSET

Fields inherited from class org.apache.lucene.analysis.AbstractAnalysisFactory
LUCENE_MATCH_VERSION_PARAM, luceneMatchVersion

Constructor Summary

Constructors

- protected AbstractWordsFileFilterFactory(): Default ctor for compatibility with SPI
- public AbstractWordsFileFilterFactory(Map<String,String> args): Initialize this factory via a set of key-value pairs.

Method Summary

- protected abstract CharArraySet createDefaultWords(): Default word set implementation, used when no words file is specified.
- String getFormat()
- String getWordFiles()
- CharArraySet getWords()
- void inform(ResourceLoader loader): Initialize the set of words from the file provided via ResourceLoader, or from the defaults if no words file is specified.
- boolean isIgnoreCase()

Methods inherited from class org.apache.lucene.analysis.TokenFilterFactory
availableTokenFilters, create, findSPIName, forName, lookupClass, normalize, reloadTokenFilters

Methods inherited from class org.apache.lucene.analysis.AbstractAnalysisFactory
defaultCtorException, get, get, get, get, get, getBoolean, getChar, getClassArg, getFloat, getInt, getLines, getLuceneMatchVersion, getOriginalArgs, getPattern, getSet, getSnowballWordSet, getWordSet, isExplicitLuceneMatchVersion, require, require, require, requireBoolean, requireChar, requireFloat, requireInt, setExplicitLuceneMatchVersion, splitAt, splitFileNames

Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait

Field Details
- FORMAT_WORDSET
  
  public static final String FORMAT_WORDSET
  See Also:
  
  Constant Field Values
- FORMAT_SNOWBALL
  
  public static final String FORMAT_SNOWBALL
  See Also:
  
  Constant Field Values
Constructor Details
- AbstractWordsFileFilterFactory
  
  protected AbstractWordsFileFilterFactory()
  
  Default ctor for compatibility with SPI
- AbstractWordsFileFilterFactory
  
  public AbstractWordsFileFilterFactory(Map<String,String> args)
  
  Initialize this factory via a set of key-value pairs.
Method Details
- inform
  
  public void inform(ResourceLoader loader) throws IOException
  
  Initialize the set of stopwords from the words file provided via ResourceLoader, or from createDefaultWords() if no words file is specified.
  
  Specified by:
  
  inform in interface ResourceLoaderAware
  
  Throws:
  
  IOException
- createDefaultWords
  
  protected abstract CharArraySet createDefaultWords()
  
  Default word set implementation: returns the words to use when no words file is specified.
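The contract between inform and createDefaultWords is a template method: the base class decides whether to parse the configured file or fall back to the subclass hook. The sketch below illustrates that flow under stated assumptions. WordsFactorySketch and DemoStopFactory are hypothetical names, a Function from file name to file contents stands in for ResourceLoader, a plain Set<String> with lowercasing stands in for a case-insensitive CharArraySet, and the default word list is invented for the demo.

```java
import java.util.Arrays;
import java.util.Locale;
import java.util.Set;
import java.util.function.Function;
import java.util.stream.Collectors;
import java.util.stream.Stream;

// Hypothetical sketch of the inform()/createDefaultWords() flow; not Lucene's code.
abstract class WordsFactorySketch {
    private final String wordFiles;   // the "words" attribute, or null
    private final String format;      // the "format" attribute, or null
    private final boolean ignoreCase; // the "ignoreCase" attribute
    private Set<String> words;        // populated by inform()

    protected WordsFactorySketch(String wordFiles, String format, boolean ignoreCase) {
        this.wordFiles = wordFiles;
        // format defaults to "wordset" only when a words file is given
        this.format = (format == null && wordFiles != null) ? "wordset" : format;
        this.ignoreCase = ignoreCase;
    }

    /** Subclass hook: the words to use when no words file is configured. */
    protected abstract Set<String> createDefaultWords();

    /** Parses the configured words file, or falls back to createDefaultWords(). */
    public void inform(Function<String, String> loader) {
        Stream<String> entries;
        if (wordFiles != null) {
            String text = loader.apply(wordFiles);
            if ("wordset".equalsIgnoreCase(format)) {
                entries = text.lines().filter(l -> !l.startsWith("#")).map(String::trim);
            } else if ("snowball".equalsIgnoreCase(format)) {
                entries = text.lines()
                        .map(l -> l.indexOf('|') >= 0 ? l.substring(0, l.indexOf('|')) : l)
                        .flatMap(l -> Arrays.stream(l.trim().split("\\s+")));
            } else {
                throw new IllegalArgumentException("Unknown format: " + format);
            }
        } else {
            if (format != null) {
                throw new IllegalArgumentException("format requires an explicit words file: " + format);
            }
            entries = createDefaultWords().stream();
        }
        words = entries.filter(w -> !w.isEmpty())
                .map(w -> ignoreCase ? w.toLowerCase(Locale.ROOT) : w)
                .collect(Collectors.toSet());
    }

    /** Membership test; call inform() first. */
    public boolean contains(String word) {
        return words.contains(ignoreCase ? word.toLowerCase(Locale.ROOT) : word);
    }
}

// Hypothetical concrete subclass supplying its own defaults.
final class DemoStopFactory extends WordsFactorySketch {
    DemoStopFactory(String wordFiles, String format, boolean ignoreCase) {
        super(wordFiles, format, ignoreCase);
    }

    @Override
    protected Set<String> createDefaultWords() {
        return Set.of("a", "an", "the"); // invented default list for the demo
    }

    public static void main(String[] args) {
        DemoStopFactory defaults = new DemoStopFactory(null, null, false);
        defaults.inform(name -> { throw new IllegalStateException("no file expected"); });
        System.out.println(defaults.contains("the")); // true

        DemoStopFactory fromFile = new DemoStopFactory("stop.txt", "snowball", true);
        fromFile.inform(name -> "AND or | conjunctions\n");
        System.out.println(fromFile.contains("and")); // true
    }
}
```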
- getWords
  
  public CharArraySet getWords()
- getWordFiles
  
  public String getWordFiles()
- getFormat
  
  public String getFormat()
- isIgnoreCase
  
  public boolean isIgnoreCase()
