Class AbstractWordsFileFilterFactory

All Implemented Interfaces:
ResourceLoaderAware
Direct Known Subclasses:
CommonGramsFilterFactory, KeepWordFilterFactory, StopFilterFactory

public abstract class AbstractWordsFileFilterFactory extends TokenFilterFactory implements ResourceLoaderAware
Abstract parent class for analysis factories that accept a stopwords file as input.

Concrete implementations can leverage the following input attributes. All attributes are optional:

  • ignoreCase defaults to false
  • words is the name of a stopwords file to parse. If it is not specified, the factory uses the value returned by the createDefaultWords() implementation in the concrete subclass.
  • format defines how the words file is parsed, and defaults to wordset. If words is not specified, format must not be specified either.
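
The attribute rules above can be sketched in plain Java. This is an illustration only, not the Lucene source: the class WordsArgsSketch, its fields, and the exception message are hypothetical, and leaving format as null when words is absent is an assumption of this sketch.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical stand-in for the factory's argument handling; not Lucene code. */
final class WordsArgsSketch {
    final boolean ignoreCase; // defaults to false
    final String words;       // null when no words file was given
    final String format;      // null when no words file was given (assumption)

    WordsArgsSketch(Map<String, String> args) {
        Map<String, String> remaining = new HashMap<>(args);
        this.words = remaining.remove("words");
        String requestedFormat = remaining.remove("format");
        String ignoreCaseArg = remaining.remove("ignoreCase");
        this.ignoreCase = Boolean.parseBoolean(ignoreCaseArg); // null parses as false

        // format is only meaningful together with a words file.
        if (words == null && requestedFormat != null) {
            throw new IllegalArgumentException("'format' cannot be specified without 'words'");
        }
        if (words == null) {
            this.format = null;
        } else {
            this.format = (requestedFormat == null) ? "wordset" : requestedFormat;
        }
    }

    public static void main(String[] argv) {
        WordsArgsSketch parsed = new WordsArgsSketch(Map.of("words", "stopwords.txt"));
        System.out.println(parsed.format + " " + parsed.ignoreCase);
    }
}
```

With no attributes at all, the sketch leaves words and format unset, which corresponds to the case where the concrete subclass supplies the word set through createDefaultWords().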

The valid values for the format option are:

  • wordset - This is the default format, which supports one word per line (including any intra-word whitespace) and allows whole line comments beginning with the "#" character. Blank lines are ignored. See WordlistLoader.getLines for details.
  • snowball - This format allows for multiple words specified on each line, and trailing comments may be specified using the vertical line ("|"). Blank lines are ignored. See WordlistLoader.getSnowballWordSet for details.
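
To make the difference between the two formats concrete, here is a minimal sketch of how each could be parsed. It does not call WordlistLoader. The class WordFormatSketch and both method names are hypothetical, and details such as trimming each entry and testing for "#" before trimming are assumptions rather than documented behavior.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical parsers illustrating the two file formats; not Lucene code. */
final class WordFormatSketch {

    /** wordset: one entry per line, whole-line "#" comments, blank lines skipped. */
    static List<String> parseWordset(String content) {
        List<String> entries = new ArrayList<>();
        for (String line : content.split("\\R")) {
            if (line.startsWith("#")) {
                continue; // whole-line comment
            }
            String entry = line.trim(); // intra-word whitespace is preserved
            if (!entry.isEmpty()) {
                entries.add(entry);
            }
        }
        return entries;
    }

    /** snowball: several words per line, "|" starts a trailing comment. */
    static List<String> parseSnowball(String content) {
        List<String> entries = new ArrayList<>();
        for (String line : content.split("\\R")) {
            int bar = line.indexOf('|');
            String data = (bar >= 0 ? line.substring(0, bar) : line).trim();
            if (data.isEmpty()) {
                continue; // blank line or comment-only line
            }
            for (String word : data.split("\\s+")) {
                entries.add(word);
            }
        }
        return entries;
    }

    public static void main(String[] argv) {
        System.out.println(parseWordset("# cities\nnew york\n\nthe\n"));
        System.out.println(parseSnowball("a an | articles\n\nthe\n"));
    }
}
```

Under these assumptions, a line such as "new york" yields a single two-word entry in wordset, while in snowball the same line yields two separate entries.
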
  • Field Details

  • Constructor Details

    • AbstractWordsFileFilterFactory

      protected AbstractWordsFileFilterFactory()
      Default constructor for compatibility with SPI.
    • AbstractWordsFileFilterFactory

      public AbstractWordsFileFilterFactory(Map<String,String> args)
      Initialize this factory via a set of key-value pairs.
  • Method Details

    • inform

      public void inform(ResourceLoader loader) throws IOException
      Initialize the set of stopwords, either loaded via the ResourceLoader or, if no words file was specified, taken from the defaults.
      Specified by:
      inform in interface ResourceLoaderAware
    • createDefaultWords

      protected abstract CharArraySet createDefaultWords()
      Returns the default word set, which is used when no words file is specified.
    • getWords

      public CharArraySet getWords()
    • getWordFiles

      public String getWordFiles()
    • getFormat

      public String getFormat()
    • isIgnoreCase

      public boolean isIgnoreCase()