Class AbstractWordsFileFilterFactory

  • All Implemented Interfaces:
    ResourceLoaderAware
    Direct Known Subclasses:
    CommonGramsFilterFactory, KeepWordFilterFactory, StopFilterFactory

    public abstract class AbstractWordsFileFilterFactory
    extends TokenFilterFactory
    implements ResourceLoaderAware
    Abstract parent class for analysis factories that accept a stopwords file as input.

    Concrete implementations can leverage the following input attributes. All attributes are optional:

    • ignoreCase defaults to false
    • words should be the name of a stopwords file to parse. If it is not specified, the factory uses the value returned by the concrete subclass's createDefaultWords() implementation.
    • format defines how the words file will be parsed, and defaults to wordset. If words is not specified, then format must not be specified.
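
    The attribute rules above can be sketched in plain Java. This is a minimal illustration only, not the factory's actual code: the class name WordsFileArgs and its fields are hypothetical, and the sketch assumes that format stays unset when the subclass defaults are used.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical helper (not part of Lucene) that resolves the three optional attributes. */
final class WordsFileArgs {
    final String words;       // null when no words file was given
    final String format;      // null when the subclass default word set is used
    final boolean ignoreCase; // defaults to false

    WordsFileArgs(Map<String, String> args) {
        words = args.get("words");
        String requested = args.get("format");
        // Rule from the attribute list: format must not be given without words.
        if (words == null && requested != null) {
            throw new IllegalArgumentException(
                "'format' cannot be specified without 'words': " + requested);
        }
        // format defaults to "wordset" only when a words file is present.
        format = (words == null) ? null : (requested == null ? "wordset" : requested);
        ignoreCase = Boolean.parseBoolean(args.getOrDefault("ignoreCase", "false"));
    }

    public static void main(String[] argv) {
        Map<String, String> args = new HashMap<>();
        args.put("words", "stopwords.txt"); // illustrative file name
        WordsFileArgs resolved = new WordsFileArgs(args);
        System.out.println(resolved.format + " " + resolved.ignoreCase); // prints "wordset false"
    }
}
```

    Passing only format (for example format=snowball with no words) makes this sketch throw IllegalArgumentException, mirroring the constraint stated in the last bullet.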

    The valid values for the format option are:

    • wordset - This is the default format, which supports one word per line (including any intra-word whitespace) and allows whole line comments beginning with the "#" character. Blank lines are ignored. See WordlistLoader.getLines for details.
    • snowball - This format allows for multiple words specified on each line, and trailing comments may be specified using the vertical line ("|"). Blank lines are ignored. See WordlistLoader.getSnowballWordSet for details.
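
    To make the difference between the two formats concrete, here is a simplified stand-in written with the JDK only. It is not WordlistLoader itself: the class WordFormatSketch and its methods are hypothetical, and trimming surrounding whitespace in the wordset case is an assumption of this sketch.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Simplified illustration of the two formats; not Lucene's WordlistLoader. */
final class WordFormatSketch {

    /** wordset: one entry per line, whole-line '#' comments, blank lines skipped. */
    static List<String> parseWordset(List<String> lines) {
        List<String> out = new ArrayList<>();
        for (String line : lines) {
            if (line.startsWith("#")) continue; // whole-line comment
            String word = line.trim();          // assumption: outer whitespace is trimmed
            if (!word.isEmpty()) out.add(word); // intra-word whitespace is kept
        }
        return out;
    }

    /** snowball: several words per line, '|' starts a trailing comment. */
    static List<String> parseSnowball(List<String> lines) {
        List<String> out = new ArrayList<>();
        for (String line : lines) {
            int bar = line.indexOf('|');
            if (bar >= 0) line = line.substring(0, bar); // drop the comment part
            for (String word : line.split("\\s+")) {
                if (!word.isEmpty()) out.add(word);
            }
        }
        return out;
    }

    public static void main(String[] args) {
        // prints [new york, the]
        System.out.println(parseWordset(Arrays.asList("# comment", "new york", "", "the")));
        // prints [a, an, the]
        System.out.println(parseSnowball(Arrays.asList("a an the | articles", "", "| only a comment")));
    }
}
```

    Note that "new york" survives as a single entry under wordset, whereas the same line under snowball would yield two separate words.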
    • Constructor Detail

      • AbstractWordsFileFilterFactory

        protected AbstractWordsFileFilterFactory()
        Default constructor, for compatibility with SPI.
      • AbstractWordsFileFilterFactory

        public AbstractWordsFileFilterFactory(Map<String,String> args)
        Initialize this factory via a set of key-value pairs.
    • Method Detail

      • createDefaultWords

        protected abstract CharArraySet createDefaultWords()
        Default word set implementation.
      • getWordFiles

        public String getWordFiles()
      • getFormat

        public String getFormat()
      • isIgnoreCase

        public boolean isIgnoreCase()