public abstract class AbstractWordsFileFilterFactory extends TokenFilterFactory implements ResourceLoaderAware
Concrete implementations can leverage the following input attributes. All attributes are optional:
ignoreCase - defaults to false.
words - the name of a stopwords file to parse. If not specified, the factory uses the value returned by the createDefaultWords() implementation in the concrete subclass.
format - defines how the words file is parsed, and defaults to wordset. If words is not specified, then format must not be specified.
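The attribute rules above can be sketched in plain Java. This is an illustrative stand-in rather than the Lucene implementation: the class `WordsArgsSketch` and its `parse` method are hypothetical names, and using `IllegalArgumentException` to reject a `format` given without `words` is an assumption.

```java
import java.util.Map;

/** Hypothetical holder for the three optional attributes; not a Lucene class. */
final class WordsArgsSketch {
    final boolean ignoreCase;
    final String words;   // null when no words file was given
    final String format;  // null when no words file was given

    private WordsArgsSketch(boolean ignoreCase, String words, String format) {
        this.ignoreCase = ignoreCase;
        this.words = words;
        this.format = format;
    }

    static WordsArgsSketch parse(Map<String, String> args) {
        // ignoreCase is optional and defaults to false.
        boolean ignoreCase = Boolean.parseBoolean(args.getOrDefault("ignoreCase", "false"));
        String words = args.get("words");
        String format = args.get("format");
        if (words == null) {
            // format is only meaningful together with words.
            if (format != null) {
                throw new IllegalArgumentException("'format' cannot be used without 'words'");
            }
            return new WordsArgsSketch(ignoreCase, null, null);
        }
        // When words is given, format falls back to the default "wordset".
        return new WordsArgsSketch(ignoreCase, words, format == null ? "wordset" : format);
    }
}
```

In this sketch, passing only `words` yields the `wordset` format with case-sensitive matching, while passing only `format` is rejected.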
The valid values for the format option are:
wordset - This is the default format, which supports one word per line (including any intra-word whitespace) and allows whole-line comments beginning with the "#" character. Blank lines are ignored. See WordlistLoader.getLines for details.
snowball - This format allows multiple words on each line, and trailing comments may be specified using the vertical line ("|"). Blank lines are ignored. See WordlistLoader.getSnowballWordSet for details.
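To make the difference between the two formats concrete, here is a minimal plain-Java sketch of the parsing rules as described above. It does not call or reproduce WordlistLoader: `WordFormatSketch`, `parseWordset`, and `parseSnowball` are hypothetical names, and trimming surrounding whitespace from each line is an assumption.

```java
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

/** Hypothetical illustration of the two file formats; not Lucene code. */
final class WordFormatSketch {

    /** wordset: one word per line, whole-line "#" comments, blank lines ignored. */
    static Set<String> parseWordset(List<String> lines) {
        Set<String> words = new LinkedHashSet<>();
        for (String line : lines) {
            if (line.startsWith("#")) {
                continue; // whole-line comment
            }
            String word = line.trim(); // assumption: surrounding whitespace is dropped
            if (word.isEmpty()) {
                continue; // blank line
            }
            words.add(word); // intra-word whitespace is preserved
        }
        return words;
    }

    /** snowball: several words per line, trailing comments after "|". */
    static Set<String> parseSnowball(List<String> lines) {
        Set<String> words = new LinkedHashSet<>();
        for (String line : lines) {
            int bar = line.indexOf('|');
            String content = (bar >= 0 ? line.substring(0, bar) : line).trim();
            if (content.isEmpty()) {
                continue; // blank or comment-only line
            }
            for (String word : content.split("\\s+")) {
                words.add(word);
            }
        }
        return words;
    }
}
```

Under these rules a wordset line such as `new york` yields the single entry "new york", whereas the same line in snowball format yields two entries, "new" and "york".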
Modifier and Type | Field and Description |
---|---|
static String | FORMAT_SNOWBALL |
static String | FORMAT_WORDSET |
Fields inherited from class AbstractAnalysisFactory: LUCENE_MATCH_VERSION_PARAM, luceneMatchVersion
Constructor and Description |
---|
AbstractWordsFileFilterFactory(Map<String,String> args) Initialize this factory via a set of key-value pairs. |
Modifier and Type | Method and Description |
---|---|
protected abstract CharArraySet | createDefaultWords() Default word set implementation. |
String | getFormat() |
String | getWordFiles() |
CharArraySet | getWords() |
void | inform(ResourceLoader loader) Initialize the set of stopwords provided via ResourceLoader, or using defaults. |
boolean | isIgnoreCase() |
Methods inherited from class TokenFilterFactory: availableTokenFilters, create, findSPIName, forName, lookupClass, normalize, reloadTokenFilters
Methods inherited from class AbstractAnalysisFactory: get, get, get, get, get, getBoolean, getChar, getClassArg, getFloat, getInt, getLines, getLuceneMatchVersion, getOriginalArgs, getPattern, getSet, getSnowballWordSet, getWordSet, isExplicitLuceneMatchVersion, require, require, require, requireBoolean, requireChar, requireFloat, requireInt, setExplicitLuceneMatchVersion, splitAt, splitFileNames
public static final String FORMAT_WORDSET
public static final String FORMAT_SNOWBALL
public void inform(ResourceLoader loader) throws IOException
Specified by: inform in interface ResourceLoaderAware
Throws: IOException
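The "words file, or defaults" behavior of inform can be illustrated with stand-in types. The sketch below is not the Lucene class: `WordsFactorySketch` and `EnglishArticlesSketch` are hypothetical, a `Function<String, List<String>>` stands in for ResourceLoader, and `Set<String>` stands in for CharArraySet. It only shows the pattern in which a subclass supplies createDefaultWords() and the base class falls back to it when no words attribute was given.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.function.Function;

/** Hypothetical stand-in for the abstract factory; not Lucene code. */
abstract class WordsFactorySketch {
    private final String wordFiles; // value of the "words" attribute, or null
    private Set<String> words;

    WordsFactorySketch(Map<String, String> args) {
        this.wordFiles = args.get("words");
    }

    /** Supplied by each concrete subclass. */
    protected abstract Set<String> createDefaultWords();

    /** The loader function is an assumed replacement for ResourceLoader. */
    void inform(Function<String, List<String>> loader) {
        if (wordFiles != null) {
            words = new HashSet<>(loader.apply(wordFiles)); // explicit words file
        } else {
            words = createDefaultWords(); // fall back to subclass defaults
        }
    }

    Set<String> getWords() {
        return words;
    }
}

/** Hypothetical concrete subclass with a tiny default word set. */
final class EnglishArticlesSketch extends WordsFactorySketch {
    EnglishArticlesSketch(Map<String, String> args) {
        super(args);
    }

    @Override
    protected Set<String> createDefaultWords() {
        return new HashSet<>(Arrays.asList("a", "an", "the"));
    }
}
```

In this sketch getWords() returns null until inform has been called, which is why the word set is read only after the loader has been supplied.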
protected abstract CharArraySet createDefaultWords()
public CharArraySet getWords()
public String getWordFiles()
public String getFormat()
public boolean isIgnoreCase()
Copyright © 2000-2024 Apache Software Foundation. All Rights Reserved.