public class PatternTypingFilterFactory extends TokenFilterFactory implements ResourceLoaderAware

    <fieldType name="text_taf" class="solr.TextField" positionIncrementGap="100">
      <analyzer>
        <tokenizer class="solr.WhitespaceTokenizerFactory"/>
        <filter class="com.example.PatternTypingFilter" patternFile="patterns.txt"/>
        <filter class="solr.TokenAnalyzerFilter" asType="text_en" preserveType="true"/>
        <filter class="solr.TypeAsSynonymFilterFactory" prefix="__TAS__"
                ignore="word,&lt;ALPHANUM&gt;,&lt;NUM&gt;,&lt;SOUTHEAST_ASIAN&gt;,&lt;IDEOGRAPHIC&gt;,&lt;HIRAGANA&gt;,&lt;KATAKANA&gt;,&lt;HANGUL&gt;,&lt;EMOJI&gt;"/>
      </analyzer>
    </fieldType>

Note that a configuration such as above may interfere with multi-word synonyms. The patterns file has the format:

    (flags) (pattern) ::: (replacement)

Therefore, to set the first two flag bits on an original token matching 401k or 401(k), and to add a type of 'legal2_401_k' whenever either one is encountered, one would use:

    3 (\d+)\(?([a-z])\)? ::: legal2_$1_$2

Note that the number indicating the flag bits to set must have no leading spaces, must be followed by a single space, and must be 0 if no flags should be set. The flags number should not contain commas or a decimal point. Lines whose first character is # are ignored as comments. Does not support producing a synonym textually identical to the original term.

Modifier and Type | Field and Description |
---|---|
static String | NAME: SPI name |

Fields inherited from class AbstractAnalysisFactory: LUCENE_MATCH_VERSION_PARAM, luceneMatchVersion
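
The pattern line format described above can be illustrated with a small, self-contained sketch that uses only `java.util.regex`. It is not the filter's actual implementation: the class `PatternLineSketch`, its nested `Rule` type, and the `parse` and `typeFor` helpers are hypothetical names introduced for this example. The sketch also assumes that a rule applies when the pattern is found anywhere in the token, which this page does not specify.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical illustration of the "(flags) (pattern) ::: (replacement)" line format.
// This is a sketch, not the code used by PatternTypingFilterFactory.
class PatternLineSketch {

    /** One parsed rule: flag bits, compiled pattern, and type template. */
    static final class Rule {
        final int flags;
        final Pattern pattern;
        final String typeTemplate;

        Rule(int flags, Pattern pattern, String typeTemplate) {
            this.flags = flags;
            this.pattern = pattern;
            this.typeTemplate = typeTemplate;
        }

        /** Returns the derived type if the pattern is found in the token, else null. */
        String typeFor(String token) {
            Matcher m = pattern.matcher(token);
            // Assumption: a partial match (find) is enough to apply the rule.
            return m.find() ? m.replaceFirst(typeTemplate) : null;
        }
    }

    /** Parses one line of the patterns file; returns null for comment lines. */
    static Rule parse(String line) {
        if (line.startsWith("#")) {
            return null; // lines whose first character is '#' are comments
        }
        int firstSpace = line.indexOf(' ');
        int separator = line.indexOf(" ::: ");
        // The flags number must start the line and be followed by a single space.
        if (firstSpace < 1 || separator <= firstSpace) {
            throw new IllegalArgumentException("Malformed pattern line: " + line);
        }
        int flags = Integer.parseInt(line.substring(0, firstSpace));
        String regex = line.substring(firstSpace + 1, separator);
        String template = line.substring(separator + " ::: ".length());
        return new Rule(flags, Pattern.compile(regex), template);
    }

    public static void main(String[] args) {
        Rule rule = parse("3 (\\d+)\\(?([a-z])\\)? ::: legal2_$1_$2");
        System.out.println(rule.typeFor("401k"));                // legal2_401_k
        System.out.println(rule.typeFor("401(k)"));              // legal2_401_k
        System.out.println(Integer.toBinaryString(rule.flags));  // 11 (first two bits set)
    }
}
```

The flags value 3 is binary 11, which is why it corresponds to setting the first two flag bits, and `$1` and `$2` in the replacement refer to the capture groups of the pattern.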

Constructor and Description |
---|
PatternTypingFilterFactory(Map<String,String> args): Creates a new PatternTypingFilterFactory |

Modifier and Type | Method and Description |
---|---|
TokenStream | create(TokenStream input): Transform the specified input TokenStream |
void | inform(ResourceLoader loader): Initializes this component with the provided ResourceLoader (used for loading classes, files, etc). |

Methods inherited from class TokenFilterFactory: availableTokenFilters, findSPIName, forName, lookupClass, normalize, reloadTokenFilters

Methods inherited from class AbstractAnalysisFactory: get, get, get, get, get, getBoolean, getChar, getClassArg, getFloat, getInt, getLines, getLuceneMatchVersion, getOriginalArgs, getPattern, getSet, getSnowballWordSet, getWordSet, isExplicitLuceneMatchVersion, require, require, require, requireBoolean, requireChar, requireFloat, requireInt, setExplicitLuceneMatchVersion, splitAt, splitFileNames

public static final String NAME

public void inform(ResourceLoader loader) throws IOException
Specified by: inform in interface ResourceLoaderAware
Throws: IOException

public TokenStream create(TokenStream input)
Specified by: create in class TokenFilterFactory

Copyright © 2000-2024 Apache Software Foundation. All Rights Reserved.