Class ProtectedTermFilterFactory
java.lang.Object
  org.apache.lucene.analysis.AbstractAnalysisFactory
    org.apache.lucene.analysis.TokenFilterFactory
      org.apache.lucene.analysis.miscellaneous.ConditionalTokenFilterFactory
        org.apache.lucene.analysis.miscellaneous.ProtectedTermFilterFactory
All Implemented Interfaces:
ResourceLoaderAware
public class ProtectedTermFilterFactory extends ConditionalTokenFilterFactory
Factory for a ProtectedTermFilter.
CustomAnalyzer example:
Analyzer ana = CustomAnalyzer.builder()
    .withTokenizer("standard")
    .when("protectedterm", "ignoreCase", "true", "protected", "protectedTerms.txt")
      .addTokenFilter("truncate", "prefixLength", "4")
      .addTokenFilter("lowercase")
    .endwhen()
    .build();
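To make the effect of the CustomAnalyzer example concrete, the following is a minimal plain-JDK sketch of the behaviour it configures: tokens found in the protected set pass through unchanged, and all other tokens are truncated to four characters and then lowercased. It is a simulation only and does not use or reproduce Lucene's implementation; the class `ProtectedTermSketch` and its helpers `filter` and `caseInsensitiveSet` are hypothetical names introduced for illustration, and the sample terms are made up.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Locale;
import java.util.Set;
import java.util.TreeSet;

// Hypothetical illustration class; not part of Lucene.
class ProtectedTermSketch {

    /** Applies truncate(prefixLength=4) then lowercase, but only to tokens that are not protected. */
    public static List<String> filter(List<String> tokens, Set<String> protectedTerms) {
        List<String> out = new ArrayList<>();
        for (String token : tokens) {
            if (protectedTerms.contains(token)) {
                out.add(token); // protected: the wrapped filters are skipped
            } else {
                String truncated = token.length() > 4 ? token.substring(0, 4) : token;
                out.add(truncated.toLowerCase(Locale.ROOT));
            }
        }
        return out;
    }

    /** Builds a case-insensitive set, standing in for ignoreCase="true". */
    public static Set<String> caseInsensitiveSet(String... terms) {
        Set<String> set = new TreeSet<>(String.CASE_INSENSITIVE_ORDER);
        set.addAll(Arrays.asList(terms));
        return set;
    }

    public static void main(String[] args) {
        Set<String> protectedTerms = caseInsensitiveSet("Lucene");
        // prints [sear, LUCENE, inde]
        System.out.println(filter(Arrays.asList("Searching", "LUCENE", "Index"), protectedTerms));
    }
}
```

Here "LUCENE" matches the protected entry "Lucene" because the set ignores case, so it is emitted untouched while the other two tokens go through both wrapped steps.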
Solr example, in which conditional filters are specified via the wrappedFilters parameter - a comma-separated list of case-insensitive TokenFilter SPI names - and conditional filter args are specified via filterName.argName parameters:
<fieldType name="reverse_lower_with_exceptions" class="solr.TextField" positionIncrementGap="100">
  <analyzer>
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.ProtectedTermFilterFactory" ignoreCase="true" protected="protectedTerms.txt"
            wrappedFilters="truncate,lowercase" truncate.prefixLength="4" />
  </analyzer>
</fieldType>
When using the wrappedFilters parameter, each filter name must be unique, so if you need to specify the same filter more than once, you must add case-insensitive unique '-id' suffixes (note that the '-id' suffix is stripped prior to SPI lookup), e.g.:
<fieldType name="double_synonym_with_exceptions" class="solr.TextField" positionIncrementGap="100">
  <analyzer>
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.ProtectedTermFilterFactory" ignoreCase="true" protected="protectedTerms.txt"
            wrappedFilters="synonymgraph-A,synonymgraph-B"
            synonymgraph-A.synonyms="synonyms-1.txt"
            synonymgraph-B.synonyms="synonyms-2.txt"/>
  </analyzer>
</fieldType>
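The naming rules above (unique, case-insensitive filter names, optional '-id' suffixes, and filterName.argName parameters) can be sketched in plain Java as follows. This is an illustration of the convention only, not Lucene's actual parsing code: the class `WrappedFilterArgsSketch`, its methods `spiName` and `groupArgs`, and the separator values '.' and '-' are assumptions inferred from the examples on this page.

```java
import java.util.LinkedHashMap;
import java.util.Locale;
import java.util.Map;

// Hypothetical illustration class; not part of Lucene.
class WrappedFilterArgsSketch {
    static final char ARG_SEPARATOR = '.'; // assumed from "truncate.prefixLength"
    static final char ID_SEPARATOR = '-';  // assumed from "synonymgraph-A"

    /** Strips an optional "-id" suffix, leaving the name that would be used for SPI lookup. */
    public static String spiName(String filterName) {
        int idx = filterName.indexOf(ID_SEPARATOR);
        return idx < 0 ? filterName : filterName.substring(0, idx);
    }

    /** Groups "filterName.argName" parameters under each name listed in wrappedFilters. */
    public static Map<String, Map<String, String>> groupArgs(String wrappedFilters,
                                                             Map<String, String> params) {
        Map<String, Map<String, String>> byFilter = new LinkedHashMap<>();
        for (String name : wrappedFilters.split(",")) {
            String key = name.trim().toLowerCase(Locale.ROOT); // names compare case-insensitively
            if (byFilter.putIfAbsent(key, new LinkedHashMap<>()) != null) {
                throw new IllegalArgumentException("Duplicate wrapped filter name: " + name.trim());
            }
        }
        for (Map.Entry<String, String> e : params.entrySet()) {
            int sep = e.getKey().indexOf(ARG_SEPARATOR);
            if (sep < 0) continue; // not a per-filter argument
            Map<String, String> args = byFilter.get(e.getKey().substring(0, sep).toLowerCase(Locale.ROOT));
            if (args != null) args.put(e.getKey().substring(sep + 1), e.getValue());
        }
        return byFilter;
    }

    public static void main(String[] args) {
        Map<String, String> params = new LinkedHashMap<>();
        params.put("synonymgraph-A.synonyms", "synonyms-1.txt");
        params.put("synonymgraph-B.synonyms", "synonyms-2.txt");
        Map<String, Map<String, String>> grouped = groupArgs("synonymgraph-A,synonymgraph-B", params);
        for (Map.Entry<String, Map<String, String>> e : grouped.entrySet()) {
            System.out.println(spiName(e.getKey()) + " " + e.getValue());
        }
    }
}
```

Both entries resolve to the same SPI name while keeping separate argument maps, which is why the suffixes are needed when the same filter is wrapped twice.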
See related CustomAnalyzer.Builder.whenTerm(Predicate)
- Since:
- 7.4.0
- SPI Name (case-insensitive: if the name is 'htmlStrip', 'htmlstrip' can be used when looking up the service).
- "protectedTerm"
-
-
Field Summary
Fields
Modifier and Type   Field                      Description
static char         FILTER_ARG_SEPARATOR
static char         FILTER_NAME_ID_SEPARATOR
static String       NAME
static String       PROTECTED_TERMS
-
Fields inherited from class org.apache.lucene.analysis.AbstractAnalysisFactory
LUCENE_MATCH_VERSION_PARAM, luceneMatchVersion
-
-
Constructor Summary
Constructors
Constructor                                           Description
ProtectedTermFilterFactory()                          Default ctor for compatibility with SPI
ProtectedTermFilterFactory(Map<String,String> args)
-
Method Summary
Methods
Modifier and Type                  Method                                                               Description
protected ConditionalTokenFilter   create(TokenStream input, Function<TokenStream,TokenStream> inner)   Modify the incoming TokenStream with a ConditionalTokenFilter
void                               doInform(ResourceLoader loader)                                      Initialises this component with the corresponding ResourceLoader
CharArraySet                       getProtectedTerms()
boolean                            isIgnoreCase()
-
Methods inherited from class org.apache.lucene.analysis.miscellaneous.ConditionalTokenFilterFactory
create, inform, setInnerFilters
-
Methods inherited from class org.apache.lucene.analysis.TokenFilterFactory
availableTokenFilters, findSPIName, forName, lookupClass, normalize, reloadTokenFilters
-
Methods inherited from class org.apache.lucene.analysis.AbstractAnalysisFactory
defaultCtorException, get, get, get, get, get, getBoolean, getChar, getClassArg, getFloat, getInt, getLines, getLuceneMatchVersion, getOriginalArgs, getPattern, getSet, getSnowballWordSet, getWordSet, isExplicitLuceneMatchVersion, require, require, require, requireBoolean, requireChar, requireFloat, requireInt, setExplicitLuceneMatchVersion, splitAt, splitFileNames
-
-
-
-
Field Detail
-
NAME
public static final String NAME
- See Also:
- Constant Field Values
-
PROTECTED_TERMS
public static final String PROTECTED_TERMS
- See Also:
- Constant Field Values
-
FILTER_ARG_SEPARATOR
public static final char FILTER_ARG_SEPARATOR
- See Also:
- Constant Field Values
-
FILTER_NAME_ID_SEPARATOR
public static final char FILTER_NAME_ID_SEPARATOR
- See Also:
- Constant Field Values
-
-
Method Detail
-
isIgnoreCase
public boolean isIgnoreCase()
-
getProtectedTerms
public CharArraySet getProtectedTerms()
-
create
protected ConditionalTokenFilter create(TokenStream input, Function<TokenStream,TokenStream> inner)
Description copied from class: ConditionalTokenFilterFactory
Modify the incoming TokenStream with a ConditionalTokenFilter
- Specified by:
create in class ConditionalTokenFilterFactory
-
doInform
public void doInform(ResourceLoader loader) throws IOException
Description copied from class: ConditionalTokenFilterFactory
Initialises this component with the corresponding ResourceLoader
- Overrides:
doInform in class ConditionalTokenFilterFactory
- Throws:
IOException
-
-