org.apache.lucene.analysis.miscellaneous.ProtectedTermFilterFactory

All Implemented Interfaces: ResourceLoaderAware

public class ProtectedTermFilterFactory extends ConditionalTokenFilterFactory

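Factory for a ProtectedTermFilter, a ConditionalTokenFilter that applies its wrapped filters only to tokens that are not in the protected set.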

CustomAnalyzer example:

 Analyzer ana = CustomAnalyzer.builder()
   .withTokenizer("standard")
   .when("protectedterm", "ignoreCase", "true", "protected", "protectedTerms.txt")
     .addTokenFilter("truncate", "prefixLength", "4")
     .addTokenFilter("lowercase")
   .endwhen()
   .build();
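To make the effect of the chain above concrete, here is a minimal plain-Java sketch of the intended behaviour, assuming the wrapped filters are truncate (prefixLength 4) followed by lowercase. It does not use any Lucene classes; ProtectedTermSketch and analyze are hypothetical names introduced only for illustration, and the protected set is passed in directly rather than loaded from protectedTerms.txt.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import java.util.Set;

// Illustration only: simulates the behaviour on plain strings, not on a Lucene TokenStream.
final class ProtectedTermSketch {

  // Hypothetical helper: protected tokens pass through unchanged; every other
  // token is truncated to prefixLength characters and then lowercased.
  static List<String> analyze(List<String> tokens, Set<String> protectedTerms,
                              boolean ignoreCase, int prefixLength) {
    List<String> out = new ArrayList<>();
    for (String token : tokens) {
      boolean isProtected = ignoreCase
          ? protectedTerms.stream().anyMatch(p -> p.equalsIgnoreCase(token))
          : protectedTerms.contains(token);
      if (isProtected) {
        out.add(token); // wrapped filters are skipped for protected terms
      } else {
        String truncated =
            token.length() > prefixLength ? token.substring(0, prefixLength) : token;
        out.add(truncated.toLowerCase(Locale.ROOT));
      }
    }
    return out;
  }
}
```

With the protected set {"lucene"} and ignoreCase set to true, the tokens "Lucene", "SEARCHING", "Solr" come out as "Lucene", "sear", "solr": only the unprotected tokens are truncated and lowercased.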

Solr example, in which conditional filters are specified via the wrappedFilters parameter (a comma-separated list of case-insensitive TokenFilter SPI names) and conditional filter args are specified via filterName.argName parameters:

 <fieldType name="reverse_lower_with_exceptions" class="solr.TextField" positionIncrementGap="100">
   <analyzer>
     <tokenizer class="solr.WhitespaceTokenizerFactory"/>
     <filter class="solr.ProtectedTermFilterFactory" ignoreCase="true" protected="protectedTerms.txt"
             wrappedFilters="truncate,lowercase" truncate.prefixLength="4" />
   </analyzer>
 </fieldType>

When using the wrappedFilters parameter, each filter name must be unique. To specify the same filter more than once, add a unique, case-insensitive '-id' suffix to each occurrence; the suffix is stripped prior to SPI lookup. For example:

 <fieldType name="double_synonym_with_exceptions" class="solr.TextField" positionIncrementGap="100">
   <analyzer>
     <tokenizer class="solr.WhitespaceTokenizerFactory"/>
     <filter class="solr.ProtectedTermFilterFactory" ignoreCase="true" protected="protectedTerms.txt"
             wrappedFilters="synonymgraph-A,synonymgraph-B"
             synonymgraph-A.synonyms="synonyms-1.txt"
             synonymgraph-B.synonyms="synonyms-2.txt"/>
   </analyzer>
 </fieldType>
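The naming rules above (unique names, '-id' suffixes, filterName.argName parameters) can be sketched in plain Java as follows. This is not the factory's actual implementation: WrappedFiltersSketch, spiName and parse are hypothetical names, and the separator characters '.' and '-' are assumptions read off the filterName.argName and '-id' conventions described above.

```java
import java.util.LinkedHashMap;
import java.util.Locale;
import java.util.Map;

// Illustration only: groups Solr-style attributes by wrapped filter name.
final class WrappedFiltersSketch {
  static final char ARG_SEPARATOR = '.'; // assumed: filterName.argName
  static final char ID_SEPARATOR = '-';  // assumed: filterName-id

  // Strips the '-id' suffix, leaving the name that would be used for SPI lookup.
  static String spiName(String wrappedName) {
    int idx = wrappedName.indexOf(ID_SEPARATOR);
    return idx < 0 ? wrappedName : wrappedName.substring(0, idx);
  }

  // Returns, per lowercased wrapped filter name, the args given as name.arg attributes.
  static Map<String, Map<String, String>> parse(Map<String, String> attrs) {
    Map<String, Map<String, String>> perFilter = new LinkedHashMap<>();
    String wrapped = attrs.get("wrappedFilters");
    if (wrapped == null) {
      return perFilter;
    }
    for (String name : wrapped.split(",")) {
      String key = name.trim().toLowerCase(Locale.ROOT); // names are case-insensitive
      if (perFilter.putIfAbsent(key, new LinkedHashMap<>()) != null) {
        throw new IllegalArgumentException(
            "Duplicate wrapped filter '" + name.trim() + "'; add unique '-id' suffixes");
      }
    }
    for (Map.Entry<String, String> e : attrs.entrySet()) {
      int dot = e.getKey().indexOf(ARG_SEPARATOR);
      if (dot < 0) {
        continue; // not a filterName.argName attribute
      }
      String filter = e.getKey().substring(0, dot).toLowerCase(Locale.ROOT);
      Map<String, String> filterArgs = perFilter.get(filter);
      if (filterArgs != null) {
        filterArgs.put(e.getKey().substring(dot + 1), e.getValue());
      }
    }
    return perFilter;
  }
}
```

Applied to the attributes of the double_synonym_with_exceptions example, parse would yield two entries, synonymgraph-a and synonymgraph-b, each carrying its own synonyms argument, and spiName would map both back to synonymgraph.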

Since: 7.4.0
SPI Name (case-insensitive: if the name is 'htmlStrip', 'htmlstrip' can be used when looking up the service): "protectedTerm"

Field Summary

Fields

Modifier and Type      Field
static final char      FILTER_ARG_SEPARATOR
static final char      FILTER_NAME_ID_SEPARATOR
static final String    NAME
static final String    PROTECTED_TERMS

Fields inherited from class org.apache.lucene.analysis.AbstractAnalysisFactory
LUCENE_MATCH_VERSION_PARAM, luceneMatchVersion

Constructor Summary

Constructors

Constructor                                            Description
ProtectedTermFilterFactory()                           Default ctor for compatibility with SPI
ProtectedTermFilterFactory(Map<String,String> args)

Method Summary

Modifier and Type                   Method                                                                Description
protected ConditionalTokenFilter    create(TokenStream input, Function<TokenStream,TokenStream> inner)    Modify the incoming TokenStream with a ConditionalTokenFilter
void                                doInform(ResourceLoader loader)                                       Initialises this component with the corresponding ResourceLoader
CharArraySet                        getProtectedTerms()
boolean                             isIgnoreCase()

Methods inherited from class org.apache.lucene.analysis.miscellaneous.ConditionalTokenFilterFactory
create, inform, setInnerFilters

Methods inherited from class org.apache.lucene.analysis.TokenFilterFactory
availableTokenFilters, findSPIName, forName, lookupClass, normalize, reloadTokenFilters

Methods inherited from class org.apache.lucene.analysis.AbstractAnalysisFactory
defaultCtorException, get, get, get, get, get, getBoolean, getChar, getClassArg, getFloat, getInt, getLines, getLuceneMatchVersion, getOriginalArgs, getPattern, getSet, getSnowballWordSet, getWordSet, isExplicitLuceneMatchVersion, require, require, require, requireBoolean, requireChar, requireFloat, requireInt, setExplicitLuceneMatchVersion, splitAt, splitFileNames

Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait

Field Details
- NAME
  
  public static final String NAME
  See Also:
  
  Constant Field Values
- PROTECTED_TERMS
  
  public static final String PROTECTED_TERMS
  See Also:
  
  Constant Field Values
- FILTER_ARG_SEPARATOR
  
  public static final char FILTER_ARG_SEPARATOR
  See Also:
  
  Constant Field Values
- FILTER_NAME_ID_SEPARATOR
  
  public static final char FILTER_NAME_ID_SEPARATOR
  See Also:
  
  Constant Field Values
Constructor Details
- ProtectedTermFilterFactory
  
  public ProtectedTermFilterFactory(Map<String,String> args)
- ProtectedTermFilterFactory
  
  public ProtectedTermFilterFactory()
  
  Default ctor for compatibility with SPI
Method Details
- isIgnoreCase
  
  public boolean isIgnoreCase()
- getProtectedTerms
  
  public CharArraySet getProtectedTerms()
- create
  
  protected ConditionalTokenFilter create(TokenStream input, Function<TokenStream,TokenStream> inner)
  
  Description copied from class: ConditionalTokenFilterFactory
  
  Modify the incoming TokenStream with a ConditionalTokenFilter
  
  Specified by:
  
  create in class ConditionalTokenFilterFactory
- doInform
  
  public void doInform(ResourceLoader loader) throws IOException
  
  Description copied from class: ConditionalTokenFilterFactory
  
  Initialises this component with the corresponding ResourceLoader
  
  Overrides:
  
  doInform in class ConditionalTokenFilterFactory
  
  Throws:
  
  IOException
