Class PatternTypingFilterFactory

  • All Implemented Interfaces:
    ResourceLoaderAware

    public class PatternTypingFilterFactory
    extends TokenFilterFactory
    implements ResourceLoaderAware
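    Provides a filter that matches tokens against the regular expressions listed in a patterns file and, for each matching token, sets the token's type from a replacement template and sets the flag bits given for that pattern. By itself this filter is not very useful. Normally it is combined with a filter that reacts to types or flags, as in the following configuration: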
     <fieldType name="text_taf" class="solr.TextField" positionIncrementGap="100">
       <analyzer>
         <tokenizer class="solr.WhitespaceTokenizerFactory"/>
         <filter class="com.example.PatternTypingFilter" patternFile="patterns.txt"/>
         <filter class="solr.TokenAnalyzerFilter" asType="text_en" preserveType="true"/>
         <filter class="solr.TypeAsSynonymFilterFactory" prefix="__TAS__"
                   ignore="word,&lt;ALPHANUM&gt;,&lt;NUM&gt;,&lt;SOUTHEAST_ASIAN&gt;,&lt;IDEOGRAPHIC&gt;,&lt;HIRAGANA&gt;,&lt;KATAKANA&gt;,&lt;HANGUL&gt;,&lt;EMOJI&gt;"/>
       </analyzer>
     </fieldType>

    Note that a configuration such as the one above may interfere with multi-word synonyms. The patterns file has the following format:

     (flags) (pattern) ::: (replacement)
     
    For example, to set the first two flag bits on an original token matching 401k or 401(k), and to add the type 'legal2_401_k' whenever either one is encountered, one would use:
     3 (\d+)\(?([a-z])\)? ::: legal2_$1_$2
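    As a rough illustration of the example line above, the following Java sketch applies that single rule to a few tokens using only java.util.regex. It assumes that a token counts as matching when Matcher.find() succeeds and that the type is built with Matcher.replaceFirst(template); the class and method names (PatternTypingExample, typeFor, isBitSet) are hypothetical, and this is not the filter's actual code. It also shows why the flags value 3 means "the first two bits": 3 is binary 11.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class PatternTypingExample {
    // The example rule: "3 (\d+)\(?([a-z])\)? ::: legal2_$1_$2"
    static final int FLAGS = 3;
    static final Pattern PATTERN = Pattern.compile("(\\d+)\\(?([a-z])\\)?");
    static final String TEMPLATE = "legal2_$1_$2";

    /** Returns the type built from the template, or null if the token does not match. */
    static String typeFor(String token) {
        Matcher m = PATTERN.matcher(token);
        // Assumption: find() decides the match; replaceFirst substitutes groups $1 and $2.
        return m.find() ? m.replaceFirst(TEMPLATE) : null;
    }

    /** True if the 0-based flag bit is set in the given flags value. */
    static boolean isBitSet(int flags, int bit) {
        return (flags & (1 << bit)) != 0;
    }

    public static void main(String[] args) {
        for (String token : new String[] {"401k", "401(k)", "pension"}) {
            String type = typeFor(token);
            if (type == null) {
                System.out.println(token + " -> no match, type and flags unchanged");
            } else {
                System.out.println(token + " -> type=" + type + ", flags=" + FLAGS);
            }
        }
        // 3 == 0b11: bits 0 and 1 are set, bit 2 is not.
        System.out.println(isBitSet(FLAGS, 0) + " " + isBitSet(FLAGS, 1) + " " + isBitSet(FLAGS, 2));
    }
}
```

    Running main reports type legal2_401_k with flags 3 for both 401k and 401(k), no match for pension, and prints "true true false" for the three bit checks.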
     
    Note that the number indicating the flag bits to set must have no leading spaces, must be followed by exactly one space, and must be 0 if no flags should be set. It must be a plain integer, without commas or a decimal point. Lines whose first character is # are ignored as comments. This filter does not support producing a synonym that is textually identical to the original term.
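    The line-format rules above can be checked mechanically. The following Java sketch, using only the JDK, parses patterns-file lines under those rules: it skips lines whose first character is #, requires an unsigned integer flags value at the start of the line followed by exactly one space, and splits the remainder on " ::: " into a regex and a replacement template. The names PatternFileParser and Rule are hypothetical, skipping blank lines is an assumption, and this is not the factory's actual loading code.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Pattern;

public class PatternFileParser {

    /** One parsed line: flag bits to set, compiled regex, and type template. */
    record Rule(int flags, Pattern pattern, String typeTemplate) {}

    private static final String SEPARATOR = " ::: ";

    /** Parses patterns-file lines according to the documented format rules. */
    static List<Rule> parse(List<String> lines) {
        List<Rule> rules = new ArrayList<>();
        for (String line : lines) {
            // '#' in the first column marks a comment; skipping blank lines is an assumption.
            if (line.isEmpty() || line.charAt(0) == '#') {
                continue;
            }
            int firstSpace = line.indexOf(' ');
            // 0 means a leading space; -1 means there is no space at all.
            if (firstSpace <= 0) {
                throw new IllegalArgumentException("leading space or missing flags: " + line);
            }
            String flagsText = line.substring(0, firstSpace);
            // Plain decimal digits only: no commas, decimal point, or sign.
            if (!flagsText.matches("\\d+")) {
                throw new IllegalArgumentException("flags must be a plain integer: " + line);
            }
            // Exactly one space may follow the flags number.
            if (line.startsWith(" ", firstSpace + 1)) {
                throw new IllegalArgumentException("more than one space after flags: " + line);
            }
            String rest = line.substring(firstSpace + 1);
            int sep = rest.indexOf(SEPARATOR);
            if (sep < 0) {
                throw new IllegalArgumentException("missing ' ::: ' separator: " + line);
            }
            rules.add(new Rule(
                    Integer.parseInt(flagsText),
                    Pattern.compile(rest.substring(0, sep)),
                    rest.substring(sep + SEPARATOR.length())));
        }
        return rules;
    }

    public static void main(String[] args) {
        List<Rule> rules = parse(List.of(
                "# 401(k)-style plan names",
                "3 (\\d+)\\(?([a-z])\\)? ::: legal2_$1_$2"));
        for (Rule r : rules) {
            System.out.println(r.flags() + " | " + r.pattern().pattern() + " | " + r.typeTemplate());
        }
    }
}
```

    Running main skips the comment and prints the single rule as "3 | (\d+)\(?([a-z])\)? | legal2_$1_$2". Lines such as " 3 a ::: b", "3  a ::: b", or "1,0 a ::: b" are rejected with an IllegalArgumentException; failing fast here is a design choice of the sketch.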
    Since:
    8.8
    SPI Name (case-insensitive: if the name is 'htmlStrip', 'htmlstrip' can be used when looking up the service).
    "patternTyping"