Class KoreanTokenizerFactory

  • All Implemented Interfaces:
    ResourceLoaderAware

    public class KoreanTokenizerFactory
    extends TokenizerFactory
    implements ResourceLoaderAware
    Factory for KoreanTokenizer.
     <fieldType name="text_ko" class="solr.TextField">
       <analyzer>
         <tokenizer class="solr.KoreanTokenizerFactory"
                    decompoundMode="discard"
                    userDictionary="user.txt"
                    userDictionaryEncoding="UTF-8"
                    outputUnknownUnigrams="false"
         />
       </analyzer>
     </fieldType>
     

    Supports the following attributes:

    • userDictionary: User dictionary path.
    • userDictionaryEncoding: User dictionary encoding.
    • decompoundMode: Decompound mode. One of 'none', 'discard', or 'mixed'. Default is 'discard'. See KoreanTokenizer.DecompoundMode.
    • outputUnknownUnigrams: If true, outputs unigrams for unknown words.
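
    The attributes above can also be assembled programmatically as a string map. The following is a minimal sketch using only the JDK: KoreanTokenizerArgs and its build method are hypothetical helpers for illustration and are not part of Lucene or Solr. It assumes the factory is constructed from a Map<String, String> keyed by the attribute names listed above; the lowercase normalization and the choice to omit the encoding when no dictionary is given are likewise assumptions of this sketch.

```java
import java.util.LinkedHashMap;
import java.util.Locale;
import java.util.Map;
import java.util.Set;

/** Hypothetical helper (not part of Lucene): collects the attributes listed above into a map. */
final class KoreanTokenizerArgs {

  // The three decompound modes named in the attribute list.
  private static final Set<String> DECOMPOUND_MODES = Set.of("none", "discard", "mixed");

  /**
   * Builds an argument map keyed by the documented attribute names.
   * A null decompoundMode falls back to the documented default, "discard".
   */
  static Map<String, String> build(String decompoundMode,
                                   String userDictionary,
                                   String userDictionaryEncoding,
                                   boolean outputUnknownUnigrams) {
    // Assumption of this sketch: normalize to lowercase before validating.
    String mode = decompoundMode == null
        ? "discard"
        : decompoundMode.toLowerCase(Locale.ROOT);
    if (!DECOMPOUND_MODES.contains(mode)) {
      throw new IllegalArgumentException(
          "decompoundMode must be one of " + DECOMPOUND_MODES + ", got: " + decompoundMode);
    }

    // A mutable map with stable iteration order.
    Map<String, String> args = new LinkedHashMap<>();
    args.put("decompoundMode", mode);
    if (userDictionary != null) {
      args.put("userDictionary", userDictionary);
      // The encoding only matters when a dictionary path is supplied.
      if (userDictionaryEncoding != null) {
        args.put("userDictionaryEncoding", userDictionaryEncoding);
      }
    }
    args.put("outputUnknownUnigrams", Boolean.toString(outputUnknownUnigrams));
    return args;
  }

  public static void main(String[] unused) {
    // Mirrors the XML example: discard mode, user.txt read as UTF-8, no unknown unigrams.
    Map<String, String> args = build("discard", "user.txt", "UTF-8", false);
    System.out.println(args);
    // With the Lucene Korean analysis module on the classpath, a map like this
    // is what one would pass on, e.g. new KoreanTokenizerFactory(args).
    // That step is left out so the sketch runs without Lucene.
  }
}
```

    Validating the mode before handing the map over surfaces a misspelled value early, with a message that names the accepted values.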
    Since:
    7.4.0
    WARNING: This API is experimental and might change in incompatible ways in the next release.