Class KoreanTokenizerFactory

All Implemented Interfaces:
ResourceLoaderAware

public class KoreanTokenizerFactory extends TokenizerFactory implements ResourceLoaderAware
Factory for KoreanTokenizer.
 <fieldType name="text_ko" class="solr.TextField">
   <analyzer>
     <tokenizer class="solr.KoreanTokenizerFactory"
                decompoundMode="discard"
                userDictionary="user.txt"
                userDictionaryEncoding="UTF-8"
                outputUnknownUnigrams="false"
                discardPunctuation="true"
     />
   </analyzer>
 </fieldType>
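Each attribute on the <tokenizer> element, other than class, is handed to the factory as a string key/value pair. The Java below is a minimal sketch of how such a map could be turned into typed settings. It is not Lucene's code. Only the decompoundMode default ('discard') comes from this page. The remaining defaults are assumptions: UTF-8 for the dictionary encoding, outputUnknownUnigrams=false, and discardPunctuation=true. Rejecting unrecognized keys is also assumed. KoreanTokenizerSettings and parse are hypothetical names.

```java
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;

/**
 * Hypothetical sketch: converts the string attributes of a Solr
 * <tokenizer class="solr.KoreanTokenizerFactory" .../> element into typed settings.
 * Not Lucene's implementation; every default except decompoundMode is an assumption.
 */
final class KoreanTokenizerSettings {
  enum DecompoundMode { NONE, DISCARD, MIXED }

  final String userDictionary;          // null when no user dictionary is configured
  final String userDictionaryEncoding;  // assumed default: UTF-8
  final DecompoundMode decompoundMode;  // documented default: discard
  final boolean outputUnknownUnigrams;  // assumed default: false
  final boolean discardPunctuation;     // assumed default: true

  private KoreanTokenizerSettings(String dict, String enc, DecompoundMode mode,
                                  boolean unigrams, boolean discardPunct) {
    this.userDictionary = dict;
    this.userDictionaryEncoding = enc;
    this.decompoundMode = mode;
    this.outputUnknownUnigrams = unigrams;
    this.discardPunctuation = discardPunct;
  }

  static KoreanTokenizerSettings parse(Map<String, String> attributes) {
    // Work on a copy so each recognized key can be removed; anything left over is unknown.
    Map<String, String> args = new HashMap<>(attributes);
    String dict = args.remove("userDictionary");
    String enc = args.remove("userDictionaryEncoding");
    if (enc == null) {
      enc = "UTF-8";
    }
    String modeName = args.remove("decompoundMode");
    // valueOf throws IllegalArgumentException for anything other than none/discard/mixed.
    DecompoundMode mode = modeName == null
        ? DecompoundMode.DISCARD
        : DecompoundMode.valueOf(modeName.toUpperCase(Locale.ROOT));
    boolean unigrams = parseBoolean(args.remove("outputUnknownUnigrams"), false);
    boolean discardPunct = parseBoolean(args.remove("discardPunctuation"), true);
    if (!args.isEmpty()) {
      throw new IllegalArgumentException("Unknown parameters: " + args);
    }
    return new KoreanTokenizerSettings(dict, enc, mode, unigrams, discardPunct);
  }

  private static boolean parseBoolean(String value, boolean defaultValue) {
    return value == null ? defaultValue : Boolean.parseBoolean(value);
  }

  public static void main(String[] args) {
    // Same attributes as the XML example above.
    Map<String, String> xml = new HashMap<>();
    xml.put("decompoundMode", "discard");
    xml.put("userDictionary", "user.txt");
    xml.put("userDictionaryEncoding", "UTF-8");
    xml.put("outputUnknownUnigrams", "false");
    xml.put("discardPunctuation", "true");
    KoreanTokenizerSettings s = parse(xml);
    System.out.println(s.decompoundMode + " " + s.userDictionary + " " + s.discardPunctuation);
  }
}
```

The sketch parses every key explicitly and fails on any leftovers. As a result, a misspelled attribute such as decompoundmode produces an error instead of being silently ignored.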
 

Supports the following attributes:

  • userDictionary: Path to the user dictionary resource.
  • userDictionaryEncoding: Character encoding of the user dictionary.
  • decompoundMode: Decompound mode. One of 'none', 'discard', or 'mixed'. Default is 'discard'. See KoreanTokenizer.DecompoundMode.
  • outputUnknownUnigrams: If true, outputs unigrams for unknown words.
  • discardPunctuation: If true, punctuation tokens are dropped from the output.
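The decompoundMode values differ only in which forms of a compound word are kept. The illustrative Java below is not KoreanTokenizer itself. It takes a compound and its parts as given, whereas the real tokenizer derives the split from its dictionary. The example split of 가곡역 into 가곡 + 역 is assumed. DecompoundDemo and emit are hypothetical names.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Hypothetical illustration of the three decompound modes; not Lucene's tokenizer. */
final class DecompoundDemo {
  enum DecompoundMode { NONE, DISCARD, MIXED }

  /** Returns the surface forms emitted for one compound, given its (assumed) parts. */
  static List<String> emit(String compound, List<String> parts, DecompoundMode mode) {
    List<String> out = new ArrayList<>();
    switch (mode) {
      case NONE:    // keep the compound as a single token
        out.add(compound);
        break;
      case DISCARD: // keep only the parts; the original compound is dropped
        out.addAll(parts);
        break;
      case MIXED:   // keep the original compound as well as its parts
        out.add(compound);
        out.addAll(parts);
        break;
    }
    return out;
  }

  public static void main(String[] args) {
    List<String> parts = Arrays.asList("가곡", "역"); // assumed segmentation
    for (DecompoundMode mode : DecompoundMode.values()) {
      System.out.println(mode + " -> " + emit("가곡역", parts, mode));
    }
  }
}
```

'mixed' trades a larger index for the ability to match both the whole compound and its components.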
Since:
7.4.0
WARNING: This API is experimental and might change in incompatible ways in the next release.
SPI Name (case-insensitive: if the name is 'htmlStrip', 'htmlstrip' can be used when looking up the service):
"korean"
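This name is used to look up the factory instead of its class name, for example through TokenizerFactory.forName("korean", args). The Java below is a hypothetical sketch of the case-insensitive matching described above and is independent of Lucene's actual SPI loader. It models a registry that lower-cases names with Locale.ROOT on both registration and lookup. CaseInsensitiveRegistry is an illustrative name. The strings stored in it are placeholders for real factory instances, and one registry holds both entries only for brevity.

```java
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;
import java.util.function.Supplier;

/** Hypothetical name registry illustrating case-insensitive, SPI-style lookup. */
final class CaseInsensitiveRegistry<T> {
  private final Map<String, Supplier<T>> byName = new HashMap<>();

  // Locale.ROOT keeps normalization independent of the JVM's default locale.
  private static String normalize(String name) {
    return name.toLowerCase(Locale.ROOT);
  }

  void register(String name, Supplier<T> factory) {
    if (byName.putIfAbsent(normalize(name), factory) != null) {
      throw new IllegalArgumentException("Duplicate name: " + name);
    }
  }

  T lookup(String name) {
    Supplier<T> factory = byName.get(normalize(name));
    if (factory == null) {
      throw new IllegalArgumentException("No service named '" + name + "'");
    }
    return factory.get();
  }

  public static void main(String[] args) {
    CaseInsensitiveRegistry<String> registry = new CaseInsensitiveRegistry<>();
    registry.register("korean", () -> "KoreanTokenizerFactory");        // placeholder value
    registry.register("htmlStrip", () -> "HTMLStripCharFilterFactory"); // placeholder value
    System.out.println(registry.lookup("KOREAN"));
    System.out.println(registry.lookup("htmlstrip"));
  }
}
```

Locale.ROOT matters because some default locales change casing. Under a Turkish locale, for instance, "I".toLowerCase() yields a dotless ı, which would break lookups such as "HTMLSTRIP".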