Package org.apache.lucene.analysis.ko
Analyzer for Korean.
Class Summary
Class                                 Description
DecompoundToken                       A token that was generated from a compound.
DictionaryToken                       A token stored in a Dictionary.
GraphvizFormatter                     Outputs the dot (graphviz) string for the Viterbi lattice.
KoreanAnalyzer                        Analyzer for Korean that uses morphological analysis.
KoreanNumberFilter                    A TokenFilter that normalizes Korean numbers to regular Arabic decimal numbers in half-width characters.
KoreanNumberFilter.NumberBuffer       Buffer that holds a Korean number string and a position index used as a parsed-to marker.
KoreanNumberFilterFactory             Factory for KoreanNumberFilter.
KoreanPartOfSpeechStopFilter          Removes tokens that match a set of part-of-speech tags.
KoreanPartOfSpeechStopFilterFactory   Factory for KoreanPartOfSpeechStopFilter.
KoreanReadingFormFilter               Replaces term text with the ReadingAttribute, which is the Hangul transcription of Hanja characters.
KoreanReadingFormFilterFactory        Factory for KoreanReadingFormFilter.
KoreanTokenizer                       Tokenizer for Korean that uses morphological analysis.
KoreanTokenizerFactory                Factory for KoreanTokenizer.
POS                                   Part of speech classification for Korean based on Sejong corpus classification.
Token                                 Analyzed token with morphological data.
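The KoreanNumberFilter entry above describes normalizing Korean numbers to Arabic decimal numbers. The following is a minimal standalone sketch of that idea in plain Java, not Lucene's implementation: the class name KoreanNumberSketch and its parse method are hypothetical, and the sketch assumes non-negative integers only (no decimal points, thousands separators, or overflow handling).

```java
public class KoreanNumberSketch {
    // Hangul digit characters for 0 through 9, indexed by their value.
    private static final String DIGITS = "영일이삼사오육칠팔구";

    // Returns the multiplier for a Hangul unit character, or 0 if it is not one.
    private static long unitOf(char c) {
        switch (c) {
            case '십': return 10L;
            case '백': return 100L;
            case '천': return 1_000L;
            case '만': return 10_000L;
            case '억': return 100_000_000L;
            case '조': return 1_000_000_000_000L;
            default:   return 0L;
        }
    }

    // Hypothetical helper: parses a string mixing Hangul numerals,
    // Hangul units, and Arabic digits into a long.
    public static long parse(String text) {
        long total = 0;    // value of completed large-unit groups (만, 억, 조)
        long section = 0;  // value below the next large unit (십, 백, 천 parts)
        long digits = 0;   // run of plain digits not yet attached to a unit
        boolean sawDigit = false;
        for (int i = 0; i < text.length(); i++) {
            char c = text.charAt(i);
            int hangul = DIGITS.indexOf(c);
            // Character.digit also accepts full-width Arabic digits.
            int value = hangul >= 0 ? hangul : Character.digit(c, 10);
            if (value >= 0) {
                digits = digits * 10 + value;
                sawDigit = true;
                continue;
            }
            long unit = unitOf(c);
            if (unit == 0) {
                throw new IllegalArgumentException("Not a number character: " + c);
            }
            if (unit < 10_000L) {
                // A bare small unit such as 십 counts as 1 x unit.
                section += (sawDigit ? digits : 1) * unit;
            } else {
                section += digits;
                // A bare large unit such as 만 counts as 1 x unit.
                total += (section == 0 ? 1 : section) * unit;
                section = 0;
            }
            digits = 0;
            sawDigit = false;
        }
        return total + section + digits;
    }

    public static void main(String[] args) {
        System.out.println(parse("삼천2백2십삼")); // prints 3223
        System.out.println(parse("일영영영"));     // prints 1000
    }
}
```

In an actual analysis chain the filter operates on a token stream rather than on a raw string, so this sketch only illustrates the arithmetic of the normalization.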
Enum Summary
Enum                             Description
KoreanTokenizer.DecompoundMode   Decompound mode: this determines how the tokenizer handles POS.Type.COMPOUND, POS.Type.INFLECT and POS.Type.PREANALYSIS tokens.
KoreanTokenizer.Type             Token type reflecting the original source of this token.
POS.Tag                          Part of speech tag for Korean based on Sejong corpus classification.
POS.Type                         The type of the token.
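The effect of KoreanTokenizer.DecompoundMode can be pictured with a small toy model. This page does not list the enum constants; the sketch assumes the three documented in Lucene's Javadoc for the enum (NONE, DISCARD, MIXED) and mirrors them in a local, hypothetical Mode enum. The emit method and the hand-supplied list of parts are also hypothetical: in Lucene the decomposition comes from the dictionary, and the mode applies to INFLECT and PREANALYSIS tokens as well, which this sketch does not model.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class DecompoundModeSketch {
    // Local stand-in for KoreanTokenizer.DecompoundMode (assumed constants).
    public enum Mode { NONE, DISCARD, MIXED }

    // Hypothetical helper: given a compound and its parts, returns the
    // terms a tokenizer would emit under the given mode.
    public static List<String> emit(String compound, List<String> parts, Mode mode) {
        List<String> out = new ArrayList<>();
        switch (mode) {
            case NONE:
                // Keep the compound as a single token.
                out.add(compound);
                break;
            case DISCARD:
                // Emit only the parts and drop the original form.
                out.addAll(parts);
                break;
            case MIXED:
                // Emit the original form followed by its parts.
                out.add(compound);
                out.addAll(parts);
                break;
        }
        return out;
    }

    public static void main(String[] args) {
        List<String> parts = Arrays.asList("가곡", "역");
        for (Mode mode : Mode.values()) {
            System.out.println(mode + " -> " + emit("가곡역", parts, mode));
        }
    }
}
```

Keeping the original form (the MIXED case) lets both the whole compound and its parts match at search time, at the cost of a larger index; dropping it keeps the index smaller.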