Package org.apache.lucene.analysis.ja
Analyzer for Japanese.
Class Summary

Class: Description
GraphvizFormatter: Outputs the dot (graphviz) string for the Viterbi lattice.
JapaneseAnalyzer: Analyzer for Japanese that uses morphological analysis.
JapaneseBaseFormFilter: Replaces term text with the BaseFormAttribute.
JapaneseBaseFormFilterFactory: Factory for JapaneseBaseFormFilter.
JapaneseIterationMarkCharFilter: Normalizes Japanese horizontal iteration marks (odoriji) to their expanded form.
JapaneseIterationMarkCharFilterFactory: Factory for JapaneseIterationMarkCharFilter.
JapaneseKatakanaStemFilter: A TokenFilter that normalizes common katakana spelling variations ending in a long sound character by removing this character (U+30FC).
JapaneseKatakanaStemFilterFactory: Factory for JapaneseKatakanaStemFilter.
JapaneseNumberFilter: A TokenFilter that normalizes Japanese numbers (kansūji) to regular Arabic decimal numbers in half-width characters.
JapaneseNumberFilter.NumberBuffer: Buffer that holds a Japanese number string and a position index used as a parsed-to marker.
JapaneseNumberFilterFactory: Factory for JapaneseNumberFilter.
JapanesePartOfSpeechStopFilter: Removes tokens that match a set of part-of-speech tags.
JapanesePartOfSpeechStopFilterFactory: Factory for JapanesePartOfSpeechStopFilter.
JapaneseReadingFormFilter: A TokenFilter that replaces the term attribute with the reading of a token in either katakana or romaji form.
JapaneseReadingFormFilterFactory: Factory for JapaneseReadingFormFilter.
JapaneseTokenizer: Tokenizer for Japanese that uses morphological analysis.
JapaneseTokenizerFactory: Factory for JapaneseTokenizer.
Token: Analyzed token with morphological data from its dictionary.
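
The normalization described for JapaneseKatakanaStemFilter can be illustrated without any Lucene dependency. The following is a minimal sketch under stated assumptions, not the filter's actual implementation: the class name KatakanaStemSketch, the method names, and the minimum term length of 4 are all hypothetical choices made for this illustration. It only shows the idea of dropping a trailing long sound character (U+30FC) from an all-katakana term.

```java
// Illustrative sketch only; this is NOT the Lucene JapaneseKatakanaStemFilter code.
final class KatakanaStemSketch {

    // KATAKANA-HIRAGANA PROLONGED SOUND MARK, the "long sound character" (U+30FC).
    private static final char PROLONGED_SOUND_MARK = '\u30FC';

    // Assumption for this sketch: terms shorter than this are left untouched,
    // so that short words ending in the mark are not over-stemmed.
    private static final int MIN_LENGTH = 4;

    // Returns true if every char of the term is in the Unicode Katakana block
    // (U+30A0..U+30FF, which includes the prolonged sound mark itself).
    static boolean isKatakana(String term) {
        if (term.isEmpty()) {
            return false;
        }
        for (int i = 0; i < term.length(); i++) {
            if (Character.UnicodeBlock.of(term.charAt(i)) != Character.UnicodeBlock.KATAKANA) {
                return false;
            }
        }
        return true;
    }

    // Removes one trailing prolonged sound mark from a sufficiently long
    // all-katakana term; any other term is returned unchanged.
    static String stem(String term) {
        boolean endsWithMark = !term.isEmpty()
                && term.charAt(term.length() - 1) == PROLONGED_SOUND_MARK;
        if (term.length() >= MIN_LENGTH && endsWithMark && isKatakana(term)) {
            return term.substring(0, term.length() - 1);
        }
        return term;
    }

    public static void main(String[] args) {
        // Katakana spelling of "computer" with a trailing long sound mark (7 chars).
        String computer = "\u30B3\u30F3\u30D4\u30E5\u30FC\u30BF\u30FC";
        // Katakana spelling of "copy" (3 chars), below MIN_LENGTH in this sketch.
        String copy = "\u30B3\u30D4\u30FC";
        System.out.println(stem(computer).length()); // one char shorter than the input
        System.out.println(stem(copy).equals(copy)); // short term is kept as-is
    }
}
```

In this sketch, both spelling variants of a loanword (with and without the final long sound character) reduce to the same term, which is the effect the class description refers to. The length threshold is an illustrative guard; consult the JapaneseKatakanaStemFilter documentation for the real filter's behavior and configuration.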
Enum Summary

Enum: Description
JapaneseTokenizer.Mode: Tokenization mode: this determines how the tokenizer handles compound and unknown words.
JapaneseTokenizer.Type: Token type reflecting the original source of this token.