org.apache.lucene.analysis.ja (Lucene 9.1.0 kuromoji API)

package org.apache.lucene.analysis.ja

Analyzer for Japanese.

Related Packages

Package

Description

org.apache.lucene.analysis.ja.completion

Utilities for JapaneseCompletionFilter.

org.apache.lucene.analysis.ja.dict

Kuromoji dictionary implementation.

org.apache.lucene.analysis.ja.tokenattributes

Additional Kuromoji-specific Attributes for text analysis.

org.apache.lucene.analysis.ja.util

Kuromoji utility classes.

Class

Description

GraphvizFormatter

Outputs the DOT (Graphviz) string for the Viterbi lattice.

JapaneseAnalyzer

Analyzer for Japanese that uses morphological analysis.

JapaneseBaseFormFilter

Replaces term text with the BaseFormAttribute.

JapaneseBaseFormFilterFactory

Factory for JapaneseBaseFormFilter.

JapaneseCompletionAnalyzer

Analyzer for the Japanese completion suggester.

JapaneseCompletionFilter

A TokenFilter that adds Japanese romanized tokens to the term attribute.

JapaneseCompletionFilter.Mode

Completion mode.

JapaneseCompletionFilterFactory

Factory for JapaneseCompletionFilter.

JapaneseIterationMarkCharFilter

Normalizes Japanese horizontal iteration marks (odoriji) to their expanded form.

JapaneseIterationMarkCharFilterFactory

Factory for JapaneseIterationMarkCharFilter.

JapaneseKatakanaStemFilter

A TokenFilter that normalizes common katakana spelling variations ending in a long sound character by removing this character (U+30FC).

JapaneseKatakanaStemFilterFactory

Factory for JapaneseKatakanaStemFilter.

JapaneseNumberFilter

A TokenFilter that normalizes Japanese numbers (kansūji) to regular Arabic decimal numbers in half-width characters.

JapaneseNumberFilter.NumberBuffer

Buffer that holds a Japanese number string and a position index used as a parsed-to marker.

JapaneseNumberFilterFactory

Factory for JapaneseNumberFilter.

JapanesePartOfSpeechStopFilter

Removes tokens that match a set of part-of-speech tags.

JapanesePartOfSpeechStopFilterFactory

Factory for JapanesePartOfSpeechStopFilter.

JapaneseReadingFormFilter

A TokenFilter that replaces the term attribute with the reading of a token in either katakana or romaji form.

JapaneseReadingFormFilterFactory

Factory for JapaneseReadingFormFilter.

JapaneseTokenizer

Tokenizer for Japanese that uses morphological analysis.

JapaneseTokenizer.Mode

Tokenization mode: this determines how the tokenizer handles compound and unknown words.

JapaneseTokenizer.Type

Token type reflecting the original source of this token.

JapaneseTokenizerFactory

Factory for JapaneseTokenizer.

Token

Analyzed token with morphological data from its dictionary.
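
The normalization described for JapaneseKatakanaStemFilter above (dropping a trailing long sound character, U+30FC, from katakana terms) can be illustrated with a small standalone sketch. It uses only the Java standard library and is not the Lucene implementation: the class name KatakanaStemSketch, the method stem, and the minimum-length threshold of 4 are assumptions made for illustration. Under those assumptions, コンピューター would become コンピュータ, while the short word コピー would be left unchanged.

```java
// Illustrative sketch only: NOT the Lucene implementation of JapaneseKatakanaStemFilter.
// The class name, method name, and MIN_LENGTH value are assumptions for this example.
final class KatakanaStemSketch {

    // KATAKANA-HIRAGANA PROLONGED SOUND MARK (U+30FC), the character named in the description.
    private static final char PROLONGED_SOUND_MARK = '\u30FC';

    // Assumed threshold: shorter terms are left alone so that short words
    // keep their final long sound character.
    private static final int MIN_LENGTH = 4;

    // Returns the term without its trailing long sound character when the term
    // is long enough and consists only of characters from the Katakana block.
    static String stem(String term) {
        if (term.length() < MIN_LENGTH || !isAllKatakana(term)) {
            return term;
        }
        int last = term.length() - 1;
        if (term.charAt(last) == PROLONGED_SOUND_MARK) {
            return term.substring(0, last);
        }
        return term;
    }

    // True when every char belongs to the Unicode Katakana block (U+30A0 to U+30FF).
    private static boolean isAllKatakana(String s) {
        for (int i = 0; i < s.length(); i++) {
            if (Character.UnicodeBlock.of(s.charAt(i)) != Character.UnicodeBlock.KATAKANA) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        // "konpyuutaa" (computer), written with Unicode escapes to stay encoding-independent.
        String computer = "\u30B3\u30F3\u30D4\u30E5\u30FC\u30BF\u30FC";
        // "kopii" (copy), shorter than MIN_LENGTH.
        String copy = "\u30B3\u30D4\u30FC";

        System.out.println(stem(computer).length()); // one char shorter than the input
        System.out.println(stem(copy).equals(copy)); // short term is left unchanged
    }
}
```

In an actual analysis chain this step would normally be applied by the filter itself (or created through JapaneseKatakanaStemFilterFactory) rather than by hand-written code; the sketch only shows the kind of spelling variation being collapsed.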
