org.apache.lucene.analysis.ko (Lucene 9.1.0 nori API)

package org.apache.lucene.analysis.ko

Analyzer for Korean.

Related Packages

Package

Description

org.apache.lucene.analysis.ko.dict

Korean dictionary implementation.

org.apache.lucene.analysis.ko.tokenattributes

Additional Korean-specific Attributes for text analysis.

org.apache.lucene.analysis.ko.util

Nori utility classes.

Class

Description

DecompoundToken

A token that was generated from a compound.

DictionaryToken

A token stored in a Dictionary.

GraphvizFormatter

Outputs the dot (Graphviz) string for the Viterbi lattice.

KoreanAnalyzer

Analyzer for Korean that uses morphological analysis.

KoreanNumberFilter

A TokenFilter that normalizes Korean numbers to regular Arabic decimal numbers in half-width characters.

KoreanNumberFilter.NumberBuffer

Buffer that holds a Korean number string and a position index used as a parsed-to marker.

KoreanNumberFilterFactory

Factory for KoreanNumberFilter.

KoreanPartOfSpeechStopFilter

Removes tokens that match a set of part-of-speech tags.

KoreanPartOfSpeechStopFilterFactory

Factory for KoreanPartOfSpeechStopFilter.

KoreanReadingFormFilter

Replaces term text with the ReadingAttribute, which is the Hangul transcription of Hanja characters.

KoreanReadingFormFilterFactory

Factory for KoreanReadingFormFilter.

KoreanTokenizer

Tokenizer for Korean that uses morphological analysis.

KoreanTokenizer.DecompoundMode

Decompound mode: this determines how the tokenizer handles POS.Type.COMPOUND, POS.Type.INFLECT and POS.Type.PREANALYSIS tokens.

KoreanTokenizer.Type

Token type reflecting the original source of this token.

KoreanTokenizerFactory

Factory for KoreanTokenizer.

POS

Part of speech classification for Korean based on Sejong corpus classification.

POS.Tag

Part of speech tag for Korean based on Sejong corpus classification.

POS.Type

The type of the token.

Token

Analyzed token with morphological data.
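
As a rough illustration of how the classes above fit together, the following sketch runs KoreanAnalyzer with its default settings over a string and collects each term with its left part-of-speech tag. It assumes the lucene-analysis-nori artifact (and lucene-core) is on the classpath; the class name NoriExample, the method name analyze, and the field name "body" are hypothetical and chosen only for this example.

```java
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;

import org.apache.lucene.analysis.Analyzer;
import org.apache.lucene.analysis.TokenStream;
import org.apache.lucene.analysis.ko.KoreanAnalyzer;
import org.apache.lucene.analysis.ko.tokenattributes.PartOfSpeechAttribute;
import org.apache.lucene.analysis.tokenattributes.CharTermAttribute;

// Hypothetical helper class, not part of the nori API.
public class NoriExample {

  // Returns "term/POS" strings produced by the default KoreanAnalyzer.
  static List<String> analyze(String text) throws IOException {
    List<String> terms = new ArrayList<>();
    // Both Analyzer and TokenStream are Closeable; the stream is closed first.
    try (Analyzer analyzer = new KoreanAnalyzer();
         TokenStream stream = analyzer.tokenStream("body", text)) {
      CharTermAttribute term = stream.addAttribute(CharTermAttribute.class);
      PartOfSpeechAttribute pos = stream.addAttribute(PartOfSpeechAttribute.class);
      stream.reset(); // required before the first incrementToken()
      while (stream.incrementToken()) {
        terms.add(term.toString() + "/" + pos.getLeftPOS());
      }
      stream.end(); // finalizes end-of-stream state such as the final offset
    }
    return terms;
  }

  public static void main(String[] args) throws IOException {
    for (String t : analyze("한국어 형태소 분석기")) {
      System.out.println(t);
    }
  }
}
```

The exact tokens and tags depend on the bundled dictionary and on the analyzer's defaults, so no expected output is shown here. For different decompounding behavior or a custom stop-tag set, KoreanTokenizer and KoreanPartOfSpeechStopFilter can be combined directly instead of using the default analyzer.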
