All Classes and Interfaces (Lucene 10.1.0 nori API)

Class

Description

CharacterDefinition

Character category data.

ConnectionCosts

N-gram connection cost data.

DecompoundToken

A token that was generated from a compound.

DictionaryBuilder

Tool to build dictionaries.

DictionaryToken

A token stored in a KoMorphData.

KoMorphData

Represents Korean morphological information.

KoMorphData.Morpheme

A morpheme extracted from a compound token.

KoreanAnalyzer

Analyzer for Korean that uses morphological analysis.

KoreanNumberFilter

A TokenFilter that normalizes Korean numbers to regular Arabic decimal numbers in half-width characters.

KoreanNumberFilter.NumberBuffer

Buffer that holds a Korean number string and a position index used as a parsed-to marker.

KoreanNumberFilterFactory

Factory for KoreanNumberFilter.

KoreanPartOfSpeechStopFilter

Removes tokens that match a set of part-of-speech tags.

KoreanPartOfSpeechStopFilterFactory

Factory for KoreanPartOfSpeechStopFilter.

KoreanReadingFormFilter

Replaces term text with the value of the ReadingAttribute, which is the Hangul transcription of Hanja characters.

KoreanReadingFormFilterFactory

Factory for KoreanReadingFormFilter.

KoreanTokenizer

Tokenizer for Korean that uses morphological analysis.

KoreanTokenizer.DecompoundMode

Decompound mode: this determines how the tokenizer handles POS.Type.COMPOUND, POS.Type.INFLECT and POS.Type.PREANALYSIS tokens.

KoreanTokenizerFactory

Factory for KoreanTokenizer.

PartOfSpeechAttribute

Part-of-speech attributes for Korean.

PartOfSpeechAttributeImpl

Part-of-speech attributes for Korean.

POS

Part-of-speech classification for Korean based on the Sejong corpus classification.

POS.Tag

Part-of-speech tag for Korean based on the Sejong corpus classification.

POS.Type

The type of the token.

ReadingAttribute

Attribute for Korean reading data.

ReadingAttributeImpl

Attribute for Korean reading data.

Token

Analyzed token with morphological data.

TokenInfoDictionary

Binary dictionary implementation for a known-word dictionary model: words are encoded into an FST mapping to a list of wordIDs.

TokenInfoFST

Thin wrapper around an FST with root-arc caching for Hangul syllables (11,172 arcs).

UnknownDictionary

Dictionary for unknown-word handling.

UserDictionary

Class for building a User Dictionary.
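
The TokenInfoFST entry above mentions root-arc caching for Hangul syllables (11,172 arcs). The following is a minimal sketch of where that number comes from and how a cache with one slot per syllable could be indexed. It is not Lucene's implementation: the class name HangulRootCacheSketch, its methods, and the use of a String as a stand-in for a cached arc are all hypothetical. The only facts relied on are the Unicode Hangul Syllables range (U+AC00 to U+D7A3) and its composition of 19 initial consonants, 21 vowels, and 28 finals.

```java
// Hypothetical illustration only; this is not Lucene's TokenInfoFST code.
class HangulRootCacheSketch {
    // The Unicode Hangul Syllables block runs from U+AC00 to U+D7A3 inclusive.
    static final int FIRST_SYLLABLE = 0xAC00;
    static final int LAST_SYLLABLE = 0xD7A3;

    // 19 initial consonants * 21 vowels * 28 finals (including "no final").
    static final int SYLLABLE_COUNT = 19 * 21 * 28;

    // One slot per syllable; a String stands in for a cached root arc.
    private final String[] slots = new String[SYLLABLE_COUNT];

    /** Returns the cache slot for a code point, or -1 if it is not a Hangul syllable. */
    static int slotFor(int codePoint) {
        if (codePoint < FIRST_SYLLABLE || codePoint > LAST_SYLLABLE) {
            return -1;
        }
        return codePoint - FIRST_SYLLABLE;
    }

    /** Stores a value for a Hangul syllable; other code points are ignored. */
    void put(int codePoint, String arc) {
        int slot = slotFor(codePoint);
        if (slot >= 0) {
            slots[slot] = arc;
        }
    }

    /** Returns the cached value, or null when the caller must fall back to a regular lookup. */
    String get(int codePoint) {
        int slot = slotFor(codePoint);
        return slot >= 0 ? slots[slot] : null;
    }

    public static void main(String[] args) {
        HangulRootCacheSketch cache = new HangulRootCacheSketch();
        cache.put(0xD55C, "cached-arc"); // U+D55C is the syllable "han"
        System.out.println(SYLLABLE_COUNT);     // size of the cache array
        System.out.println(cache.get(0xD55C));  // hit: the stored value
        System.out.println(cache.get('A'));     // miss: not a Hangul syllable
    }
}
```

Because the syllable block is contiguous, subtracting the first code point yields a dense array index, so a lookup for the first character of a term needs no hashing or searching. Any code point outside the block simply misses the cache and would take the normal lookup path.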