All Classes and Interfaces (Lucene 9.11.0 icu API)

Class

Description

Default ICUTokenizerConfig that is generally applicable to many languages.

Extension of CharTermAttributeImpl that encodes the term text as a binary Unicode collation key instead of as UTF-8 bytes.

Converts each token into its CollationKey, then encodes the key's bytes as an index term.

Indexes collation keys as a single-valued SortedDocValuesField.

A TokenFilter that applies search term folding to Unicode text, using the foldings defined in UTR #30: Character Foldings.

Normalize token text with ICU's Normalizer2.

Normalize token text with ICU's Normalizer2.

Breaks text into words according to UAX #29: Unicode Text Segmentation (http://www.unicode.org/reports/tr29/).

Class that allows for tailored Unicode Text Segmentation on a per-writing system basis.

Factory for ICUTokenizer.

A TokenFilter that transforms text with ICU.

This attribute stores the UTR #24 script value for a token of text.

Implementation of ScriptAttribute that stores the script as an integer.