All Classes and Interfaces
Class
Description
Default ICUTokenizerConfig that is generally applicable to many languages.
Extension of CharTermAttributeImpl that encodes the term text as a binary Unicode collation key instead of as UTF-8 bytes.
Converts each token into its CollationKey, and then encodes the key's bytes as an index term.
Indexes collation keys as a single-valued SortedDocValuesField.
Configures KeywordTokenizer with ICUCollationAttributeFactory.
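Encoding a collation key's bytes as an index term works because a byte-ordered term dictionary then sorts terms in collation order. The sketch below shows that property using the JDK's java.text.Collator and CollationKey as stand-ins for ICU's classes (an assumption: ICU keys differ in content, but the ordering idea is the same). The class and method names are hypothetical.

```java
import java.text.CollationKey;
import java.text.Collator;
import java.util.Arrays;
import java.util.Locale;

// Hypothetical helper: JDK collation keys used as a stand-in for ICU collation keys.
class CollationKeyTerms {
    // Returns the raw bytes of the collation key, i.e. what would be stored as the term.
    static byte[] keyBytes(Collator collator, String text) {
        CollationKey key = collator.getCollationKey(text);
        return key.toByteArray();
    }

    // Orders strings by comparing their key bytes as unsigned values,
    // the way a byte-ordered term dictionary orders index terms.
    static String[] sortByKeyBytes(Collator collator, String... words) {
        String[] sorted = words.clone();
        Arrays.sort(sorted, (a, b) -> Arrays.compareUnsigned(
                keyBytes(collator, a), keyBytes(collator, b)));
        return sorted;
    }
}
```

For example, with Collator.getInstance(Locale.ENGLISH), sorting by key bytes yields the same order as sorting with the collator itself, without ever decoding the terms.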
A TokenFilter that applies search term folding to Unicode text, applying foldings from UTR #30: Character Foldings.
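Search term folding erases distinctions that rarely matter for matching, such as letter case, accents, and full-width versus half-width forms. The sketch below approximates a few such foldings with JDK APIs only; it is not the UTR #30 folding set, and String.toLowerCase is simpler than full Unicode case folding (for instance, it leaves "ß" unchanged). The class name is hypothetical.

```java
import java.text.Normalizer;
import java.util.Locale;

// Hypothetical approximation of a few UTR #30-style foldings using only the JDK.
class SimpleFolding {
    static String fold(String text) {
        // Compatibility decomposition folds ligatures and full-width forms
        // and splits accented letters into base letter + combining mark.
        String decomposed = Normalizer.normalize(text, Normalizer.Form.NFKD);
        // Drop all combining marks (accent folding).
        String noMarks = decomposed.replaceAll("\\p{M}+", "");
        // Recompose, then lower-case as a simple stand-in for case folding.
        return Normalizer.normalize(noMarks, Normalizer.Form.NFC).toLowerCase(Locale.ROOT);
    }
}
```

With this sketch, a full-width "Ｒésumé" and a plain "resume" fold to the same string, so a query for one matches the other.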
Factory for ICUFoldingFilter.
Normalizes text with ICU's Normalizer2 as a character filter, before tokenization.
Factory for ICUNormalizer2CharFilter.
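The factory entries in this list turn string configuration into configured analysis components. The sketch below illustrates that pattern with a hypothetical factory that reads a "form" argument from a map and returns a JDK-based normalizer; the argument name, its default, and the class name are assumptions for illustration, not the parameters of the Lucene factories.

```java
import java.text.Normalizer;
import java.util.Locale;
import java.util.Map;
import java.util.function.UnaryOperator;

// Hypothetical factory: builds a text normalizer from string arguments.
class NormalizerFactorySketch {
    static UnaryOperator<String> create(Map<String, String> args) {
        // "form" and its "nfkc" default are illustrative assumptions.
        String formName = args.getOrDefault("form", "nfkc").toUpperCase(Locale.ROOT);
        // Enum.valueOf throws IllegalArgumentException for unknown names,
        // so bad configuration fails when the factory runs, not later.
        Normalizer.Form form = Normalizer.Form.valueOf(formName);
        return text -> Normalizer.normalize(text, form);
    }
}
```

Validating arguments once in the factory keeps the per-token code free of configuration checks.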
Normalizes token text with ICU's Normalizer2.
Factory for ICUNormalizer2Filter.
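Normalizing tokens makes canonically or compatibly equivalent spellings compare equal. The sketch below applies one normalization form to every token using the JDK's java.text.Normalizer as a stand-in for ICU's Normalizer2 (ICU4J additionally offers forms the JDK lacks, such as NFKC with case folding). The class name is hypothetical.

```java
import java.text.Normalizer;
import java.util.List;
import java.util.stream.Collectors;

// Hypothetical sketch: normalize each token with a single JDK normalization form.
class TokenNormalization {
    static List<String> normalizeAll(List<String> tokens, Normalizer.Form form) {
        return tokens.stream()
                .map(t -> Normalizer.normalize(t, form))
                .collect(Collectors.toList());
    }
}
```

NFC only merges canonical equivalents (e + combining acute becomes é), while NFKC also folds compatibility characters, so the "ﬁ" ligature becomes "fi" and full-width "ＡＢ" becomes "AB".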
Breaks text into words according to UAX #29: Unicode Text Segmentation (http://www.unicode.org/reports/tr29/).
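Word segmentation finds boundaries and then keeps only the segments that are words. The sketch below uses the JDK's java.text.BreakIterator word instance as a stand-in for ICU's UAX #29 implementation (its rules are similar but not guaranteed identical) and discards segments without letters or digits, such as spaces and punctuation. The class name is hypothetical.

```java
import java.text.BreakIterator;
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

// Hypothetical word splitter built on the JDK's word BreakIterator.
class WordBreakDemo {
    static List<String> words(String text) {
        BreakIterator it = BreakIterator.getWordInstance(Locale.ROOT);
        it.setText(text);
        List<String> out = new ArrayList<>();
        int start = it.first();
        // Walk consecutive boundary pairs [start, end).
        for (int end = it.next(); end != BreakIterator.DONE; start = end, end = it.next()) {
            String segment = text.substring(start, end);
            // Keep word-like segments; drop whitespace and punctuation.
            if (segment.codePoints().anyMatch(Character::isLetterOrDigit)) {
                out.add(segment);
            }
        }
        return out;
    }
}
```

For "Hello, world! 42 times." this yields the tokens Hello, world, 42 and times.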
Allows tailored Unicode text segmentation on a per-writing-system basis.
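Tailoring segmentation per writing system requires first splitting text into runs that share one script, so that each run can be handed to rules chosen for that script. The sketch below shows only that splitting step, using the JDK's Character.UnicodeScript; treating Common and Inherited characters as belonging to the surrounding run is a simplifying assumption, and the class names are hypothetical.

```java
import java.lang.Character.UnicodeScript;
import java.util.ArrayList;
import java.util.List;

// Hypothetical splitter: divides text into maximal same-script runs.
class ScriptRuns {
    static final class Run {
        final UnicodeScript script;
        final String text;

        Run(UnicodeScript script, String text) {
            this.script = script;
            this.text = text;
        }
    }

    static List<Run> split(String text) {
        List<Run> runs = new ArrayList<>();
        UnicodeScript current = null; // script of the run being built, once known
        int runStart = 0;
        int i = 0;
        while (i < text.length()) {
            int cp = text.codePointAt(i);
            UnicodeScript s = UnicodeScript.of(cp);
            // Spaces, digits, punctuation and combining marks do not start a new run.
            boolean neutral = s == UnicodeScript.COMMON || s == UnicodeScript.INHERITED;
            if (current == null) {
                if (!neutral) {
                    current = s; // first real script decides the run's script
                }
            } else if (!neutral && s != current) {
                runs.add(new Run(current, text.substring(runStart, i)));
                runStart = i;
                current = s;
            }
            i += Character.charCount(cp);
        }
        if (runStart < text.length()) {
            runs.add(new Run(current == null ? UnicodeScript.COMMON : current,
                    text.substring(runStart)));
        }
        return runs;
    }
}
```

A per-script configuration would then pick segmentation rules for each run's script; that lookup is left out here.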
Factory for ICUTokenizer.
A TokenFilter that transforms text with ICU.
Factory for ICUTransformFilter.
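A transform rewrites text according to rules, for example transliterating one script into another. ICU provides rich, context-sensitive transforms identified by IDs such as "Any-Latin"; the JDK has no equivalent, so the sketch below uses a deliberately tiny, hypothetical Cyrillic-to-Latin table that maps single code points and passes everything else through unchanged.

```java
import java.util.Map;

// Hypothetical, minimal transliterator; real ICU transforms are far more complete.
class TinyTransliterator {
    // Illustrative rules only: a handful of Cyrillic letters mapped to Latin.
    private static final Map<Integer, String> RULES = Map.of(
            (int) '\u043C', "m",    // м
            (int) '\u0438', "i",    // и
            (int) '\u0440', "r",    // р
            (int) '\u0449', "shch", // щ
            (int) '\u0436', "zh");  // ж

    static String transform(String text) {
        StringBuilder out = new StringBuilder();
        // Replace each mapped code point; copy unmapped ones as-is.
        text.codePoints().forEach(cp ->
                out.append(RULES.getOrDefault(cp, new String(Character.toChars(cp)))));
        return out.toString();
    }
}
```

Applied to each token, such a filter lets a Latin-script query match Cyrillic text, for instance "mir" for "мир".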
This attribute stores the UTR #24 script value for a token of text.
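A token's script value can be derived from its characters' Unicode Script property. The sketch below uses the JDK's Character.UnicodeScript and, as an assumption, reports the first script that is neither Common nor Inherited, falling back to Common for tokens such as numbers. The class name is hypothetical.

```java
import java.lang.Character.UnicodeScript;

// Hypothetical helper: picks a single script value for a token.
class TokenScript {
    static UnicodeScript scriptOf(String token) {
        return token.codePoints()
                .mapToObj(UnicodeScript::of)
                // Skip shared characters (digits, punctuation, combining marks).
                .filter(s -> s != UnicodeScript.COMMON && s != UnicodeScript.INHERITED)
                .findFirst()
                .orElse(UnicodeScript.COMMON);
    }
}
```

Under this sketch, "hello" is LATIN, "мир" is CYRILLIC, "東京" is HAN, and "42" is COMMON.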
Implementation of ScriptAttribute that stores the script as an integer.
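Storing the script as an integer keeps the per-token state cheap to set, copy and reset. The sketch below is a hypothetical holder in that style; as an assumption it uses the JDK enum ordinal as the integer code, which is not ICU's script code and is not guaranteed stable across JDK versions, so it should not be persisted.

```java
import java.lang.Character.UnicodeScript;

// Hypothetical holder that keeps a script as an int code.
class IntScriptHolder {
    // JDK ordinal used as a stand-in for ICU's integer script codes.
    private int code = UnicodeScript.COMMON.ordinal();

    void setScript(UnicodeScript script) {
        code = script.ordinal();
    }

    int getCode() {
        return code;
    }

    // Converts the stored int back to the enum and its name on demand.
    UnicodeScript getScript() {
        return UnicodeScript.values()[code];
    }

    String getName() {
        return getScript().name();
    }

    // Resets to the default script between tokens.
    void clear() {
        code = UnicodeScript.COMMON.ordinal();
    }
}
```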