Package org.apache.lucene.analysis.icu.segmentation
Tokenizer that breaks text into words with the Unicode Text Segmentation algorithm.
Class Summary

Class                        Description
DefaultICUTokenizerConfig    Default ICUTokenizerConfig that is generally applicable to many languages.
ICUTokenizer                 Breaks text into words according to UAX #29: Unicode Text Segmentation (http://www.unicode.org/reports/tr29/).
ICUTokenizerConfig           Class that allows for tailored Unicode Text Segmentation on a per-writing system basis.
ICUTokenizerFactory          Factory for ICUTokenizer.
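
To make the idea of word-boundary segmentation concrete, here is a minimal sketch that uses only the JDK. It is an assumption-laden stand-in, not this package's API: it relies on java.text.BreakIterator rather than ICUTokenizer, and the class name WordBreakSketch and its words method are hypothetical. The JDK's word rules are not guaranteed to match UAX #29 or the ICU rules exactly, so results may differ from what ICUTokenizer produces, especially for scripts that need tailored segmentation.

```java
import java.text.BreakIterator;
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

// Hypothetical illustration class; not part of Lucene or ICU.
final class WordBreakSketch {

    // Splits text at word boundaries and keeps only segments that contain
    // at least one letter or digit, dropping whitespace and punctuation.
    static List<String> words(String text) {
        // Assumption: the JDK word iterator stands in for ICU's segmentation.
        BreakIterator it = BreakIterator.getWordInstance(Locale.ROOT);
        it.setText(text);

        List<String> out = new ArrayList<>();
        int start = it.first();
        for (int end = it.next(); end != BreakIterator.DONE; start = end, end = it.next()) {
            String segment = text.substring(start, end);
            if (segment.codePoints().anyMatch(Character::isLetterOrDigit)) {
                out.add(segment);
            }
        }
        return out;
    }

    public static void main(String[] args) {
        System.out.println(words("Hello, world!"));
    }
}
```

The filter on letters and digits roughly mirrors how a tokenizer emits word tokens while skipping the boundaries that enclose only spaces or punctuation. In a real Lucene analysis chain the classes listed above would be used instead.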