Package org.apache.lucene.analysis.icu.segmentation
Tokenizer that breaks text into words with the Unicode Text Segmentation algorithm.
Class                      Description
DefaultICUTokenizerConfig  Default ICUTokenizerConfig that is generally applicable to many languages.
ICUTokenizer               Breaks text into words according to UAX #29: Unicode Text Segmentation (http://www.unicode.org/reports/tr29/).
ICUTokenizerConfig         Class that allows for tailored Unicode Text Segmentation on a per-writing system basis.
ICUTokenizerFactory        Factory for ICUTokenizer.
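
As a rough illustration of the word segmentation this package provides, the sketch below walks word boundaries with the JDK's java.text.BreakIterator. This is a stand-in chosen so the example runs without Lucene or ICU4J on the classpath; it is not ICUTokenizer, and the JDK's rules may differ from ICU's tailored UAX #29 rules. The class name WordBreakSketch and the helper words(...) are hypothetical names used only for this example.

```java
import java.text.BreakIterator;
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

class WordBreakSketch {

    // Hypothetical helper: returns the word-like segments of the text,
    // skipping segments that start with whitespace or punctuation.
    static List<String> words(String text) {
        // JDK word-boundary iterator (a stand-in, not ICU's implementation).
        BreakIterator boundaries = BreakIterator.getWordInstance(Locale.ROOT);
        boundaries.setText(text);

        List<String> result = new ArrayList<>();
        int start = boundaries.first();
        for (int end = boundaries.next();
             end != BreakIterator.DONE;
             start = end, end = boundaries.next()) {
            String segment = text.substring(start, end);
            // Keep only segments that begin with a letter or digit.
            if (Character.isLetterOrDigit(segment.codePointAt(0))) {
                result.add(segment);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        System.out.println(words("Breaks text into words."));
    }
}
```

In Lucene itself, ICUTokenizer is consumed through the usual TokenStream workflow (reset, incrementToken, end, close) rather than by walking boundaries by hand.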