Package org.apache.lucene.analysis.icu.segmentation

Tokenizer that breaks text into words with the Unicode Text Segmentation algorithm.

Class Summary
DefaultICUTokenizerConfig  Default ICUTokenizerConfig that is generally applicable to many languages.
ICUTokenizer               Breaks text into words according to UAX #29: Unicode Text Segmentation (http://www.unicode.org/reports/tr29/).
ICUTokenizerConfig         Class that allows for tailored Unicode Text Segmentation on a per-writing system basis.
ICUTokenizerFactory        Factory for ICUTokenizer.
LaoBreakIterator           Syllable iterator for Lao text.

Package org.apache.lucene.analysis.icu.segmentation Description

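Tokenizer that breaks text into words with the Unicode Text Segmentation algorithm (UAX #29). ICUTokenizer performs the segmentation, and an ICUTokenizerConfig allows the segmentation rules to be tailored per writing system; DefaultICUTokenizerConfig provides a configuration that is generally applicable to many languages.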
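
The idea of splitting text at word boundaries can be sketched with the JDK alone. The example below is an illustrative stand-in, not this package's implementation: it uses java.text.BreakIterator, whose word rules are similar in spirit to UAX #29 but are not ICU's rules and are not tailored per writing system. The class name WordBreakSketch and the method words are hypothetical names chosen for this sketch.

```java
import java.text.BreakIterator;
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

// Illustrative stand-in: uses the JDK's java.text.BreakIterator, not ICUTokenizer.
class WordBreakSketch {

    // Returns the segments between word boundaries that contain at least
    // one letter or digit; whitespace and punctuation segments are dropped.
    static List<String> words(String text) {
        BreakIterator boundaries = BreakIterator.getWordInstance(Locale.ROOT);
        boundaries.setText(text);

        List<String> result = new ArrayList<>();
        int start = boundaries.first();
        for (int end = boundaries.next();
             end != BreakIterator.DONE;
             start = end, end = boundaries.next()) {
            String segment = text.substring(start, end);
            if (segment.codePoints().anyMatch(Character::isLetterOrDigit)) {
                result.add(segment);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        System.out.println(words("Hello, world!"));
    }
}
```

A boundary iterator reports every break position, including those around spaces and punctuation, so the sketch filters segments down to the ones that look like words. ICUTokenizer exposes its results as a Lucene token stream instead of a list, so this sketch only mirrors the segmentation step.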



Copyright © 2000-2013 Apache Software Foundation. All Rights Reserved.