Class ThaiTokenizer

  • All Implemented Interfaces:
    Closeable, AutoCloseable

    public class ThaiTokenizer
    extends SegmentingTokenizerBase
    Tokenizer that uses BreakIterator to tokenize Thai text.

    WARNING: this tokenizer may not be supported by all JREs. It is known to work with Sun/Oracle and Harmony JREs. If your application needs to be fully portable, consider using ICUTokenizer instead, which uses an ICU Thai BreakIterator that will always be available.
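    To make the BreakIterator dependency concrete, the following is a minimal sketch of Thai word segmentation using only the JDK's java.text.BreakIterator. It is not Lucene code: the class ThaiBreakSketch and the method segments are hypothetical names chosen for illustration. The segments returned depend on the JRE, which is the portability concern the warning above describes, so no expected output is shown.

```java
import java.text.BreakIterator;
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

// Hypothetical helper, not part of Lucene: splits text at the word
// boundaries reported by the JDK's Thai-locale BreakIterator.
class ThaiBreakSketch {

    static List<String> segments(String text) {
        BreakIterator words = BreakIterator.getWordInstance(Locale.forLanguageTag("th"));
        words.setText(text);

        List<String> out = new ArrayList<>();
        int start = words.first();
        // Walk consecutive boundaries; each [start, end) span is one segment.
        for (int end = words.next(); end != BreakIterator.DONE; start = end, end = words.next()) {
            out.add(text.substring(start, end));
        }
        return out;
    }

    public static void main(String[] args) {
        // The number of segments printed depends on the JRE's Thai support.
        System.out.println(segments("ภาษาไทย"));
    }
}
```

    The sketch keeps every span between boundaries, so the segments always concatenate back to the input. A real tokenizer would normally also discard spans that contain no letters or digits, such as whitespace and punctuation.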

    • Field Detail


        public static final boolean DBBI_AVAILABLE
        True if the JRE supports a working dictionary-based BreakIterator for Thai. If this is false, this tokenizer will not work at all!
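        One plausible way to probe for this capability with the JDK alone is sketched below. It is an assumption about how such a flag could be computed, since this page does not show how DBBI_AVAILABLE is initialized. The class ThaiSupportProbe and both of its methods are hypothetical names. The probe relies on the phrase "ภาษาไทย" consisting of two words, "ภาษา" and "ไทย", so a dictionary-based iterator should report a boundary at offset 4, while an iterator without a Thai dictionary would typically not.

```java
import java.text.BreakIterator;
import java.util.Locale;

// Hypothetical probe, not Lucene code: illustrates how a flag such as
// DBBI_AVAILABLE could be derived from the running JRE.
class ThaiSupportProbe {

    // "ภาษาไทย" is "ภาษา" + "ไทย"; the word break sits at offset 4.
    private static final String SAMPLE = "ภาษาไทย";

    static boolean hasThaiDictionaryBreaks() {
        BreakIterator words = BreakIterator.getWordInstance(Locale.forLanguageTag("th"));
        words.setText(SAMPLE);
        // True only if the iterator recognizes the break between the two words.
        return words.isBoundary(4);
    }

    // Fail fast instead of silently producing unsegmented Thai text.
    static void requireThaiSupport() {
        if (!hasThaiDictionaryBreaks()) {
            throw new UnsupportedOperationException(
                "This JRE has no dictionary-based Thai BreakIterator");
        }
    }

    public static void main(String[] args) {
        System.out.println("Thai dictionary breaks available: " + hasThaiDictionaryBreaks());
    }
}
```

        An application could run a guard like requireThaiSupport at startup and fall back to ICUTokenizer when the check fails.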
    • Constructor Detail

      • ThaiTokenizer

        public ThaiTokenizer()
        Creates a new ThaiTokenizer.
      • ThaiTokenizer

        public ThaiTokenizer(AttributeFactory factory)
        Creates a new ThaiTokenizer that uses the supplied AttributeFactory.