org.apache.lucene.analysis.icu.segmentation
Class ICUTokenizerConfig

java.lang.Object
  extended by org.apache.lucene.analysis.icu.segmentation.ICUTokenizerConfig
Direct Known Subclasses:
DefaultICUTokenizerConfig

public abstract class ICUTokenizerConfig
extends Object

Class that allows for tailored Unicode Text Segmentation on a per-writing-system basis.

WARNING: This API is experimental and might change in incompatible ways in the next release.

Constructor Summary
ICUTokenizerConfig()
          Sole constructor.
 
Method Summary
abstract  boolean combineCJ()
          Returns true if Han, Hiragana, and Katakana scripts should all be returned as Japanese.
abstract  com.ibm.icu.text.BreakIterator getBreakIterator(int script)
          Return a BreakIterator capable of processing a given script.
abstract  String getType(int script, int ruleStatus)
          Return a token type value for a given script and BreakIterator rule status.
 
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
 

Constructor Detail

ICUTokenizerConfig

public ICUTokenizerConfig()
Sole constructor. (For invocation by subclass constructors, typically implicit.)

Method Detail

getBreakIterator

public abstract com.ibm.icu.text.BreakIterator getBreakIterator(int script)
Return a BreakIterator capable of processing a given script.
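
As a rough illustration of per-script selection, the sketch below picks a word BreakIterator from a script code. It is a stand-in rather than the Lucene API: it uses java.text.BreakIterator from the JDK instead of com.ibm.icu.text.BreakIterator so that it compiles without ICU4J, and the class name and the SCRIPT_* codes are hypothetical placeholders for ICU script codes.

```java
import java.text.BreakIterator;
import java.util.Locale;

/** Stand-alone sketch; it does not extend ICUTokenizerConfig. */
final class BreakIteratorPerScriptSketch {
  // Placeholder script codes for this sketch only (assumption).
  static final int SCRIPT_OTHER = 0;
  static final int SCRIPT_JAPANESE = 1;

  /** Returns a word BreakIterator chosen by script code. */
  static BreakIterator getBreakIterator(int script) {
    switch (script) {
      case SCRIPT_JAPANESE:
        // Locale-specific word rules for Japanese text.
        return BreakIterator.getWordInstance(Locale.JAPANESE);
      default:
        // Locale-neutral word rules for everything else.
        return BreakIterator.getWordInstance(Locale.ROOT);
    }
  }

  public static void main(String[] args) {
    BreakIterator bi = getBreakIterator(SCRIPT_OTHER);
    bi.setText("hello world");
    // Print each boundary offset until the iterator is exhausted.
    for (int b = bi.first(); b != BreakIterator.DONE; b = bi.next()) {
      System.out.println(b);
    }
  }
}
```

A BreakIterator holds its text and current position, so the sketch obtains an instance from the factory on every call rather than sharing one across callers.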


getType

public abstract String getType(int script,
                               int ruleStatus)
Return a token type value for a given script and BreakIterator rule status.
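
One way such a mapping could look is sketched below. This is an illustration of the (script, ruleStatus) to String shape only, not the behavior of any particular subclass. The class name, the SCRIPT_* codes, and the label strings are assumptions made for the example. The WORD_* values are assumed to match the word rule status tags that ICU defines; real code should reference the ICU constants rather than literals.

```java
/** Stand-alone sketch; it does not extend ICUTokenizerConfig. */
final class TokenTypeSketch {
  // Assumed to mirror ICU's word rule status tags (WORD_NUMBER, WORD_LETTER, ...).
  static final int WORD_NUMBER = 100;
  static final int WORD_LETTER = 200;
  static final int WORD_KANA = 300;
  static final int WORD_IDEO = 400;

  // Placeholder script codes for this sketch only; not the real ICU values.
  static final int SCRIPT_LATIN = 1;
  static final int SCRIPT_HANGUL = 2;
  static final int SCRIPT_HIRAGANA = 3;
  static final int SCRIPT_KATAKANA = 4;

  /** Maps a script code and a rule status to an illustrative token type label. */
  static String getType(int script, int ruleStatus) {
    switch (ruleStatus) {
      case WORD_IDEO:
        return "<IDEOGRAPHIC>";
      case WORD_KANA:
        // The rule status alone does not separate the two kana scripts.
        return script == SCRIPT_HIRAGANA ? "<HIRAGANA>" : "<KATAKANA>";
      case WORD_LETTER:
        return script == SCRIPT_HANGUL ? "<HANGUL>" : "<ALPHANUM>";
      case WORD_NUMBER:
        return "<NUM>";
      default:
        // Any status produced by custom rules falls through to a generic label.
        return "<OTHER>";
    }
  }

  public static void main(String[] args) {
    System.out.println(getType(SCRIPT_LATIN, WORD_LETTER));
    System.out.println(getType(SCRIPT_HIRAGANA, WORD_KANA));
  }
}
```

The sketch uses both arguments because a single rule status can cover several scripts, so the script code is what distinguishes them.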


combineCJ

public abstract boolean combineCJ()
Returns true if Han, Hiragana, and Katakana scripts should all be returned as Japanese.
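
The effect of this flag can be pictured with the minimal sketch below. It is not the tokenizer's implementation: it uses the JDK's Character.UnicodeScript in place of ICU script codes, and the class name, the method name reportedScript, and the "JAPANESE" label are hypothetical.

```java
import java.lang.Character.UnicodeScript;

/** Stand-alone sketch; it does not extend ICUTokenizerConfig. */
final class CombineCJSketch {
  /** Returns the script label to report for a script under the given setting. */
  static String reportedScript(UnicodeScript script, boolean combineCJ) {
    boolean isCJ = script == UnicodeScript.HAN
        || script == UnicodeScript.HIRAGANA
        || script == UnicodeScript.KATAKANA;
    // With the flag set, the three scripts collapse into one label.
    if (combineCJ && isCJ) {
      return "JAPANESE";
    }
    // Otherwise each script keeps its own name.
    return script.name();
  }

  public static void main(String[] args) {
    // U+65E5 is a Han character, U+3042 is Hiragana, U+30AB is Katakana.
    int[] samples = {'\u65E5', '\u3042', '\u30AB', 'a'};
    for (int cp : samples) {
      UnicodeScript s = UnicodeScript.of(cp);
      System.out.println(s + " -> " + reportedScript(s, true));
    }
  }
}
```

With the flag off, a run of mixed kanji and kana would be seen as three separate scripts; with it on, the whole run carries one label.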



Copyright © 2000-2014 Apache Software Foundation. All Rights Reserved.