Class SimplePatternSplitTokenizer

  • All Implemented Interfaces:
    Closeable, AutoCloseable

    public final class SimplePatternSplitTokenizer
    extends Tokenizer
    This tokenizer uses a Lucene RegExp or (expert usage) a pre-built, determinized Automaton to locate tokens. The regexp syntax is more limited than PatternTokenizer, but the tokenization is quite a bit faster. This is just like SimplePatternTokenizer, except that the pattern should match valid token separator characters, as in String.split. Empty string tokens are never produced.
    WARNING: This API is experimental and might change in incompatible ways in the next release.
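
    A minimal usage sketch (not part of the original Javadoc): it splits on runs of spaces and commas, and assumes the single-argument SimplePatternSplitTokenizer(String regexp) constructor plus the standard TokenStream consumption contract (setReader, reset, incrementToken, end, close).

        import java.io.StringReader;

        import org.apache.lucene.analysis.pattern.SimplePatternSplitTokenizer;
        import org.apache.lucene.analysis.tokenattributes.CharTermAttribute;

        public class SimplePatternSplitExample {
          public static void main(String[] args) throws Exception {
            // The pattern matches the separator characters (like String.split),
            // so tokens are the spans between matches; empty tokens are dropped.
            try (SimplePatternSplitTokenizer tokenizer =
                new SimplePatternSplitTokenizer("[ ,]+")) {
              CharTermAttribute term = tokenizer.addAttribute(CharTermAttribute.class);
              tokenizer.setReader(new StringReader("foo, bar  baz"));
              tokenizer.reset();
              while (tokenizer.incrementToken()) {
                System.out.println(term.toString()); // prints foo, then bar, then baz
              }
              tokenizer.end();
            }
          }
        }

    Because the pattern describes the separators rather than the tokens themselves, a single character class such as [ ,]+ absorbs consecutive delimiters without emitting empty tokens.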