Class WhitespaceTokenizer

All Implemented Interfaces:
Closeable, AutoCloseable

public final class WhitespaceTokenizer extends CharTokenizer
A tokenizer that divides text at whitespace characters as defined by Character.isWhitespace(int). Note: that definition explicitly excludes the non-breaking space. Each maximal run of adjacent non-whitespace characters forms a token.
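
The splitting rule described above can be illustrated with the JDK alone. The following is a minimal sketch, not the Lucene implementation: the class name WhitespaceSplitSketch and the method split are hypothetical, and the only behavior it reproduces is the use of Character.isWhitespace(int) as the delimiter test.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration class; it does not use or mirror Lucene internals.
public class WhitespaceSplitSketch {

    // Collects each run of code points for which Character.isWhitespace(int) is false.
    static List<String> split(String text) {
        List<String> tokens = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        for (int i = 0; i < text.length(); ) {
            int cp = text.codePointAt(i);
            i += Character.charCount(cp);
            if (Character.isWhitespace(cp)) {
                // A whitespace code point ends the current token, if there is one.
                if (current.length() > 0) {
                    tokens.add(current.toString());
                    current.setLength(0);
                }
            } else {
                current.appendCodePoint(cp);
            }
        }
        if (current.length() > 0) {
            tokens.add(current.toString());
        }
        return tokens;
    }

    public static void main(String[] args) {
        // Space and tab are both delimiters; repeated delimiters produce no empty tokens.
        System.out.println(split("quick  brown\tfox"));
        // The non-breaking space (U+00A0) is not whitespace here, so "10" and "km" stay joined.
        System.out.println(split("10\u00A0km north").size());
    }
}
```

Because Character.isWhitespace(int) returns false for U+00A0, text joined by a non-breaking space stays in one token in this sketch, which matches the note above.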
  • Constructor Details

    • WhitespaceTokenizer

      public WhitespaceTokenizer()
      Construct a new WhitespaceTokenizer.
    • WhitespaceTokenizer

      public WhitespaceTokenizer(AttributeFactory factory)
      Construct a new WhitespaceTokenizer using a given AttributeFactory.
      Parameters:
      factory - the attribute factory to use for this Tokenizer
    • WhitespaceTokenizer

      public WhitespaceTokenizer(int maxTokenLen)
      Construct a new WhitespaceTokenizer using a given max token length.
      Parameters:
      maxTokenLen - maximum token length the tokenizer will emit. Must be greater than 0 and less than MAX_TOKEN_LENGTH_LIMIT (1024*1024)
      Throws:
      IllegalArgumentException - if maxTokenLen is invalid.
    • WhitespaceTokenizer

      public WhitespaceTokenizer(AttributeFactory factory, int maxTokenLen)
      Construct a new WhitespaceTokenizer using a given AttributeFactory and max token length.
      Parameters:
      factory - the attribute factory to use for this Tokenizer
      maxTokenLen - maximum token length the tokenizer will emit. Must be greater than 0 and less than MAX_TOKEN_LENGTH_LIMIT (1024*1024)
      Throws:
      IllegalArgumentException - if maxTokenLen is invalid.
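
The maxTokenLen constraint documented for the constructors above can be sketched as a standalone check. This is an illustration only: the class MaxTokenLenCheckSketch and the method checkMaxTokenLen are hypothetical names, the constant is redefined locally from the value quoted above (1024*1024), and accepting a value exactly equal to the limit is an assumption of this sketch rather than something the text states.

```java
// Hypothetical illustration class; it does not call into Lucene.
public class MaxTokenLenCheckSketch {

    // Local copy of the limit quoted in the parameter description (1024*1024).
    static final int MAX_TOKEN_LENGTH_LIMIT = 1024 * 1024;

    // Returns maxTokenLen unchanged if it is in range, otherwise throws.
    // Assumption: the limit itself is treated as valid in this sketch.
    static int checkMaxTokenLen(int maxTokenLen) {
        if (maxTokenLen <= 0 || maxTokenLen > MAX_TOKEN_LENGTH_LIMIT) {
            throw new IllegalArgumentException(
                "maxTokenLen must be greater than 0 and no larger than "
                    + MAX_TOKEN_LENGTH_LIMIT + ", got: " + maxTokenLen);
        }
        return maxTokenLen;
    }

    public static void main(String[] args) {
        System.out.println(checkMaxTokenLen(255));
        try {
            checkMaxTokenLen(0);
        } catch (IllegalArgumentException e) {
            System.out.println("rejected: " + e.getMessage());
        }
    }
}
```

Callers that take the length from configuration may want to run a check of this kind up front, so that an invalid value is reported with context before the tokenizer is constructed.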
  • Method Details