Class LowerCaseTokenizer

  • All Implemented Interfaces:
    Closeable, AutoCloseable

    @Deprecated
    public final class LowerCaseTokenizer
    extends LetterTokenizer
    Deprecated.
    LowerCaseTokenizer performs the function of LetterTokenizer and LowerCaseFilter together. It divides text at non-letters and converts the resulting tokens to lower case. While it is functionally equivalent to the combination of LetterTokenizer and LowerCaseFilter, there is a performance advantage to doing the two tasks at once, hence this (redundant) implementation.

    Note: this does a decent job for most European languages, but does a terrible job for some Asian languages, where words are not separated by spaces.
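
    The behavior described above can be illustrated with a minimal, JDK-only sketch. It is not the Lucene implementation: the class LetterLowerCaseSketch and its tokenize helper are hypothetical names introduced for illustration, and the sketch assumes that "letter" means Character.isLetter on each code point. It ignores maxTokenLen and all attribute handling.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration class; not part of Lucene.
class LetterLowerCaseSketch {

    // Splits text at non-letters and lower-cases each letter in a single pass.
    static List<String> tokenize(String text) {
        List<String> tokens = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        for (int i = 0; i < text.length(); ) {
            int cp = text.codePointAt(i);
            i += Character.charCount(cp);
            if (Character.isLetter(cp)) {
                // Letter: lower-case it and extend the current token.
                current.appendCodePoint(Character.toLowerCase(cp));
            } else if (current.length() > 0) {
                // Non-letter: close the current token, if any.
                tokens.add(current.toString());
                current.setLength(0);
            }
        }
        if (current.length() > 0) {
            tokens.add(current.toString());
        }
        return tokens;
    }

    public static void main(String[] args) {
        // prints [hello, world, it, s]
        System.out.println(tokenize("Hello, World! It's 2024."));

        // A six-character Japanese phrase written with Unicode escapes.
        // Every character is a letter and there are no spaces, so the
        // whole phrase comes back as one token.
        String japanese = "\u6771\u4eac\u90fd\u306b\u4f4f\u3080";
        System.out.println(tokenize(japanese).size()); // prints 1
    }
}
```

    The second call shows the limitation from the note above: text without spaces or other non-letter separators is never split into words.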

    • Constructor Detail

      • LowerCaseTokenizer

        public LowerCaseTokenizer()
        Deprecated.
        Construct a new LowerCaseTokenizer.
      • LowerCaseTokenizer

        public LowerCaseTokenizer(AttributeFactory factory)
        Deprecated.
        Construct a new LowerCaseTokenizer using a given AttributeFactory.
        Parameters:
        factory - the attribute factory to use for this Tokenizer
      • LowerCaseTokenizer

        public LowerCaseTokenizer(AttributeFactory factory,
                                  int maxTokenLen)
        Deprecated.
        Construct a new LowerCaseTokenizer using a given AttributeFactory and maximum token length.
        Parameters:
        factory - the attribute factory to use for this Tokenizer
        maxTokenLen - maximum token length the tokenizer will emit. Must be greater than 0 and less than MAX_TOKEN_LENGTH_LIMIT (1024*1024)
        Throws:
        IllegalArgumentException - if maxTokenLen is invalid.
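
        The maxTokenLen constraint can be sketched as a standalone range check. This is an illustration only, not the library's code: MaxTokenLenCheckSketch and checkMaxTokenLen are hypothetical names, and the constant is redefined locally from the documented value 1024*1024. The sketch follows the wording "less than" literally and rejects the limit itself; that boundary handling is an assumption, and the real class may treat it differently.

```java
// Hypothetical illustration class; not part of Lucene.
class MaxTokenLenCheckSketch {

    // Local copy of the documented limit (1024*1024).
    static final int MAX_TOKEN_LENGTH_LIMIT = 1024 * 1024;

    // Returns maxTokenLen unchanged if it is in range, otherwise throws.
    // Assumption: the upper bound is exclusive, per the documented wording.
    static int checkMaxTokenLen(int maxTokenLen) {
        if (maxTokenLen <= 0 || maxTokenLen >= MAX_TOKEN_LENGTH_LIMIT) {
            throw new IllegalArgumentException(
                "maxTokenLen must be greater than 0 and less than "
                    + MAX_TOKEN_LENGTH_LIMIT + ", passed: " + maxTokenLen);
        }
        return maxTokenLen;
    }

    public static void main(String[] args) {
        System.out.println(checkMaxTokenLen(255)); // prints 255
        try {
            checkMaxTokenLen(0);
        } catch (IllegalArgumentException e) {
            System.out.println("rejected: " + e.getMessage());
        }
    }
}
```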