Utility functions for text analysis.
Class Summary

Class: Description
CharArrayIterator: A CharacterIterator used internally for use with
CharTokenizer: An abstract base class for simple, character-oriented tokenizers.
ElisionFilter: Removes elisions from a
ResourceLoader that opens resource files from the local file system, optionally resolving against a base directory.
OpenStringBuilder: A StringBuilder that allows one to access the array.
RollingCharBuffer: Acts like a forever growing char[] as you read characters into it from the provided reader, but internally it uses a circular buffer to only hold the characters that haven't been freed yet.
SegmentingTokenizerBase: Breaks text into sentences with a BreakIterator and allows subclasses to decompose these sentences into words.
StemmerUtil: Some commonly-used stemming functions.
UnicodeProps: This file contains Unicode properties used by various
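
The circular-buffer idea described for RollingCharBuffer can be illustrated with a small standalone sketch. This is not the library's implementation: the class name RollingBufferSketch, its methods get, freeBefore and capacity, and the initial capacity of 4 are all hypothetical choices made for illustration. It only shows how absolute positions can map onto a ring of storage that grows when the retained window no longer fits.

```java
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.io.UncheckedIOException;

/** Hypothetical illustration of a rolling character buffer; not the library class. */
final class RollingBufferSketch {
    private final Reader reader;
    private char[] buffer = new char[4]; // circular storage (size is an arbitrary assumption)
    private int start = 0;               // absolute position of the oldest retained char
    private int end = 0;                 // absolute position one past the last char read

    RollingBufferSketch(Reader reader) {
        this.reader = reader;
    }

    /** Returns the char at absolute position pos, or -1 if the input ends before it. */
    int get(int pos) {
        if (pos < start) {
            throw new IllegalArgumentException("position " + pos + " was already freed");
        }
        try {
            while (pos >= end) {
                int c = reader.read();
                if (c == -1) {
                    return -1;
                }
                if (end - start == buffer.length) {
                    grow(); // the retained window fills the ring, so enlarge it
                }
                buffer[end % buffer.length] = (char) c;
                end++;
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return buffer[pos % buffer.length];
    }

    /** Marks every char before absolute position pos as reusable storage. */
    void freeBefore(int pos) {
        if (pos < start || pos > end) {
            throw new IllegalArgumentException("cannot free before position " + pos);
        }
        start = pos;
    }

    /** Current size of the backing array. */
    int capacity() {
        return buffer.length;
    }

    /** Doubles the ring and re-places each retained char at its slot in the new ring. */
    private void grow() {
        char[] bigger = new char[buffer.length * 2];
        for (int i = start; i < end; i++) {
            bigger[i % bigger.length] = buffer[i % buffer.length];
        }
        buffer = bigger;
    }

    public static void main(String[] args) {
        RollingBufferSketch rolling = new RollingBufferSketch(new StringReader("abcdefghij"));
        System.out.println((char) rolling.get(3)); // reads a..d into the ring
        rolling.freeBefore(3);                     // a, b, c may now be overwritten
        System.out.println((char) rolling.get(6)); // e, f, g reuse the freed slots
        System.out.println(rolling.capacity());    // backing array did not need to grow
    }
}
```

Because callers address characters by absolute position, the buffer looks unbounded from the outside, while memory use is bounded by the widest window that has not yet been freed.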
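
The two-level segmentation described for SegmentingTokenizerBase (sentences first, then words within each sentence) can be sketched with the JDK's java.text.BreakIterator alone. The class SentenceWordSketch and its methods segment and words are hypothetical names, and using a second word-level BreakIterator for the decomposition step is an assumption of this sketch; a real subclass is free to split sentences into words however it likes.

```java
import java.text.BreakIterator;
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

/** Hypothetical sketch of sentence-then-word segmentation; not the library class. */
final class SentenceWordSketch {

    /** Splits text into sentences, then each sentence into its words. */
    static List<List<String>> segment(String text) {
        List<List<String>> result = new ArrayList<>();
        BreakIterator sentences = BreakIterator.getSentenceInstance(Locale.ROOT);
        sentences.setText(text);
        int start = sentences.first();
        for (int end = sentences.next(); end != BreakIterator.DONE; start = end, end = sentences.next()) {
            result.add(words(text.substring(start, end)));
        }
        return result;
    }

    /** Decomposes one sentence into words, dropping whitespace and punctuation segments. */
    static List<String> words(String sentence) {
        List<String> words = new ArrayList<>();
        BreakIterator boundaries = BreakIterator.getWordInstance(Locale.ROOT);
        boundaries.setText(sentence);
        int start = boundaries.first();
        for (int end = boundaries.next(); end != BreakIterator.DONE; start = end, end = boundaries.next()) {
            String piece = sentence.substring(start, end);
            // Keep only segments that begin with a letter or digit.
            if (Character.isLetterOrDigit(piece.codePointAt(0))) {
                words.add(piece);
            }
        }
        return words;
    }

    public static void main(String[] args) {
        System.out.println(segment("Hello world. How are you?"));
    }
}
```

Working one sentence at a time keeps the word-level pass bounded to a short span of text, which is the main appeal of this layering.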