Package org.apache.lucene.analysis.util
Utility functions for text analysis.

Interface Summary

MultiTermAwareComponent: Add to any analysis factory component to allow returning an analysis component factory for use with partial terms in prefix queries, wildcard queries, range query endpoints, regex queries, etc.
ResourceLoader: Abstraction for loading resources (streams, files, and classes).
ResourceLoaderAware: Interface for a component that needs to be initialized by an implementation of ResourceLoader.
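
The split between ResourceLoader and ResourceLoaderAware is easier to see in a small, self-contained sketch. The types below (`SimpleLoader`, `LoaderAware`, `MapLoader`, `StopwordComponent`) are hypothetical stand-ins modeled on the two descriptions above, not Lucene's actual signatures. The component knows only a resource name when it is constructed and reads the resource later, once a loader is handed to it.

```java
import java.io.BufferedReader;
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.nio.charset.StandardCharsets;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

// Hypothetical stand-in for a resource loader: opens named resources as streams.
interface SimpleLoader {
  InputStream openResource(String name) throws IOException;
}

// Hypothetical stand-in for a loader-aware component: initialized after construction.
interface LoaderAware {
  void inform(SimpleLoader loader) throws IOException;
}

// In-memory loader for illustration only; a real loader might read files or the classpath.
final class MapLoader implements SimpleLoader {
  private final Map<String, String> contents;

  MapLoader(Map<String, String> contents) {
    this.contents = contents;
  }

  @Override
  public InputStream openResource(String name) throws IOException {
    String text = contents.get(name);
    if (text == null) {
      throw new IOException("Resource not found: " + name);
    }
    return new ByteArrayInputStream(text.getBytes(StandardCharsets.UTF_8));
  }
}

// Stores only the resource name at construction; loads the word list in inform().
final class StopwordComponent implements LoaderAware {
  private final String resourceName;
  private final Set<String> words = new HashSet<>();

  StopwordComponent(String resourceName) {
    this.resourceName = resourceName;
  }

  @Override
  public void inform(SimpleLoader loader) throws IOException {
    try (BufferedReader reader = new BufferedReader(
        new InputStreamReader(loader.openResource(resourceName), StandardCharsets.UTF_8))) {
      String line;
      while ((line = reader.readLine()) != null) {
        line = line.trim();
        // Skip blank lines and '#' comment lines.
        if (!line.isEmpty() && !line.startsWith("#")) {
          words.add(line);
        }
      }
    }
  }

  boolean isStopword(String term) {
    return words.contains(term);
  }
}
```

Deferring the read to `inform` lets the same component be configured once and then initialized against whichever loader (file system, classpath, in-memory) the surrounding framework supplies.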
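
The motivation for MultiTermAwareComponent can be illustrated with a toy analysis chain: for a partial term such as a wildcard pattern, lowercasing is still safe, while a stemmer would mangle the pattern. In the sketch below, `TermStep`, `MultiTermAware`, `LowercaseStep`, `StripIngStep` and `Chain` are all hypothetical; only steps that offer a multi-term variant run during partial-term analysis.

```java
import java.util.List;
import java.util.Locale;

// Hypothetical single-term transformation (a stand-in for an analysis component).
interface TermStep {
  String apply(String term);
}

// Hypothetical counterpart of a multi-term-aware component: returns a step that is
// safe for partial terms (prefix, wildcard, range endpoints), or null if none exists.
interface MultiTermAware {
  TermStep multiTermStep();
}

// Lowercasing is safe on partial terms, so it offers itself.
final class LowercaseStep implements TermStep, MultiTermAware {
  @Override
  public String apply(String term) {
    return term.toLowerCase(Locale.ROOT);
  }

  @Override
  public TermStep multiTermStep() {
    return this;
  }
}

// A toy suffix stripper is not safe on partial terms, so it is not MultiTermAware.
final class StripIngStep implements TermStep {
  @Override
  public String apply(String term) {
    return term.endsWith("ing") ? term.substring(0, term.length() - 3) : term;
  }
}

final class Chain {
  private final List<TermStep> steps;

  Chain(List<TermStep> steps) {
    this.steps = steps;
  }

  // Full analysis: every step runs in order.
  String analyze(String term) {
    for (TermStep s : steps) {
      term = s.apply(term);
    }
    return term;
  }

  // Partial-term analysis: only steps that supply a multi-term variant run.
  String analyzeMultiTerm(String term) {
    for (TermStep s : steps) {
      if (s instanceof MultiTermAware) {
        TermStep safe = ((MultiTermAware) s).multiTermStep();
        if (safe != null) {
          term = safe.apply(term);
        }
      }
    }
    return term;
  }
}
```

With a chain of lowercasing followed by suffix stripping, full analysis of `WALKING` yields `walk`, while partial-term analysis yields `walking`, because the stripper is skipped.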

Class Summary

AbstractAnalysisFactory: Abstract parent class for analysis factories TokenizerFactory, TokenFilterFactory and CharFilterFactory.
AnalysisSPILoader<S extends AbstractAnalysisFactory>: Helper class for loading named SPIs from classpath (e.g.
CharArrayIterator: A CharacterIterator used internally together with BreakIterator.
CharFilterFactory: Abstract parent class for analysis factories that create CharFilter instances.
CharTokenizer: An abstract base class for simple, character-oriented tokenizers.
ClasspathResourceLoader: Simple ResourceLoader that uses ClassLoader.getResourceAsStream(String) and Class.forName(String,boolean,ClassLoader) to open resources and classes, respectively.
ElisionFilter: Removes elisions from a TokenStream (for example, "l'avion" becomes "avion").
ElisionFilterFactory: Factory for ElisionFilter.
FilesystemResourceLoader: Simple ResourceLoader that opens resource files from the local file system, optionally resolving against a base directory.
OpenStringBuilder: A StringBuilder-like buffer that allows direct access to its underlying char array.
RollingCharBuffer: Acts like a forever-growing char[] as you read characters into it from the provided reader, but internally uses a circular buffer that holds only the characters that have not been freed yet.
SegmentingTokenizerBase: Breaks text into sentences with a BreakIterator and allows subclasses to decompose these sentences into words.
StemmerUtil: Commonly used stemming functions.
TokenFilterFactory: Abstract parent class for analysis factories that create TokenFilter instances.
TokenizerFactory: Abstract parent class for analysis factories that create Tokenizer instances.
UnicodeProps: Unicode properties used by various CharTokenizers.
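
CharArrayIterator exists so that a BreakIterator can scan characters held in a char[] buffer. The sketch below is a hypothetical `CharSliceIterator` that implements the JDK's `java.text.CharacterIterator` over a slice of an array without copying it; reporting positions relative to the slice start is an assumption of this sketch, not a statement about Lucene's class.

```java
import java.text.CharacterIterator;

// Hypothetical CharacterIterator over array[start, start + length), without copying.
// Indices reported to callers are relative to the slice: 0 .. length.
final class CharSliceIterator implements CharacterIterator {
  private char[] array = new char[0];
  private int start;   // offset of the slice within array
  private int length;  // number of chars in the slice
  private int index;   // current position relative to the slice

  void setText(char[] array, int start, int length) {
    this.array = array;
    this.start = start;
    this.length = length;
    this.index = 0;
  }

  @Override public char current() {
    return index >= 0 && index < length ? array[start + index] : DONE;
  }

  @Override public char first() {
    index = 0;
    return current();
  }

  @Override public char last() {
    // Per the CharacterIterator contract: endIndex - 1, or endIndex if the text is empty.
    index = length > 0 ? length - 1 : 0;
    return current();
  }

  @Override public char next() {
    // Once the end index is reached, stay there and return DONE.
    if (index < length) index++;
    return current();
  }

  @Override public char previous() {
    if (index == 0) return DONE;
    index--;
    return current();
  }

  @Override public char setIndex(int position) {
    if (position < 0 || position > length) {
      throw new IllegalArgumentException("Invalid position: " + position);
    }
    index = position;
    return current();
  }

  @Override public int getBeginIndex() { return 0; }
  @Override public int getEndIndex() { return length; }
  @Override public int getIndex() { return index; }

  @Override public CharSliceIterator clone() {
    try {
      return (CharSliceIterator) super.clone(); // shallow: shares the backing array
    } catch (CloneNotSupportedException e) {
      throw new AssertionError(e);
    }
  }
}
```

Because the iterator only wraps the array, a tokenizer can reuse one instance across tokens by calling `setText` again instead of allocating a new String for every buffer.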
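
The idea behind CharTokenizer (a token is a maximal run of characters accepted by a per-character test, optionally normalized) fits in a few lines of plain Java. `SimpleCharTokenizer` below is hypothetical and not Lucene's class; it walks the input by code point so that supplementary characters are treated as single characters.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.IntPredicate;
import java.util.function.IntUnaryOperator;

// Hypothetical character-oriented tokenizer: tokens are maximal runs of code points
// accepted by isTokenChar; each accepted code point is passed through normalize.
final class SimpleCharTokenizer {
  private final IntPredicate isTokenChar;
  private final IntUnaryOperator normalize;

  SimpleCharTokenizer(IntPredicate isTokenChar, IntUnaryOperator normalize) {
    this.isTokenChar = isTokenChar;
    this.normalize = normalize;
  }

  List<String> tokenize(String text) {
    List<String> tokens = new ArrayList<>();
    StringBuilder current = new StringBuilder();
    int i = 0;
    while (i < text.length()) {
      int cp = text.codePointAt(i);
      if (isTokenChar.test(cp)) {
        current.appendCodePoint(normalize.applyAsInt(cp));
      } else if (current.length() > 0) {
        // A non-token character closes the current token.
        tokens.add(current.toString());
        current.setLength(0);
      }
      i += Character.charCount(cp);
    }
    if (current.length() > 0) {
      tokens.add(current.toString());
    }
    return tokens;
  }
}
```

For example, `new SimpleCharTokenizer(Character::isLetter, Character::toLowerCase)` turns `"Hello, World-42!"` into `[hello, world]`; swapping in `Character::isLetterOrDigit` would also keep `42`.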
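
ClasspathResourceLoader and FilesystemResourceLoader describe two lookup strategies built on standard JDK calls. The hypothetical `FallbackResourceLoader` below combines them: it tries an optional base directory first and then falls back to `ClassLoader.getResourceAsStream(String)`, and it loads classes with `Class.forName(String, boolean, ClassLoader)` without initializing them. The fallback order is an assumption of this sketch, and it does no path sanitization.

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Path;

// Hypothetical loader: base directory first (if any), then the classpath.
final class FallbackResourceLoader {
  private final Path baseDir;            // may be null for classpath-only lookup
  private final ClassLoader classLoader;

  FallbackResourceLoader(Path baseDir, ClassLoader classLoader) {
    this.baseDir = baseDir;
    this.classLoader = classLoader;
  }

  InputStream openResource(String name) throws IOException {
    if (baseDir != null) {
      Path file = baseDir.resolve(name);
      if (Files.isRegularFile(file)) {
        return Files.newInputStream(file);
      }
    }
    // getResourceAsStream returns null rather than throwing when nothing is found.
    InputStream in = classLoader.getResourceAsStream(name);
    if (in == null) {
      throw new IOException("Resource not found: " + name);
    }
    return in;
  }

  // Loads a class without running its static initializers, then checks its type.
  <T> Class<? extends T> findClass(String className, Class<T> expectedType) {
    try {
      return Class.forName(className, false, classLoader).asSubclass(expectedType);
    } catch (ClassNotFoundException e) {
      throw new IllegalArgumentException("Cannot load class: " + className, e);
    }
  }
}
```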
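
Elision removal, as described for ElisionFilter, can be sketched per token: if the text before the first apostrophe is a known elided article, that prefix and the apostrophe are dropped. `Elision` and its article set are hypothetical; treating both `'` and U+2019 as apostrophes and matching the prefix case-insensitively are assumptions of this sketch.

```java
import java.util.Locale;
import java.util.Set;

// Hypothetical per-token elision stripper.
final class Elision {
  private final Set<String> articles; // expected in lowercase, e.g. "l", "qu"

  Elision(Set<String> articles) {
    this.articles = articles;
  }

  String strip(String token) {
    for (int i = 0; i < token.length(); i++) {
      char c = token.charAt(i);
      // Accept both the ASCII apostrophe and U+2019 RIGHT SINGLE QUOTATION MARK.
      if (c == '\'' || c == '\u2019') {
        String prefix = token.substring(0, i).toLowerCase(Locale.ROOT);
        return articles.contains(prefix) ? token.substring(i + 1) : token;
      }
    }
    return token; // no apostrophe: nothing to remove
  }
}
```

With the articles `l`, `qu` and `j`, `l'avion` becomes `avion` and `qu'il` becomes `il`, while `aujourd'hui` is left alone because `aujourd` is not in the set.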
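
The RollingCharBuffer description (callers address characters by absolute position as if the whole input were one growing array, while only unfreed characters are kept) can be modeled with a small circular buffer. `RollingBuffer`, its `get(int)` and `freeBefore(int)` methods, and the initial capacity of 4 are all hypothetical choices for this sketch; the buffer doubles only when the live window outgrows it.

```java
import java.io.IOException;
import java.io.Reader;

// Hypothetical rolling buffer over a Reader, addressed by absolute position.
final class RollingBuffer {
  private final Reader reader;
  private char[] buffer = new char[4];
  private int nextWrite; // slot where the next char read is stored
  private int nextPos;   // absolute position of the next char to read
  private int count;     // number of live (not yet freed) chars in the buffer
  private boolean end;   // true once the reader is exhausted

  RollingBuffer(Reader reader) {
    this.reader = reader;
  }

  // Returns the char at absolute position pos, or -1 if the input ends before pos.
  int get(int pos) throws IOException {
    while (pos >= nextPos) {
      if (end) return -1;
      int ch = reader.read();
      if (ch == -1) {
        end = true;
        return -1;
      }
      if (count == buffer.length) grow();
      buffer[nextWrite] = (char) ch;
      nextWrite = (nextWrite + 1) % buffer.length;
      nextPos++;
      count++;
    }
    if (pos < nextPos - count) {
      throw new IllegalArgumentException("Position " + pos + " was already freed");
    }
    // The most recent char (nextPos - 1) sits at nextWrite - 1; step back from there.
    int back = nextPos - pos;
    return buffer[(nextWrite - back + buffer.length) % buffer.length];
  }

  // Frees all positions before pos so their slots can be reused.
  void freeBefore(int pos) {
    int oldest = nextPos - count;
    if (pos > oldest) {
      count -= Math.min(pos, nextPos) - oldest;
    }
  }

  private void grow() {
    char[] bigger = new char[buffer.length * 2];
    // Copy live chars oldest-first so they start at index 0 of the new array.
    int start = (nextWrite - count + buffer.length) % buffer.length;
    for (int i = 0; i < count; i++) {
      bigger[i] = buffer[(start + i) % buffer.length];
    }
    buffer = bigger;
    nextWrite = count;
  }
}
```

Memory use is bounded by the widest window between the oldest unfreed position and the furthest position read, not by the length of the input.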
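
The two-level segmentation that SegmentingTokenizerBase describes (sentences first, then words within each sentence) can be tried with the JDK's `java.text.BreakIterator` alone. `TwoLevelSegmenter` is a hypothetical helper, not Lucene's implementation; it keeps only word segments that begin with a letter or digit, discarding the whitespace and punctuation segments the word iterator also reports.

```java
import java.text.BreakIterator;
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

// Hypothetical two-level segmenter built on java.text.BreakIterator.
final class TwoLevelSegmenter {
  static List<List<String>> segment(String text, Locale locale) {
    List<List<String>> sentences = new ArrayList<>();
    BreakIterator sentenceIt = BreakIterator.getSentenceInstance(locale);
    BreakIterator wordIt = BreakIterator.getWordInstance(locale);
    sentenceIt.setText(text);
    int start = sentenceIt.first();
    for (int end = sentenceIt.next(); end != BreakIterator.DONE; start = end, end = sentenceIt.next()) {
      sentences.add(words(text.substring(start, end), wordIt));
    }
    return sentences;
  }

  // Splits one sentence into words, dropping whitespace and punctuation segments.
  private static List<String> words(String sentence, BreakIterator wordIt) {
    List<String> words = new ArrayList<>();
    wordIt.setText(sentence);
    int start = wordIt.first();
    for (int end = wordIt.next(); end != BreakIterator.DONE; start = end, end = wordIt.next()) {
      String piece = sentence.substring(start, end);
      if (Character.isLetterOrDigit(piece.codePointAt(0))) {
        words.add(piece);
      }
    }
    return words;
  }
}
```

For `"Hello world. Bye now."` this yields `[[Hello, world], [Bye, now]]`. Keeping sentence boundaries lets a subclass-style word step work on one sentence at a time.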
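
Stemming helpers of the kind StemmerUtil groups together typically edit a term in place as a `char[]` plus a logical length, avoiding String allocation. The `CharStemOps` class below is hypothetical; its method names echo common helpers of this kind, but the signatures are assumptions of this sketch.

```java
// Hypothetical in-place helpers for stemmers working on (char[] buffer, int length).
final class CharStemOps {
  private CharStemOps() {}

  // True if the first len chars of s end with suffix.
  static boolean endsWith(char[] s, int len, String suffix) {
    int n = suffix.length();
    if (n > len) return false;
    for (int i = 0; i < n; i++) {
      if (s[len - n + i] != suffix.charAt(i)) return false;
    }
    return true;
  }

  // Deletes nChars chars starting at pos by shifting the tail left; returns the new length.
  static int deleteN(char[] s, int pos, int len, int nChars) {
    if (pos < 0 || nChars < 0 || pos + nChars > len) {
      throw new IllegalArgumentException("Range outside the term");
    }
    System.arraycopy(s, pos + nChars, s, pos, len - pos - nChars);
    return len - nChars;
  }
}
```

A suffix rule then reads naturally: `if (CharStemOps.endsWith(buf, len, "ing")) len = CharStemOps.deleteN(buf, len - 3, len, 3);` turns `walking` into `walk` without allocating.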