Package org.apache.lucene.analysis.miscellaneous

Miscellaneous TokenStreams


Class Summary
ASCIIFoldingFilter Converts alphabetic, numeric, and symbolic Unicode characters that fall outside the ASCII range (the "Basic Latin" Unicode block, U+0000 to U+007F) into their ASCII equivalents, where such equivalents exist.
ASCIIFoldingFilterFactory Factory for ASCIIFoldingFilter.
CapitalizationFilter A filter to apply normal capitalization rules to Tokens.
CapitalizationFilterFactory Factory for CapitalizationFilter.
EmptyTokenStream An always exhausted token stream.
HyphenatedWordsFilter Rejoins words that were hyphenated and broken across two lines, as often happens when plain text is extracted from documents.
HyphenatedWordsFilterFactory Factory for HyphenatedWordsFilter.
KeepWordFilter A TokenFilter that only keeps tokens with text contained in the required words.
KeepWordFilterFactory Factory for KeepWordFilter.
KeywordMarkerFilter Marks terms as keywords via the KeywordAttribute.
KeywordMarkerFilterFactory Factory for KeywordMarkerFilter.
LengthFilter Removes words that are too long or too short from the stream.
LengthFilterFactory Factory for LengthFilter.
LimitTokenCountAnalyzer This Analyzer limits the number of tokens while indexing.
LimitTokenCountFilter This TokenFilter limits the number of tokens while indexing.
LimitTokenCountFilterFactory Factory for LimitTokenCountFilter.
PatternAnalyzer Deprecated (4.0). Use the pattern-based analysis in the analysis/pattern package instead.
PerFieldAnalyzerWrapper This analyzer is used to facilitate scenarios where different fields require different analysis techniques.
PrefixAndSuffixAwareTokenFilter Links two PrefixAwareTokenFilter instances.
PrefixAwareTokenFilter Joins two token streams and leaves the last token of the first stream available to be used when updating the token values in the second stream based on that token.
RemoveDuplicatesTokenFilter A TokenFilter which filters out Tokens that have the same position and term text as the previous token in the stream.
RemoveDuplicatesTokenFilterFactory Factory for RemoveDuplicatesTokenFilter.
SingleTokenTokenStream A TokenStream containing a single token.
StemmerOverrideFilter Provides the ability to override any KeywordAttribute-aware stemmer with custom dictionary-based stemming.
StemmerOverrideFilterFactory Factory for StemmerOverrideFilter.
TrimFilter Trims leading and trailing whitespace from Tokens in the stream.
TrimFilterFactory Factory for TrimFilter.
WordDelimiterFilter Splits words into subwords and performs optional transformations on subword groups.
WordDelimiterFilterFactory Factory for WordDelimiterFilter.
WordDelimiterIterator A BreakIterator-like API for iterating over subwords in text, according to WordDelimiterFilter rules.
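
The one-line descriptions above can be made more concrete with a small sketch. The code below does not use the Lucene API. It is a plain-Java illustration, under simplifying assumptions, of the behavior described for TrimFilter, LengthFilter, KeepWordFilter, and LimitTokenCountFilter, with tokens modeled as plain strings. The class name TokenFilterSketch and all of its method names are hypothetical and exist only for this example.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Illustrative stand-ins only: these methods mimic the documented behavior of
// several filters on plain strings. They are NOT the Lucene implementations.
class TokenFilterSketch {

    // Like TrimFilter: strip leading and trailing whitespace from each token.
    static List<String> trim(List<String> tokens) {
        List<String> out = new ArrayList<>();
        for (String t : tokens) {
            out.add(t.trim());
        }
        return out;
    }

    // Like LengthFilter: drop tokens shorter than min or longer than max
    // (both bounds are treated as inclusive in this sketch).
    static List<String> lengthBetween(List<String> tokens, int min, int max) {
        List<String> out = new ArrayList<>();
        for (String t : tokens) {
            if (t.length() >= min && t.length() <= max) {
                out.add(t);
            }
        }
        return out;
    }

    // Like KeepWordFilter: keep only tokens whose text is in the given word set.
    static List<String> keepWords(List<String> tokens, Set<String> words) {
        List<String> out = new ArrayList<>();
        for (String t : tokens) {
            if (words.contains(t)) {
                out.add(t);
            }
        }
        return out;
    }

    // Like LimitTokenCountFilter: pass through at most maxTokens tokens.
    static List<String> limit(List<String> tokens, int maxTokens) {
        int end = Math.min(maxTokens, tokens.size());
        return new ArrayList<>(tokens.subList(0, end));
    }

    public static void main(String[] args) {
        List<String> tokens = Arrays.asList(" search ", "a", "engine", "indexing", "of");
        Set<String> keep = new HashSet<>(Arrays.asList("search", "engine", "of"));

        // Chain the steps in the order a filter chain might apply them.
        List<String> result = limit(keepWords(lengthBetween(trim(tokens), 2, 6), keep), 2);
        System.out.println(result); // prints [search, engine]
    }
}
```

In Lucene itself these operations are TokenFilter subclasses that wrap a TokenStream and process tokens lazily, one at a time, rather than working on in-memory lists; the list-based form here is only meant to show what each step keeps or drops.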

Copyright © 2000-2013 Apache Software Foundation. All Rights Reserved.