Package org.apache.lucene.analysis.miscellaneous

Miscellaneous TokenStreams


Class Summary
ASCIIFoldingFilter This class converts alphabetic, numeric, and symbolic Unicode characters that are not in the first 127 ASCII characters (the "Basic Latin" Unicode block) into their ASCII equivalents, where such equivalents exist.
ASCIIFoldingFilterFactory Factory for ASCIIFoldingFilter.
CapitalizationFilter A filter to apply normal capitalization rules to Tokens.
CapitalizationFilterFactory Factory for CapitalizationFilter.
CodepointCountFilter Removes words that are too long or too short from the stream, measuring length in Unicode code points.
CodepointCountFilterFactory Factory for CodepointCountFilter.
EmptyTokenStream An always exhausted token stream.
HyphenatedWordsFilter Rejoins words that were hyphenated and broken across two lines, as often happens when plain text is extracted from documents.
HyphenatedWordsFilterFactory Factory for HyphenatedWordsFilter.
KeepWordFilter A TokenFilter that only keeps tokens with text contained in the required words.
KeepWordFilterFactory Factory for KeepWordFilter.
KeywordMarkerFilter Marks terms as keywords via the KeywordAttribute.
KeywordMarkerFilterFactory Factory for KeywordMarkerFilter.
KeywordRepeatFilter This TokenFilter emits each incoming token twice: once as a keyword and once as a non-keyword. In other words, once with KeywordAttribute.setKeyword(boolean) set to true and once with it set to false.
KeywordRepeatFilterFactory Factory for KeywordRepeatFilter.
LengthFilter Removes words that are too long or too short from the stream.
LengthFilterFactory Factory for LengthFilter.
LimitTokenCountAnalyzer This Analyzer limits the number of tokens while indexing.
LimitTokenCountFilter This TokenFilter limits the number of tokens while indexing.
LimitTokenCountFilterFactory Factory for LimitTokenCountFilter.
LimitTokenPositionFilter This TokenFilter limits its emitted tokens to those with positions that are not greater than the configured limit.
LimitTokenPositionFilterFactory Factory for LimitTokenPositionFilter.
PatternAnalyzer Deprecated (4.0). Use the pattern-based analysis in the analysis/pattern package instead.
PatternKeywordMarkerFilter Marks terms as keywords via the KeywordAttribute.
PerFieldAnalyzerWrapper This analyzer is used to facilitate scenarios where different fields require different analysis techniques.
PrefixAndSuffixAwareTokenFilter Links two PrefixAwareTokenFilters.
PrefixAwareTokenFilter Joins two token streams and leaves the last token of the first stream available to be used when updating the token values in the second stream based on that token.
RemoveDuplicatesTokenFilter A TokenFilter which filters out Tokens that have the same position and term text as the previous token in the stream.
RemoveDuplicatesTokenFilterFactory Factory for RemoveDuplicatesTokenFilter.
ScandinavianFoldingFilter This filter folds Scandinavian characters åÅäæÄÆ->a and öÖøØ->o.
ScandinavianFoldingFilterFactory Factory for ScandinavianFoldingFilter.
ScandinavianNormalizationFilter This filter normalizes the use of the interchangeable Scandinavian characters æÆäÄöÖøØ and folded variants (aa, ao, ae, oe and oo) by transforming them to åÅæÆøØ.
ScandinavianNormalizationFilterFactory Factory for ScandinavianNormalizationFilter.
SetKeywordMarkerFilter Marks terms as keywords via the KeywordAttribute.
SingleTokenTokenStream A TokenStream containing a single token.
StemmerOverrideFilter Provides the ability to override any KeywordAttribute aware stemmer with custom dictionary-based stemming.
StemmerOverrideFilter.Builder This builder builds an FST for the StemmerOverrideFilter.
StemmerOverrideFilter.StemmerOverrideMap A read-only 4-byte FST-backed map that allows fast case-insensitive key-value lookups for StemmerOverrideFilter.
StemmerOverrideFilterFactory Factory for StemmerOverrideFilter.
TrimFilter Trims leading and trailing whitespace from Tokens in the stream.
TrimFilterFactory Factory for TrimFilter.
WordDelimiterFilter Splits words into subwords and performs optional transformations on subword groups.
WordDelimiterFilterFactory Factory for WordDelimiterFilter.
WordDelimiterIterator A BreakIterator-like API for iterating over subwords in text, according to WordDelimiterFilter rules.
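
LengthFilter and CodepointCountFilter carry nearly identical summaries in the table above. As the class names suggest, the difference is the unit in which token length is measured. The following is a minimal standalone sketch of that distinction using only the JDK. It does not use the Lucene API, and the class and method names (TokenLengthSketch, keepByCharLength, keepByCodePointCount) are hypothetical, chosen for illustration only.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class TokenLengthSketch {

    // Keeps tokens whose length in UTF-16 code units (String.length())
    // lies within [min, max].
    static List<String> keepByCharLength(List<String> tokens, int min, int max) {
        List<String> kept = new ArrayList<>();
        for (String token : tokens) {
            int len = token.length();
            if (len >= min && len <= max) {
                kept.add(token);
            }
        }
        return kept;
    }

    // Keeps tokens whose length in Unicode code points lies within [min, max].
    static List<String> keepByCodePointCount(List<String> tokens, int min, int max) {
        List<String> kept = new ArrayList<>();
        for (String token : tokens) {
            int count = token.codePointCount(0, token.length());
            if (count >= min && count <= max) {
                kept.add(token);
            }
        }
        return kept;
    }

    public static void main(String[] args) {
        // U+1D11E (musical symbol G clef) is one code point
        // but occupies two UTF-16 code units.
        String clef = new String(Character.toChars(0x1D11E));
        List<String> tokens = Arrays.asList("a", "lucene", clef);

        System.out.println(keepByCharLength(tokens, 1, 1).size());     // prints 1
        System.out.println(keepByCodePointCount(tokens, 1, 1).size()); // prints 2
    }
}
```

For text confined to the Basic Multilingual Plane the two measures agree; they diverge only for supplementary characters, which are encoded as surrogate pairs.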

Package org.apache.lucene.analysis.miscellaneous Description

Miscellaneous TokenStreams
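
To give a feel for the kind of per-token transformation these filters perform, here is a small standalone sketch of accent folding using only the JDK. It is not how ASCIIFoldingFilter is implemented: it swaps in Unicode NFD decomposition followed by removal of combining marks, which covers only characters that have a canonical decomposition. The class and method names (AccentFoldingSketch, foldAccents) are hypothetical.

```java
import java.text.Normalizer;

public class AccentFoldingSketch {

    // Decomposes accented characters (NFD) and drops the combining marks,
    // leaving the base letters. Characters without a canonical
    // decomposition are returned unchanged.
    static String foldAccents(String input) {
        String decomposed = Normalizer.normalize(input, Normalizer.Form.NFD);
        return decomposed.replaceAll("\\p{M}+", "");
    }

    public static void main(String[] args) {
        System.out.println(foldAccents("caf\u00e9"));        // prints cafe
        System.out.println(foldAccents("r\u00e9sum\u00e9")); // prints resume
    }
}
```

A character such as U+00F8 (ø) has no canonical decomposition and passes through this sketch unchanged, so a dedicated mapping of the kind the folding filters in this package describe is needed for such cases.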

Copyright © 2000-2014 Apache Software Foundation. All Rights Reserved.