|Class and Description|
This class converts alphabetic, numeric, and symbolic Unicode characters which are not in the first 127 ASCII characters (the "Basic Latin" Unicode block) into their ASCII equivalents, where such equivalents exist.
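To make the idea of ASCII folding concrete, here is a minimal plain-Java sketch with no Lucene dependency. It is not the class's own implementation: it swaps in `java.text.Normalizer` decomposition and strips combining marks, so it only handles characters that have a canonical decomposition. The class and method names (`AsciiFoldingSketch`, `fold`) are hypothetical.

```java
import java.text.Normalizer;

// Hypothetical illustration only; not the Lucene implementation.
final class AsciiFoldingSketch {

    // Decompose accented characters (NFD), then drop the combining marks.
    // Characters without a canonical decomposition are left unchanged,
    // so this covers fewer cases than a hand-built mapping table would.
    static String fold(String input) {
        String decomposed = Normalizer.normalize(input, Normalizer.Form.NFD);
        return decomposed.replaceAll("\\p{M}+", "");
    }

    public static void main(String[] args) {
        // "caf\u00e9" is "cafe" with an acute accent on the final letter.
        System.out.println(fold("caf\u00e9")); // prints "cafe"
    }
}
```

A table-driven mapping would be needed to fold characters that do not decompose into a base letter plus marks.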
A filter that applies normal capitalization rules to tokens.
Removes words that are too long or too short from the stream.
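The length rule can be sketched in plain Java over a list of strings. This is an illustration under stated assumptions, not the filter's actual code: both bounds are treated as inclusive, length is measured in UTF-16 code units via `String.length()`, and the names `LengthFilterSketch` and `filterByLength` are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration only; not the Lucene implementation.
final class LengthFilterSketch {

    // Keep tokens whose length lies in [min, max]; both bounds are
    // assumed to be inclusive for the purposes of this sketch.
    static List<String> filterByLength(List<String> tokens, int min, int max) {
        List<String> kept = new ArrayList<>();
        for (String token : tokens) {
            int len = token.length();
            if (len >= min && len <= max) {
                kept.add(token);
            }
        }
        return kept;
    }
}
```

For example, filtering `["a", "quick", "extraordinarily", "fox"]` with bounds 2 and 6 keeps `quick` and `fox`.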
When the plain text is extracted from documents, we will often have many words hyphenated and broken into two lines.
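Assuming the intended remedy is to rejoin a word that was split by a trailing hyphen at a line break, the idea can be sketched as follows. This is a simplified illustration, not the filter's actual code: it works on a list of strings, cannot tell a line-break hyphen from a genuine one, and the names `HyphenJoinSketch` and `joinHyphenated` are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration only; not the Lucene implementation.
final class HyphenJoinSketch {

    // A token ending in '-' is held back (without the hyphen) and glued
    // onto the token that follows it.
    static List<String> joinHyphenated(List<String> tokens) {
        List<String> out = new ArrayList<>();
        StringBuilder prefix = new StringBuilder();
        for (String token : tokens) {
            if (token.length() > 1 && token.endsWith("-")) {
                prefix.append(token, 0, token.length() - 1);
            } else {
                out.add(prefix + token);
                prefix.setLength(0);
            }
        }
        // A dangling fragment at the end of the stream is emitted with its hyphen.
        if (prefix.length() > 0) {
            out.add(prefix + "-");
        }
        return out;
    }
}
```

With this sketch, the tokens `ecologi-`, `cal`, `research` become `ecological`, `research`.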
Marks terms as keywords via the
A TokenFilter that filters out tokens with the same position and term text as the previous token in the stream.
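The deduplication rule can be illustrated with a small plain-Java sketch. It is written under assumptions rather than taken from the filter: tokens arrive ordered by position, each token carries an absolute position instead of a position increment, and the `Token` class and `dedupe` method are hypothetical stand-ins.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical illustration only; not the Lucene implementation.
final class RemoveDuplicatesSketch {

    // Minimal stand-in for a token: term text plus absolute position.
    static final class Token {
        final String term;
        final int position;

        Token(String term, int position) {
            this.term = term;
            this.position = position;
        }
    }

    // Drop a token if the same term was already emitted at the same position.
    // Assumes the input is ordered by position.
    static List<Token> dedupe(List<Token> tokens) {
        List<Token> out = new ArrayList<>();
        Set<String> seenAtPosition = new HashSet<>();
        boolean first = true;
        int currentPosition = 0;
        for (Token t : tokens) {
            if (first || t.position != currentPosition) {
                seenAtPosition.clear();
                currentPosition = t.position;
                first = false;
            }
            if (seenAtPosition.add(t.term)) {
                out.add(t);
            }
        }
        return out;
    }
}
```

The same term at a different position is kept; only repeats within one position are removed.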
This filter folds Scandinavian characters åÅäæÄÆ->a and öÖøØ->o.
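The character mapping listed above can be sketched as a simple per-character switch. This is an illustration, not the filter's code. One assumption is labeled loudly: the sketch preserves case (upper-case input folds to `A` or `O`), which the description does not state. The names `ScandinavianFoldingSketch` and `fold` are hypothetical.

```java
// Hypothetical illustration only; not the Lucene implementation.
final class ScandinavianFoldingSketch {

    // Fold the listed Scandinavian letters to 'a'/'o'.
    // Assumption: case is preserved, so upper-case letters fold to 'A'/'O'.
    static String fold(String input) {
        StringBuilder sb = new StringBuilder(input.length());
        for (int i = 0; i < input.length(); i++) {
            char c = input.charAt(i);
            switch (c) {
                case '\u00e5': // a with ring
                case '\u00e4': // a with diaeresis
                case '\u00e6': // ae ligature
                    sb.append('a');
                    break;
                case '\u00c5': // A with ring
                case '\u00c4': // A with diaeresis
                case '\u00c6': // AE ligature
                    sb.append('A');
                    break;
                case '\u00f6': // o with diaeresis
                case '\u00f8': // o with stroke
                    sb.append('o');
                    break;
                case '\u00d6': // O with diaeresis
                case '\u00d8': // O with stroke
                    sb.append('O');
                    break;
                default:
                    sb.append(c);
            }
        }
        return sb.toString();
    }
}
```

Folding every variant to a single lower-case letter, if that is what is wanted, only requires changing the two upper-case branches.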
This filter normalizes the use of the interchangeable Scandinavian characters æÆäÄöÖøØ and their folded variants (aa, ao, ae, oe and oo) by transforming them to åÅæÆøØ.
A read-only 4-byte FST backed map that allows fast case-insensitive key value lookups for
Copyright © 2000-2015 Apache Software Foundation. All Rights Reserved.