ASCIIFoldingFilter |
Converts alphabetic, numeric, and symbolic Unicode characters
that are not among the first 128 ASCII characters (the "Basic Latin" Unicode
block) into their ASCII equivalents, where such equivalents exist.
|
CapitalizationFilter |
A filter to apply normal capitalization rules to Tokens.
|
CodepointCountFilter |
Removes words that are too long or too short from the stream, measuring length in Unicode code points.
|
ConcatenateGraphFilter.BytesRefBuilderTermAttribute |
Attribute providing access to the term builder and UTF-16 conversion.
|
ConditionalTokenFilter |
Allows skipping TokenFilters based on the current set of attributes.
|
ConditionalTokenFilterFactory |
|
DelimitedTermFrequencyTokenFilter |
Characters before the delimiter are the "token"; the textual integer after it is the term frequency.
|
HyphenatedWordsFilter |
When plain text is extracted from documents, many words are often hyphenated and broken across
two lines; this filter puts such words back together.
|
KeywordMarkerFilter |
|
LengthFilter |
Removes words that are too long or too short from the stream, measuring length in UTF-16 code units.
|
RemoveDuplicatesTokenFilter |
A TokenFilter that removes tokens that have the same position and term text as the previous token in the stream.
|
ScandinavianFoldingFilter |
This filter folds Scandinavian characters åÅäæÄÆ->a and öÖøØ->o.
|
ScandinavianNormalizationFilter |
This filter normalizes the use of the interchangeable Scandinavian characters æÆäÄöÖøØ
and folded variants (aa, ao, ae, oe and oo) by transforming them to åÅæÆøØ.
|
StemmerOverrideFilter.StemmerOverrideMap |
A read-only, 4-byte FST-backed map that allows fast case-insensitive key-value
lookups for StemmerOverrideFilter.
|
TrimFilter |
Trims leading and trailing whitespace from Tokens in the stream.
|
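The DelimitedTermFrequencyTokenFilter entry above describes a simple encoding: the characters before a delimiter form the token, and the integer after it is the term frequency. The following is a minimal plain-Java sketch of that parsing rule only. It does not use the Lucene API; the class name `TermFrequencySketch`, the `|` delimiter used in `main`, splitting at the last delimiter, and the fallback frequency of 1 when no delimiter is present are all assumptions made for illustration.

```java
// Hypothetical illustration class; not part of Lucene.
final class TermFrequencySketch {
    final String term;
    final int frequency;

    TermFrequencySketch(String term, int frequency) {
        this.term = term;
        this.frequency = frequency;
    }

    // Splits "token<delimiter>integer" into its two parts.
    // Assumption: the LAST delimiter separates token from frequency,
    // and input without a delimiter gets a frequency of 1.
    static TermFrequencySketch parse(String raw, char delimiter) {
        int idx = raw.lastIndexOf(delimiter);
        if (idx < 0) {
            return new TermFrequencySketch(raw, 1);
        }
        // Integer.parseInt throws NumberFormatException if the suffix
        // is not a textual integer.
        int tf = Integer.parseInt(raw.substring(idx + 1));
        return new TermFrequencySketch(raw.substring(0, idx), tf);
    }

    public static void main(String[] args) {
        TermFrequencySketch parsed = parse("hello|5", '|');
        System.out.println(parsed.term + " -> " + parsed.frequency);
    }
}
```

In this sketch a malformed suffix such as `hello|abc` surfaces as a `NumberFormatException` rather than being silently ignored; whether the real filter behaves the same way should be checked against its own documentation.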