org.apache.lucene.analysis.miscellaneous (Lucene 9.12.0 common API)

Miscellaneous TokenStreams.

Interface Summary
Interface	Description
ConcatenateGraphFilter.BytesRefBuilderTermAttribute	Attribute providing access to the term builder and UTF-16 conversion

Class Summary
Class	Description
ASCIIFoldingFilter	Converts alphabetic, numeric, and symbolic Unicode characters that are not among the first 127 ASCII characters (the "Basic Latin" Unicode block) into their ASCII equivalents, where such equivalents exist.
ASCIIFoldingFilterFactory	Factory for `ASCIIFoldingFilter`.
CapitalizationFilter	A filter to apply normal capitalization rules to Tokens.
CapitalizationFilterFactory	Factory for `CapitalizationFilter`.
CodepointCountFilter	Removes words that are too long or too short from the stream.
CodepointCountFilterFactory	Factory for `CodepointCountFilter`.
ConcatenateGraphFilter	Concatenates/Joins every incoming token with a separator into one output token for every path through the token stream (which is a graph).
ConcatenateGraphFilter.BytesRefBuilderTermAttributeImpl	Implementation of `ConcatenateGraphFilter.BytesRefBuilderTermAttribute`
ConcatenateGraphFilterFactory	Factory for `ConcatenateGraphFilter`.
ConcatenatingTokenStream	A TokenStream that takes an array of input TokenStreams as sources, and concatenates them together.
ConditionalTokenFilter	Allows skipping TokenFilters based on the current set of attributes.
ConditionalTokenFilterFactory	Abstract parent class for analysis factories that create `ConditionalTokenFilter` instances
DateRecognizerFilter	Filters all tokens that cannot be parsed to a date, using the provided `DateFormat`.
DateRecognizerFilterFactory	Factory for `DateRecognizerFilter`.
DelimitedTermFrequencyTokenFilter	Characters before the delimiter are the "token", the textual integer after is the term frequency.
DelimitedTermFrequencyTokenFilterFactory	Factory for `DelimitedTermFrequencyTokenFilter`.
DropIfFlaggedFilter	Allows Tokens with a given combination of flags to be dropped.
DropIfFlaggedFilterFactory	Provides a filter that will drop tokens matching a set of flags.
EmptyTokenStream	An always exhausted token stream.
FingerprintFilter	Filter outputs a single token which is a concatenation of the sorted and de-duplicated set of input tokens.
FingerprintFilterFactory	Factory for `FingerprintFilter`.
FixBrokenOffsetsFilter	Deprecated. Fix the token filters that create broken offsets in the first place.
FixBrokenOffsetsFilterFactory	Deprecated.
HyphenatedWordsFilter	When the plain text is extracted from documents, we will often have many words hyphenated and broken into two lines.
HyphenatedWordsFilterFactory	Factory for `HyphenatedWordsFilter`.
KeepWordFilter	A TokenFilter that only keeps tokens with text contained in the required words.
KeepWordFilterFactory	Factory for `KeepWordFilter`.
KeywordMarkerFilter	Marks terms as keywords via the `KeywordAttribute`.
KeywordMarkerFilterFactory	Factory for `KeywordMarkerFilter`.
KeywordRepeatFilter	This TokenFilter emits each incoming token twice, once as a keyword and once as a non-keyword; in other words, once with `KeywordAttribute.setKeyword(boolean)` set to `true` and once set to `false`.
KeywordRepeatFilterFactory	Factory for `KeywordRepeatFilter`.
LengthFilter	Removes words that are too long or too short from the stream.
LengthFilterFactory	Factory for `LengthFilter`.
LimitTokenCountAnalyzer	This Analyzer limits the number of tokens while indexing.
LimitTokenCountFilter	This TokenFilter limits the number of tokens while indexing.
LimitTokenCountFilterFactory	Factory for `LimitTokenCountFilter`.
LimitTokenOffsetFilter	Lets all tokens pass through until it sees one with a start offset greater than a configured limit, which won't pass and ends the stream.
LimitTokenOffsetFilterFactory	Factory for `LimitTokenOffsetFilter`.
LimitTokenPositionFilter	This TokenFilter limits its emitted tokens to those with positions that are not greater than the configured limit.
LimitTokenPositionFilterFactory	Factory for `LimitTokenPositionFilter`.
PatternKeywordMarkerFilter	Marks terms as keywords via the `KeywordAttribute`.
PerFieldAnalyzerWrapper	This analyzer is used to facilitate scenarios where different fields require different analysis techniques.
ProtectedTermFilter	A ConditionalTokenFilter that only applies its wrapped filters to tokens that are not contained in a protected set.
ProtectedTermFilterFactory	Factory for a `ProtectedTermFilter`
RemoveDuplicatesTokenFilter	A TokenFilter that filters out tokens with the same position and term text as the previous token in the stream.
RemoveDuplicatesTokenFilterFactory	Factory for `RemoveDuplicatesTokenFilter`.
ScandinavianFoldingFilter	This filter folds Scandinavian characters åÅäæÄÆ->a and öÖøØ->o.
ScandinavianFoldingFilterFactory	Factory for `ScandinavianFoldingFilter`.
ScandinavianNormalizationFilter	This filter normalizes the use of the interchangeable Scandinavian characters æÆäÄöÖøØ and folded variants (aa, ao, ae, oe and oo) by transforming them to åÅæÆøØ.
ScandinavianNormalizationFilterFactory	Factory for `ScandinavianNormalizationFilter`.
ScandinavianNormalizer	This Normalizer does the heavy lifting for a set of Scandinavian normalization filters, normalizing use of the interchangeable Scandinavian characters æÆäÄöÖøØ and folded variants (aa, ao, ae, oe and oo) by transforming them to åÅæÆøØ.
SetKeywordMarkerFilter	Marks terms as keywords via the `KeywordAttribute`.
StemmerOverrideFilter	Provides the ability to override any `KeywordAttribute` aware stemmer with custom dictionary-based stemming.
StemmerOverrideFilter.Builder	This builder builds an `FST` for the `StemmerOverrideFilter`
StemmerOverrideFilter.StemmerOverrideMap	A read-only, 4-byte, FST-backed map that allows fast case-insensitive key-value lookups for `StemmerOverrideFilter`.
StemmerOverrideFilterFactory	Factory for `StemmerOverrideFilter`.
TrimFilter	Trims leading and trailing whitespace from Tokens in the stream.
TrimFilterFactory	Factory for `TrimFilter`.
TruncateTokenFilter	A token filter for truncating the terms into a specific length.
TruncateTokenFilterFactory	Factory for `TruncateTokenFilter`.
TypeAsSynonymFilter	Adds the `TypeAttribute.type()` as a synonym.
TypeAsSynonymFilterFactory	Factory for `TypeAsSynonymFilter`.
WordDelimiterFilter	Deprecated. Use `WordDelimiterGraphFilter` instead: it produces a correct token graph.
WordDelimiterFilterFactory	Deprecated. Use `WordDelimiterGraphFilterFactory` instead: it produces a correct token graph.
WordDelimiterGraphFilter	Splits words into subwords and performs optional transformations on subword groups, producing a correct token graph.
WordDelimiterGraphFilterFactory	Factory for `WordDelimiterGraphFilter`.
WordDelimiterIterator	A BreakIterator-like API for iterating over subwords in text, according to WordDelimiterGraphFilter rules.
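
Two of the one-line summaries above describe small string transformations that are easier to see in code: `FingerprintFilter` (sort, de-duplicate, concatenate) and `DelimitedTermFrequencyTokenFilter` (split a token into text and an integer frequency). The following is a minimal plain-Java sketch of those two behaviors as described in the table. It does not use or reproduce the Lucene classes. The class name `MiscFilterSketch`, its method names, the space separator, the `|` delimiter, the choice to split at the first delimiter, and the fallback frequency of 1 are all assumptions made for illustration.

```java
import java.util.List;
import java.util.TreeSet;

// Hypothetical illustration class; not part of Lucene.
class MiscFilterSketch {

    // Sketch of the FingerprintFilter summary: sort the input tokens,
    // drop duplicates, and join what remains into a single string.
    static String fingerprint(List<String> tokens, char separator) {
        // TreeSet orders by natural String order and removes duplicates.
        return String.join(String.valueOf(separator), new TreeSet<>(tokens));
    }

    // Sketch of the DelimitedTermFrequencyTokenFilter summary:
    // the characters before the delimiter are the token text.
    static String termOf(String token, char delimiter) {
        int i = token.indexOf(delimiter);
        return i < 0 ? token : token.substring(0, i);
    }

    // The textual integer after the delimiter is the term frequency.
    // Assumption: a token without a delimiter counts as frequency 1.
    static int frequencyOf(String token, char delimiter) {
        int i = token.indexOf(delimiter);
        return i < 0 ? 1 : Integer.parseInt(token.substring(i + 1));
    }

    public static void main(String[] args) {
        System.out.println(fingerprint(List.of("the", "quick", "the", "brown"), ' '));
        System.out.println(termOf("lucene|3", '|') + " -> " + frequencyOf("lucene|3", '|'));
    }
}
```

In a real analysis chain these transformations operate on token attributes inside a `TokenStream` rather than on plain strings; the sketch only shows the input-to-output mapping that the summaries describe.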

Enum Summary
Enum	Description
ScandinavianNormalizer.Foldings	List of possible foldings that can be used when configuring the filter
