Package org.apache.lucene.analysis.miscellaneous
Miscellaneous TokenStreams.
-
Interface Summary
Interface: Description
ConcatenateGraphFilter.BytesRefBuilderTermAttribute: Attribute providing access to the term builder and UTF-16 conversion.
-
Class Summary
Class: Description
ASCIIFoldingFilter: This class converts alphabetic, numeric, and symbolic Unicode characters which are not in the first 127 ASCII characters (the "Basic Latin" Unicode block) into their ASCII equivalents, if one exists.
ASCIIFoldingFilterFactory: Factory for ASCIIFoldingFilter.
CapitalizationFilter: A filter to apply normal capitalization rules to Tokens.
CapitalizationFilterFactory: Factory for CapitalizationFilter.
CodepointCountFilter: Removes words that are too long or too short from the stream.
CodepointCountFilterFactory: Factory for CodepointCountFilter.
ConcatenateGraphFilter: Concatenates/Joins every incoming token with a separator into one output token for every path through the token stream (which is a graph).
ConcatenateGraphFilter.BytesRefBuilderTermAttributeImpl: Implementation of ConcatenateGraphFilter.BytesRefBuilderTermAttribute.
ConcatenateGraphFilterFactory: Factory for ConcatenateGraphFilter.
ConcatenatingTokenStream: A TokenStream that takes an array of input TokenStreams as sources, and concatenates them together.
ConditionalTokenFilter: Allows skipping TokenFilters based on the current set of attributes.
ConditionalTokenFilterFactory: Abstract parent class for analysis factories that create ConditionalTokenFilter instances.
DateRecognizerFilter: Filters all tokens that cannot be parsed to a date, using the provided DateFormat.
DateRecognizerFilterFactory: Factory for DateRecognizerFilter.
DelimitedTermFrequencyTokenFilter: Characters before the delimiter are the "token"; the textual integer after it is the term frequency.
DelimitedTermFrequencyTokenFilterFactory: Factory for DelimitedTermFrequencyTokenFilter.
DropIfFlaggedFilter: Allows Tokens with a given combination of flags to be dropped.
DropIfFlaggedFilterFactory: Provides a filter that will drop tokens matching a set of flags.
EmptyTokenStream: An always exhausted token stream.
FingerprintFilter: Filter outputs a single token which is a concatenation of the sorted and de-duplicated set of input tokens.
FingerprintFilterFactory: Factory for FingerprintFilter.
FixBrokenOffsetsFilter: Deprecated. Fix the token filters that create broken offsets in the first place.
FixBrokenOffsetsFilterFactory: Deprecated.
HyphenatedWordsFilter: When the plain text is extracted from documents, we will often have many words hyphenated and broken into two lines.
HyphenatedWordsFilterFactory: Factory for HyphenatedWordsFilter.
KeepWordFilter: A TokenFilter that only keeps tokens with text contained in the required words.
KeepWordFilterFactory: Factory for KeepWordFilter.
KeywordMarkerFilter: Marks terms as keywords via the KeywordAttribute.
KeywordMarkerFilterFactory: Factory for KeywordMarkerFilter.
KeywordRepeatFilter: This TokenFilter emits each incoming token twice, once as keyword and once as non-keyword; in other words, once with KeywordAttribute.setKeyword(boolean) set to true and once set to false.
KeywordRepeatFilterFactory: Factory for KeywordRepeatFilter.
LengthFilter: Removes words that are too long or too short from the stream.
LengthFilterFactory: Factory for LengthFilter.
LimitTokenCountAnalyzer: This Analyzer limits the number of tokens while indexing.
LimitTokenCountFilter: This TokenFilter limits the number of tokens while indexing.
LimitTokenCountFilterFactory: Factory for LimitTokenCountFilter.
LimitTokenOffsetFilter: Lets all tokens pass through until it sees one with a start offset greater than a configured limit, which won't pass and ends the stream.
LimitTokenOffsetFilterFactory: Factory for LimitTokenOffsetFilter.
LimitTokenPositionFilter: This TokenFilter limits its emitted tokens to those with positions that are not greater than the configured limit.
LimitTokenPositionFilterFactory: Factory for LimitTokenPositionFilter.
PatternKeywordMarkerFilter: Marks terms as keywords via the KeywordAttribute.
PerFieldAnalyzerWrapper: This analyzer is used to facilitate scenarios where different fields require different analysis techniques.
ProtectedTermFilter: A ConditionalTokenFilter that only applies its wrapped filters to tokens that are not contained in a protected set.
ProtectedTermFilterFactory: Factory for a ProtectedTermFilter.
RemoveDuplicatesTokenFilter: A TokenFilter which filters out Tokens at the same position and Term text as the previous token in the stream.
RemoveDuplicatesTokenFilterFactory: Factory for RemoveDuplicatesTokenFilter.
ScandinavianFoldingFilter: This filter folds Scandinavian characters åÅäæÄÆ->a and öÖøØ->o.
ScandinavianFoldingFilterFactory: Factory for ScandinavianFoldingFilter.
ScandinavianNormalizationFilter: This filter normalizes the use of the interchangeable Scandinavian characters æÆäÄöÖøØ and folded variants (aa, ao, ae, oe and oo) by transforming them to åÅæÆøØ.
ScandinavianNormalizationFilterFactory: Factory for ScandinavianNormalizationFilter.
ScandinavianNormalizer: This Normalizer does the heavy lifting for a set of Scandinavian normalization filters, normalizing the use of the interchangeable Scandinavian characters æÆäÄöÖøØ and folded variants (aa, ao, ae, oe and oo) by transforming them to åÅæÆøØ.
SetKeywordMarkerFilter: Marks terms as keywords via the KeywordAttribute.
StemmerOverrideFilter: Provides the ability to override any KeywordAttribute aware stemmer with custom dictionary-based stemming.
StemmerOverrideFilter.Builder: This builder builds an FST for the StemmerOverrideFilter.
StemmerOverrideFilter.StemmerOverrideMap: A read-only 4-byte FST backed map that allows fast case-insensitive key value lookups for StemmerOverrideFilter.
StemmerOverrideFilterFactory: Factory for StemmerOverrideFilter.
TrimFilter: Trims leading and trailing whitespace from Tokens in the stream.
TrimFilterFactory: Factory for TrimFilter.
TruncateTokenFilter: A token filter for truncating the terms into a specific length.
TruncateTokenFilterFactory: Factory for TruncateTokenFilter.
TypeAsSynonymFilter: Adds the TypeAttribute.type() as a synonym, i.e.
TypeAsSynonymFilterFactory: Factory for TypeAsSynonymFilter.
WordDelimiterFilter: Deprecated. Use WordDelimiterGraphFilter instead: it produces a correct token graph so that e.g.
WordDelimiterFilterFactory: Deprecated. Use WordDelimiterGraphFilterFactory instead: it produces a correct token graph so that e.g.
WordDelimiterGraphFilter: Splits words into subwords and performs optional transformations on subword groups, producing a correct token graph so that e.g.
WordDelimiterGraphFilterFactory: Factory for WordDelimiterGraphFilter.
WordDelimiterIterator: A BreakIterator-like API for iterating over subwords in text, according to WordDelimiterGraphFilter rules.
-
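The FingerprintFilter entry describes its output as a single token built from the sorted, de-duplicated set of input tokens. The following is a minimal plain-Java sketch of that idea only; it does not use or reproduce the Lucene implementation. The class name FingerprintSketch, the method fingerprint, and the separator parameter are hypothetical names chosen for illustration.

```java
import java.util.List;
import java.util.TreeSet;

// Illustrative sketch only: NOT the Lucene FingerprintFilter implementation.
// All names here are hypothetical.
final class FingerprintSketch {

    // Joins the distinct tokens in sorted order, using the given separator.
    // The separator is an assumption of this sketch.
    static String fingerprint(List<String> tokens, String separator) {
        // A TreeSet drops duplicates and keeps elements in natural String order.
        return String.join(separator, new TreeSet<>(tokens));
    }

    public static void main(String[] args) {
        // "the" appears twice in the input but only once in the output.
        System.out.println(fingerprint(List.of("the", "quick", "the", "fox"), " "));
        // prints: fox quick the
    }
}
```

Because the result ignores token order and repetition, two inputs containing the same set of words produce the same fingerprint.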
Enum Summary
Enum: Description
ScandinavianNormalizer.Foldings: List of possible foldings that can be used when configuring the filter.
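The single-character folding listed for ScandinavianFoldingFilter in the class summary (åÅäæÄÆ->a and öÖøØ->o) can be sketched in plain Java as below. This is an illustration under stated assumptions, not the Lucene code: the class ScandinavianFoldSketch and the method fold are hypothetical, the sketch assumes upper-case input letters fold to upper-case output, and it does not handle the two-letter variants (aa, ao, ae, oe, oo) that the foldings above refer to.

```java
// Illustrative sketch only: NOT the Lucene ScandinavianFoldingFilter.
// Covers just the single-character mappings named in the class summary.
final class ScandinavianFoldSketch {

    // Folds a-ring, a-umlaut and ae to 'a', and o-umlaut and o-slash to 'o'.
    // Assumption of this sketch: upper-case letters fold to 'A' and 'O'.
    static String fold(String term) {
        StringBuilder out = new StringBuilder(term.length());
        for (int i = 0; i < term.length(); i++) {
            char c = term.charAt(i);
            switch (c) {
                case '\u00E5': // a with ring
                case '\u00E4': // a with umlaut
                case '\u00E6': // ae ligature
                    out.append('a');
                    break;
                case '\u00C5': // upper-case variants of the above
                case '\u00C4':
                case '\u00C6':
                    out.append('A');
                    break;
                case '\u00F6': // o with umlaut
                case '\u00F8': // o with slash
                    out.append('o');
                    break;
                case '\u00D6': // upper-case variants of the above
                case '\u00D8':
                    out.append('O');
                    break;
                default:
                    out.append(c); // every other character passes through
            }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        // Two spellings of the same word fold to the same ASCII form.
        System.out.println(fold("bl\u00E5b\u00E6r")); // prints: blabar
        System.out.println(fold("bl\u00E5b\u00E4r")); // prints: blabar
    }
}
```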