AbstractEncoder |
Base class for payload encoders.
AbstractWordsFileFilterFactory |
Abstract parent class for analysis factories that accept a stopwords file as input.
AffixedWord |
An object representing the analysis result of a simple (non-compound) word
AffixedWord.Affix |
An object representing a prefix or a suffix applied to a word stem
Among |
Internal class used by Snowball stemmers
ApostropheFilter |
Strips all characters after an apostrophe (including the apostrophe itself).
ApostropheFilterFactory |
ArabicAnalyzer |
ArabicNormalizationFilter |
ArabicNormalizationFilterFactory |
ArabicNormalizer |
Normalizer for Arabic.
ArabicStemFilter |
ArabicStemFilterFactory |
ArabicStemmer |
Stemmer for Arabic.
ArabicStemmer |
This class implements the stemming algorithm defined by a snowball script.
ArmenianAnalyzer |
ArmenianStemmer |
This class implements the stemming algorithm defined by a snowball script.
ASCIIFoldingFilter |
This class converts alphabetic, numeric, and symbolic Unicode characters which are not in the
first 127 ASCII characters (the "Basic Latin" Unicode block) into their ASCII equivalents, if one
ASCIIFoldingFilterFactory |
BaseCharFilter |
BasqueAnalyzer |
BasqueStemmer |
This class implements the stemming algorithm defined by a snowball script.
BengaliAnalyzer |
Analyzer for Bengali.
BengaliNormalizationFilter |
BengaliNormalizationFilterFactory |
BengaliNormalizer |
Normalizer for Bengali.
BengaliStemFilter |
BengaliStemFilterFactory |
BengaliStemmer |
Stemmer for Bengali.
BrazilianAnalyzer |
Analyzer for Brazilian Portuguese language.
BrazilianStemFilter |
BrazilianStemFilterFactory |
BrazilianStemmer |
A stemmer for Brazilian Portuguese words.
BulgarianAnalyzer |
BulgarianStemFilter |
BulgarianStemFilterFactory |
BulgarianStemmer |
Light Stemmer for Bulgarian.
ByteVector |
This class implements a simple byte vector with access to the underlying array.
CapitalizationFilter |
A filter to apply normal capitalization rules to Tokens.
CapitalizationFilterFactory |
CatalanAnalyzer |
CatalanStemmer |
This class implements the stemming algorithm defined by a snowball script.
CharArrayIterator |
CharTokenizer |
An abstract base class for simple, character-oriented tokenizers.
CharVector |
This class implements a simple char vector with access to the underlying array.
CJKAnalyzer |
CJKBigramFilter |
Forms bigrams of CJK terms that are generated from StandardTokenizer or ICUTokenizer.
CJKBigramFilterFactory |
CJKWidthCharFilter |
A CharFilter that normalizes CJK width differences:
Folds fullwidth ASCII variants into the equivalent basic latin
Folds halfwidth Katakana variants into the equivalent kana
CJKWidthCharFilterFactory |
CJKWidthFilter |
A TokenFilter that normalizes CJK width differences:
Folds fullwidth ASCII variants into the equivalent basic latin
Folds halfwidth Katakana variants into the equivalent kana
CJKWidthFilterFactory |
ClassicAnalyzer |
ClassicFilter |
ClassicFilterFactory |
ClassicTokenizer |
A grammar-based tokenizer constructed with JFlex
ClassicTokenizerFactory |
CodepointCountFilter |
Removes words that are too long or too short from the stream.
CodepointCountFilterFactory |
CollatedTermAttributeImpl |
Extension of CharTermAttributeImpl that encodes the term text as a binary Unicode
collation key instead of as UTF-8 bytes.
CollationAttributeFactory |
Converts each token into its CollationKey, and then encodes the bytes as an
index term.
CollationDocValuesField |
CollationKeyAnalyzer |
CommonGramsFilter |
Construct bigrams for frequently occurring terms while indexing.
CommonGramsFilterFactory |
CommonGramsQueryFilter |
Wrap a CommonGramsFilter optimizing phrase queries by only returning single words when they are
not a member of a bigram.
CommonGramsQueryFilterFactory |
CompoundWordTokenFilterBase |
Base class for decomposition token filters.
ConcatenateGraphFilter |
Concatenates/Joins every incoming token with a separator into one output token for every path
through the token stream (which is a graph).
ConcatenateGraphFilter.BytesRefBuilderTermAttribute |
Attribute providing access to the term builder and UTF-16 conversion
ConcatenateGraphFilter.BytesRefBuilderTermAttributeImpl |
ConcatenateGraphFilterFactory |
ConcatenatingTokenStream |
A TokenStream that takes an array of input TokenStreams as sources, and concatenates them
ConditionalTokenFilter |
Allows skipping TokenFilters based on the current set of attributes.
ConditionalTokenFilterFactory |
CSVUtil |
Utility class for parsing CSV text
CustomAnalyzer |
A general-purpose Analyzer that can be created with a builder-style API.
CustomAnalyzer.Builder |
CustomAnalyzer.ConditionBuilder |
CzechAnalyzer |
CzechStemFilter |
CzechStemFilterFactory |
CzechStemmer |
Light Stemmer for Czech.
DanishAnalyzer |
DanishStemmer |
This class implements the stemming algorithm defined by a snowball script.
DateRecognizerFilter |
Filters all tokens that cannot be parsed to a date, using the provided DateFormat.
DateRecognizerFilterFactory |
DecimalDigitFilter |
Folds all Unicode digits in [:General_Category=Decimal_Number:] to Basic Latin digits
(0-9).
DecimalDigitFilterFactory |
DelimitedBoostTokenFilter |
Characters before the delimiter are the "token", those after are the boost.
DelimitedBoostTokenFilterFactory |
DelimitedPayloadTokenFilter |
Characters before the delimiter are the "token", those after are the payload.
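The delimiter convention described above can be illustrated with a small plain-Java sketch. It does not use the Lucene classes; the class name `DelimitedTokenSketch`, the method `split`, and the choice to split at the first delimiter are assumptions made for illustration only.

```java
import java.util.AbstractMap.SimpleEntry;
import java.util.Map;

// Hypothetical illustration, not Lucene's implementation.
public class DelimitedTokenSketch {

    // Splits raw input at the first occurrence of the delimiter.
    // The key is the "token"; the value is the text after the delimiter
    // (the payload), or null when the delimiter is absent.
    public static Map.Entry<String, String> split(String raw, char delimiter) {
        int idx = raw.indexOf(delimiter);
        if (idx < 0) {
            return new SimpleEntry<>(raw, null);
        }
        return new SimpleEntry<>(raw.substring(0, idx), raw.substring(idx + 1));
    }

    public static void main(String[] args) {
        Map.Entry<String, String> entry = split("fox|2.5", '|');
        System.out.println(entry.getKey() + " -> " + entry.getValue());
    }
}
```

The same splitting idea applies to the boost and term-frequency variants listed nearby; only the interpretation of the text after the delimiter differs.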
DelimitedPayloadTokenFilterFactory |
DelimitedTermFrequencyTokenFilter |
Characters before the delimiter are the "token", the textual integer after is the term frequency.
DelimitedTermFrequencyTokenFilterFactory |
DictEntries |
An object representing homonym dictionary entries.
DictEntry |
An object representing a *.dic file entry with its word, flags and morphological data.
Dictionary |
In-memory structure for the dictionary (.dic) and affix (.aff) data of a hunspell dictionary.
DictionaryCompoundWordTokenFilter |
A TokenFilter that decomposes compound words found in many
Germanic languages.
DictionaryCompoundWordTokenFilterFactory |
Dl4jModelReader |
Dl4jModelReader reads the file generated by the Deeplearning4j library and provides a
Word2VecModel with normalized vectors
DropIfFlaggedFilter |
Allows Tokens with a given combination of flags to be dropped.
DropIfFlaggedFilterFactory |
Provides a filter that will drop tokens matching a set of flags.
DutchAnalyzer |
DutchStemmer |
This class implements the stemming algorithm defined by a snowball script.
EdgeNGramFilterFactory |
EdgeNGramTokenFilter |
Tokenizes the given token into n-grams of given size(s).
EdgeNGramTokenizer |
Tokenizes the input from an edge into n-grams of given size(s).
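As a rough illustration of what "n-grams from an edge" means, the following plain-Java sketch produces front-edge n-grams of a single term. It is not the Lucene implementation; `EdgeNGramSketch`, `edgeNGrams`, and the `minGram`/`maxGram` parameter names are hypothetical, and the sketch works on `char` units rather than code points.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration, not Lucene's implementation.
public class EdgeNGramSketch {

    // Returns the prefixes of term whose length lies between minGram and
    // maxGram (inclusive), capped at the length of the term itself.
    public static List<String> edgeNGrams(String term, int minGram, int maxGram) {
        List<String> grams = new ArrayList<>();
        int upper = Math.min(maxGram, term.length());
        for (int n = minGram; n <= upper; n++) {
            grams.add(term.substring(0, n));
        }
        return grams;
    }

    public static void main(String[] args) {
        // Prints [l, lu, luc]
        System.out.println(edgeNGrams("lucene", 1, 3));
    }
}
```

Prefix grams of this kind are commonly used for search-as-you-type matching, where a partially typed word should match the start of an indexed term.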
EdgeNGramTokenizerFactory |
ElisionFilter |
ElisionFilterFactory |
EmptyTokenStream |
An always exhausted token stream.
EnglishAnalyzer |
EnglishMinimalStemFilter |
EnglishMinimalStemFilterFactory |
EnglishMinimalStemmer |
Minimal plural stemmer for English.
EnglishPossessiveFilter |
TokenFilter that removes possessives (trailing 's) from words.
EnglishPossessiveFilterFactory |
EnglishStemmer |
This class implements the stemming algorithm defined by a snowball script.
EntrySuggestion |
EstonianAnalyzer |
EstonianStemmer |
This class implements the stemming algorithm defined by a snowball script.
FilesystemResourceLoader |
Simple ResourceLoader that opens resource files from the local file system, optionally
resolving against a base directory.
FingerprintFilter |
Filter outputs a single token which is a concatenation of the sorted and de-duplicated set of
input tokens.
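The fingerprint idea (sort, de-duplicate, concatenate) can be sketched in a few lines of plain Java. This is only an illustration of the technique, not the Lucene filter; the class name `FingerprintSketch`, the method `fingerprint`, and the `separator` parameter are assumptions.

```java
import java.util.List;
import java.util.TreeSet;

// Hypothetical illustration, not Lucene's implementation.
public class FingerprintSketch {

    // A TreeSet both sorts the tokens (natural String order) and removes
    // duplicates; the result is joined into a single output string.
    public static String fingerprint(List<String> tokens, String separator) {
        return String.join(separator, new TreeSet<>(tokens));
    }

    public static void main(String[] args) {
        // Prints: brown quick the
        System.out.println(fingerprint(List.of("the", "quick", "the", "brown"), " "));
    }
}
```

Two inputs that contain the same set of words in any order produce the same fingerprint, which is what makes the output useful for grouping near-duplicate values.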
FingerprintFilterFactory |
FinnishAnalyzer |
FinnishLightStemFilter |
FinnishLightStemFilterFactory |
FinnishLightStemmer |
Light Stemmer for Finnish.
FinnishStemmer |
This class implements the stemming algorithm defined by a snowball script.
FixBrokenOffsetsFilter |
FixBrokenOffsetsFilterFactory |
Deprecated.
FixedShingleFilter |
A FixedShingleFilter constructs shingles (token n-grams) from a token stream.
FixedShingleFilterFactory |
FlattenGraphFilter |
Converts an incoming graph token stream, such as one from SynonymGraphFilter, into a flat
form so that all nodes form a single linear chain with no side paths.
FlattenGraphFilterFactory |
FloatEncoder |
Encode a character array Float as a BytesRef.
FragmentChecker |
An oracle for quickly checking that a specific part of a word can never be a valid word.
FrenchAnalyzer |
FrenchLightStemFilter |
FrenchLightStemFilterFactory |
FrenchLightStemmer |
Light Stemmer for French.
FrenchMinimalStemFilter |
FrenchMinimalStemFilterFactory |
FrenchMinimalStemmer |
Light Stemmer for French.
FrenchStemmer |
This class implements the stemming algorithm defined by a snowball script.
GalicianAnalyzer |
GalicianMinimalStemFilter |
GalicianMinimalStemFilterFactory |
GalicianMinimalStemmer |
Minimal Stemmer for Galician
GalicianStemFilter |
GalicianStemFilterFactory |
GalicianStemmer |
Galician stemmer implementing "Regras do lematizador para o galego".
German2Stemmer |
This class implements the stemming algorithm defined by a snowball script.
GermanAnalyzer |
GermanLightStemFilter |
GermanLightStemFilterFactory |
GermanLightStemmer |
Light Stemmer for German.
GermanMinimalStemFilter |
GermanMinimalStemFilterFactory |
GermanMinimalStemmer |
Minimal Stemmer for German.
GermanNormalizationFilter |
GermanNormalizationFilterFactory |
GermanStemFilter |
GermanStemFilterFactory |
GermanStemmer |
A stemmer for German words.
GermanStemmer |
This class implements the stemming algorithm defined by a snowball script.
GreekAnalyzer |
GreekLowerCaseFilter |
Normalizes token text to lower case, removes some Greek diacritics, and standardizes final sigma
to sigma.
GreekLowerCaseFilterFactory |
GreekStemFilter |
GreekStemFilterFactory |
GreekStemmer |
A stemmer for Greek words, according to: Development of a Stemmer for the Greek Language.
Georgios Ntais
GreekStemmer |
This class implements the stemming algorithm defined by a snowball script.
HindiAnalyzer |
Analyzer for Hindi.
HindiNormalizationFilter |
HindiNormalizationFilterFactory |
HindiNormalizer |
Normalizer for Hindi.
HindiStemFilter |
HindiStemFilterFactory |
HindiStemmer |
Light Stemmer for Hindi.
HindiStemmer |
This class implements the stemming algorithm defined by a snowball script.
HTMLStripCharFilter |
A CharFilter that wraps another Reader and attempts to strip out HTML constructs.
HTMLStripCharFilterFactory |
HungarianAnalyzer |
HungarianLightStemFilter |
HungarianLightStemFilterFactory |
HungarianLightStemmer |
Light Stemmer for Hungarian.
HungarianStemmer |
This class implements the stemming algorithm defined by a snowball script.
Hunspell |
A spell checker based on Hunspell dictionaries.
HunspellStemFilter |
TokenFilter that uses hunspell affix rules and words to stem tokens.
HunspellStemFilterFactory |
Hyphen |
This class represents a hyphen.
HyphenatedWordsFilter |
When the plain text is extracted from documents, we will often have many words hyphenated and
broken into two lines.
HyphenatedWordsFilterFactory |
Hyphenation |
This class represents a hyphenated word.
HyphenationCompoundWordTokenFilter |
A TokenFilter that decomposes compound words found in many
Germanic languages.
HyphenationCompoundWordTokenFilterFactory |
HyphenationTree |
This tree structure stores the hyphenation patterns in an efficient way for fast lookup.
IdentityEncoder |
Does nothing other than convert the char array to a byte array using the specified encoding.
IndicNormalizationFilter |
IndicNormalizationFilterFactory |
IndicNormalizer |
Normalizes the Unicode representation of text in Indian languages.
IndonesianAnalyzer |
Analyzer for Indonesian (Bahasa)
IndonesianStemFilter |
IndonesianStemFilterFactory |
IndonesianStemmer |
Stemmer for Indonesian.
IndonesianStemmer |
This class implements the stemming algorithm defined by a snowball script.
IntegerEncoder |
Encode a character array Integer as a BytesRef.
IrishAnalyzer |
IrishLowerCaseFilter |
Normalises token text to lower case, handling t-prothesis and n-eclipsis (i.e., that 'nAthair'
should become 'n-athair')
IrishLowerCaseFilterFactory |
IrishStemmer |
This class implements the stemming algorithm defined by a snowball script.
ItalianAnalyzer |
ItalianLightStemFilter |
ItalianLightStemFilterFactory |
ItalianLightStemmer |
Light Stemmer for Italian.
ItalianStemmer |
This class implements the stemming algorithm defined by a snowball script.
KeepWordFilter |
A TokenFilter that only keeps tokens with text contained in the required words.
KeepWordFilterFactory |
KeywordAnalyzer |
"Tokenizes" the entire stream as a single token.
KeywordMarkerFilter |
KeywordMarkerFilterFactory |
KeywordRepeatFilter |
This TokenFilter emits each incoming token twice, once as keyword and once as non-keyword; in other
words, once with KeywordAttribute.setKeyword(boolean) set to true and once
set to false.
KeywordRepeatFilterFactory |
KeywordTokenizer |
Emits the entire input as a single token.
KeywordTokenizerFactory |
KpStemmer |
This class implements the stemming algorithm defined by a snowball script.
KStemFilter |
A high-performance kstem filter for English.
KStemFilterFactory |
KStemmer |
This class implements the Kstem algorithm
LatvianAnalyzer |
LatvianStemFilter |
LatvianStemFilterFactory |
LatvianStemmer |
Light stemmer for Latvian.
LengthFilter |
Removes words that are too long or too short from the stream.
LengthFilterFactory |
LetterTokenizer |
A LetterTokenizer is a tokenizer that divides text at non-letters.
LetterTokenizerFactory |
LimitTokenCountAnalyzer |
This Analyzer limits the number of tokens while indexing.
LimitTokenCountFilter |
This TokenFilter limits the number of tokens while indexing.
LimitTokenCountFilterFactory |
LimitTokenOffsetFilter |
Lets tokens pass through as long as their start offset is <= a configured limit; the first
token beyond the limit does not pass and ends the stream.
LimitTokenOffsetFilterFactory |
LimitTokenPositionFilter |
This TokenFilter limits its emitted tokens to those with positions that are not greater than the
configured limit.
LimitTokenPositionFilterFactory |
LithuanianAnalyzer |
LithuanianStemmer |
This class implements the stemming algorithm defined by a snowball script.
LovinsStemmer |
This class implements the stemming algorithm defined by a snowball script.
LowerCaseFilter |
Normalizes token text to lower case.
LowerCaseFilterFactory |
MappingCharFilter |
Simplistic CharFilter that applies the mappings contained in a NormalizeCharMap
to the character stream, and corrects the resulting changes to the offsets.
MappingCharFilterFactory |
MinHashFilter |
Generate min hash tokens from an incoming stream of tokens.
MinHashFilterFactory |
NepaliAnalyzer |
Analyzer for Nepali.
NepaliStemmer |
This class implements the stemming algorithm defined by a snowball script.
NGramFilterFactory |
NGramFragmentChecker |
A FragmentChecker based on all character n-grams possible in a certain language, keeping
them in a relatively memory-efficient, but probabilistic data structure.
NGramFragmentChecker.NGramConsumer |
A callback for n-gram ranges in words
NGramTokenFilter |
Tokenizes the input into n-grams of the given size(s).
NGramTokenizer |
Tokenizes the input into n-grams of the given size(s).
NGramTokenizerFactory |
NormalizeCharMap |
NormalizeCharMap.Builder |
Builds a NormalizeCharMap.
NorwegianAnalyzer |
NorwegianLightStemFilter |
NorwegianLightStemFilterFactory |
NorwegianLightStemmer |
Light Stemmer for Norwegian.
NorwegianMinimalStemFilter |
NorwegianMinimalStemFilterFactory |
NorwegianMinimalStemmer |
Minimal Stemmer for Norwegian Bokmål (no-nb) and Nynorsk (no-nn)
NorwegianNormalizationFilter |
This filter normalizes the use of the interchangeable Scandinavian characters æÆäÄöÖøØ and folded
variants (ae, oe, aa) by transforming them to åÅæÆøØ.
NorwegianNormalizationFilterFactory |
NorwegianStemmer |
This class implements the stemming algorithm defined by a snowball script.
NumericPayloadTokenFilter |
NumericPayloadTokenFilterFactory |
OpenStringBuilder |
A StringBuilder that allows one to access the array.
PathHierarchyTokenizer |
Tokenizer for path-like hierarchies.
PathHierarchyTokenizerFactory |
PatternCaptureGroupFilterFactory |
PatternCaptureGroupTokenFilter |
CaptureGroup uses Java regexes to emit multiple tokens - one for each capture group in one or
more patterns.
PatternConsumer |
This interface is used to connect the XML pattern file parser to the hyphenation tree.
PatternKeywordMarkerFilter |
PatternParser |
A SAX document handler to read and parse hyphenation patterns from an XML file.
PatternReplaceCharFilter |
CharFilter that uses a regular expression for the target of replace string.
PatternReplaceCharFilterFactory |
PatternReplaceFilter |
A TokenFilter which applies a Pattern to each token in the stream, replacing match occurrences
with the specified replacement string.
PatternReplaceFilterFactory |
PatternTokenizer |
This tokenizer uses regex pattern matching to construct distinct tokens for the input stream.
PatternTokenizerFactory |
PatternTypingFilter |
Set a type attribute to a parameterized value when tokens are matched by any of several regex
PatternTypingFilter.PatternTypingRule |
Value holding class for pattern typing rules.
PatternTypingFilterFactory |
Provides a filter that will analyze tokens with the analyzer from an arbitrary field type.
PayloadEncoder |
Mainly for use with the DelimitedPayloadTokenFilter, converts char buffers to BytesRef.
PayloadHelper |
Utility methods for encoding payloads.
PerFieldAnalyzerWrapper |
This analyzer is used to facilitate scenarios where different fields require different analysis
PersianAnalyzer |
PersianCharFilter |
CharFilter that replaces instances of Zero-width non-joiner with an ordinary space.
PersianCharFilterFactory |
PersianNormalizationFilter |
PersianNormalizationFilterFactory |
PersianNormalizer |
Normalizer for Persian.
PersianStemFilter |
PersianStemFilterFactory |
PersianStemmer |
Stemmer for Persian.
PorterStemFilter |
Transforms the token stream as per the Porter stemming algorithm.
PorterStemFilterFactory |
PorterStemmer |
This class implements the stemming algorithm defined by a snowball script.
PortugueseAnalyzer |
PortugueseLightStemFilter |
PortugueseLightStemFilterFactory |
PortugueseLightStemmer |
Light Stemmer for Portuguese
PortugueseMinimalStemFilter |
PortugueseMinimalStemFilterFactory |
PortugueseMinimalStemmer |
Minimal Stemmer for Portuguese
PortugueseStemFilter |
PortugueseStemFilterFactory |
PortugueseStemmer |
Portuguese stemmer implementing the RSLP (Removedor de Sufixos da Lingua Portuguesa) algorithm.
PortugueseStemmer |
This class implements the stemming algorithm defined by a snowball script.
ProtectedTermFilter |
A ConditionalTokenFilter that only applies its wrapped filters to tokens that are not contained
in a protected set.
ProtectedTermFilterFactory |
QueryAutoStopWordAnalyzer |
An Analyzer used primarily at query time to wrap another analyzer and provide a layer of
protection which prevents very common words from being passed into queries.
RemoveDuplicatesTokenFilter |
A TokenFilter which filters out Tokens at the same position and Term text as the previous token
in the stream.
RemoveDuplicatesTokenFilterFactory |
ReversePathHierarchyTokenizer |
Tokenizer for domain-like hierarchies.
ReverseStringFilter |
Reverse token string, for example "country" => "yrtnuoc".
ReverseStringFilterFactory |
RollingCharBuffer |
Acts like a forever growing char[] as you read characters into it from the provided reader, but
internally it uses a circular buffer to only hold the characters that haven't been freed yet.
RomanianAnalyzer |
RomanianStemmer |
This class implements the stemming algorithm defined by a snowball script.
RSLPStemmerBase |
Base class for stemmers that use a set of RSLP-like stemming steps.
RSLPStemmerBase.Rule |
A basic rule, with no exceptions.
RSLPStemmerBase.RuleWithSetExceptions |
A rule with a set of whole-word exceptions.
RSLPStemmerBase.RuleWithSuffixExceptions |
A rule with a set of exceptional suffixes.
RSLPStemmerBase.Step |
A step containing a list of rules.
RussianAnalyzer |
RussianLightStemFilter |
RussianLightStemFilterFactory |
RussianLightStemmer |
Light Stemmer for Russian.
RussianStemmer |
This class implements the stemming algorithm defined by a snowball script.
ScandinavianFoldingFilter |
This filter folds Scandinavian characters åÅäæÄÆ->a and öÖøØ->o.
ScandinavianFoldingFilterFactory |
ScandinavianNormalizationFilter |
This filter normalizes the use of the interchangeable Scandinavian characters æÆäÄöÖøØ and folded
variants (aa, ao, ae, oe and oo) by transforming them to åÅæÆøØ.
ScandinavianNormalizationFilterFactory |
ScandinavianNormalizer |
This Normalizer does the heavy lifting for a set of Scandinavian normalization filters,
normalizing use of the interchangeable Scandinavian characters æÆäÄöÖøØ and folded variants (aa,
ao, ae, oe and oo) by transforming them to åÅæÆøØ.
ScandinavianNormalizer.Foldings |
List of possible foldings that can be used when configuring the filter
SegmentingTokenizerBase |
Breaks text into sentences with a BreakIterator and allows subclasses to decompose these
sentences into words.
SerbianAnalyzer |
SerbianNormalizationFilter |
Normalizes Serbian Cyrillic and Latin characters to "bald" Latin.
SerbianNormalizationFilterFactory |
SerbianNormalizationRegularFilter |
Normalizes Serbian Cyrillic to Latin.
SerbianStemmer |
This class implements the stemming algorithm defined by a snowball script.
SetKeywordMarkerFilter |
ShingleAnalyzerWrapper |
ShingleFilter |
A ShingleFilter constructs shingles (token n-grams) from a token stream.
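To make the term "shingles (token n-grams)" concrete, here is a minimal plain-Java sketch that builds shingles of one fixed size from a token list. It is not the Lucene filter and omits options a real filter may have (such as also emitting single tokens); `ShingleSketch`, `shingles`, and the `size`/`separator` parameters are hypothetical names.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration, not Lucene's implementation.
public class ShingleSketch {

    // Slides a window of the given size over the tokens and joins each
    // window into one shingle. Inputs shorter than size yield no shingles.
    public static List<String> shingles(List<String> tokens, int size, String separator) {
        List<String> out = new ArrayList<>();
        for (int i = 0; i + size <= tokens.size(); i++) {
            out.add(String.join(separator, tokens.subList(i, i + size)));
        }
        return out;
    }

    public static void main(String[] args) {
        // Prints [please divide, divide this, this sentence]
        System.out.println(shingles(List.of("please", "divide", "this", "sentence"), 2, " "));
    }
}
```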
ShingleFilterFactory |
SimpleAnalyzer |
SimplePatternSplitTokenizer |
This tokenizer uses a Lucene RegExp or (expert usage) a pre-built determinized Automaton, to locate tokens.
SimplePatternSplitTokenizerFactory |
SimplePatternTokenizer |
This tokenizer uses a Lucene RegExp or (expert usage) a pre-built determinized Automaton, to locate tokens.
SimplePatternTokenizerFactory |
SnowballFilter |
A filter that stems words using a Snowball-generated stemmer.
SnowballPorterFilterFactory |
SnowballProgram |
Base class for a snowball stemmer
SnowballStemmer |
Parent class of all snowball stemmers, which must implement stem
SolrSynonymParser |
Parser for the Solr synonyms format.
SoraniAnalyzer |
SoraniNormalizationFilter |
SoraniNormalizationFilterFactory |
SoraniNormalizer |
Normalizes the Unicode representation of Sorani text.
SoraniStemFilter |
SoraniStemFilterFactory |
SoraniStemmer |
Light stemmer for Sorani
SortingStrategy |
The strategy defining how a Hunspell dictionary should be loaded, with different tradeoffs.
SpanishAnalyzer |
SpanishLightStemFilter |
SpanishLightStemFilterFactory |
SpanishLightStemmer |
Light Stemmer for Spanish
SpanishMinimalStemFilter |
SpanishMinimalStemFilterFactory |
SpanishMinimalStemmer |
SpanishPluralStemFilter |
SpanishPluralStemFilterFactory |
SpanishPluralStemmer |
Plural Stemmer for Spanish
SpanishStemmer |
This class implements the stemming algorithm defined by a snowball script.
StemmerOverrideFilter |
Provides the ability to override any KeywordAttribute aware stemmer with custom
dictionary-based stemming.
StemmerOverrideFilter.Builder |
StemmerOverrideFilter.StemmerOverrideMap |
A read-only 4-byte FST backed map that allows fast case-insensitive key value lookups for
StemmerOverrideFilterFactory |
StemmerUtil |
Some commonly-used stemming functions
StopAnalyzer |
StopFilter |
Removes stop words from a token stream.
StopFilterFactory |
Suggester |
A generator for misspelled word corrections based on Hunspell flags.
SuggestionTimeoutException |
SwedishAnalyzer |
SwedishLightStemFilter |
SwedishLightStemFilterFactory |
SwedishLightStemmer |
Light Stemmer for Swedish.
SwedishMinimalStemFilter |
SwedishMinimalStemFilterFactory |
SwedishMinimalStemmer |
Minimal Stemmer for Swedish.
SwedishStemmer |
This class implements the stemming algorithm defined by a snowball script.
SynonymFilter |
SynonymFilterFactory |
SynonymGraphFilter |
Applies single- or multi-token synonyms from a SynonymMap to an incoming TokenStream, producing a fully correct graph output.
SynonymGraphFilterFactory |
SynonymMap |
A map of synonyms; keys and values are phrases.
SynonymMap.Builder |
Builds an FSTSynonymMap.
SynonymMap.Parser |
Abstraction for parsing synonym files.
TamilAnalyzer |
Analyzer for Tamil.
TamilStemmer |
This class implements the stemming algorithm defined by a snowball script.
TeeSinkTokenFilter |
This TokenFilter provides the ability to set aside attribute states that have already been
TeeSinkTokenFilter.SinkTokenStream |
TokenStream output from a tee.
TeluguAnalyzer |
Analyzer for Telugu.
TeluguNormalizationFilter |
TeluguNormalizationFilterFactory |
TeluguNormalizer |
Normalizer for Telugu.
TeluguStemFilter |
TeluguStemFilterFactory |
TeluguStemmer |
Stemmer for Telugu.
TermAndBoost |
Wraps a term and boost
TernaryTree |
Ternary Search Tree.
ThaiAnalyzer |
ThaiTokenizer |
ThaiTokenizerFactory |
TimeoutPolicy |
A strategy determining what to do when Hunspell API calls take too much time
TokenOffsetPayloadTokenFilter |
TokenOffsetPayloadTokenFilterFactory |
TrimFilter |
Trims leading and trailing whitespace from Tokens in the stream.
TrimFilterFactory |
TruncateTokenFilter |
A token filter for truncating the terms into a specific length.
TruncateTokenFilterFactory |
TurkishAnalyzer |
TurkishLowerCaseFilter |
Normalizes Turkish token text to lower case.
TurkishLowerCaseFilterFactory |
TurkishStemmer |
This class implements the stemming algorithm defined by a snowball script.
TypeAsPayloadTokenFilter |
TypeAsPayloadTokenFilterFactory |
TypeAsSynonymFilter |
TypeAsSynonymFilterFactory |
TypeTokenFilter |
Removes tokens whose types appear in a set of blocked types from a token stream.
TypeTokenFilterFactory |
UAX29URLEmailAnalyzer |
UAX29URLEmailTokenizer |
This class implements Word Break rules from the Unicode Text Segmentation algorithm, as specified
in Unicode Standard Annex #29. URLs and email
addresses are also tokenized according to the relevant RFCs.
UAX29URLEmailTokenizerFactory |
UAX29URLEmailTokenizerImpl |
This class implements Word Break rules from the Unicode Text Segmentation
algorithm, as specified in Unicode Standard Annex #29.
URLs and email addresses are also tokenized according to the relevant RFCs.
UnicodeProps |
This file contains Unicode properties used by various CharTokenizers.
UnicodeWhitespaceAnalyzer |
UnicodeWhitespaceTokenizer |
A UnicodeWhitespaceTokenizer is a tokenizer that divides text at whitespace.
UpperCaseFilter |
Normalizes token text to UPPER CASE.
UpperCaseFilterFactory |
WhitespaceAnalyzer |
WhitespaceTokenizer |
WhitespaceTokenizerFactory |
WikipediaTokenizer |
Extension of StandardTokenizer that is aware of Wikipedia syntax.
WikipediaTokenizerFactory |
Word2VecModel |
Word2VecModel is a class representing the parsed Word2Vec model containing the vectors for each
word in the dictionary
Word2VecSynonymFilter |
Applies single-token synonyms from a Word2Vec trained network to an incoming TokenStream.
Word2VecSynonymFilterFactory |
Word2VecSynonymProvider |
The Word2VecSynonymProvider generates the list of synonyms of a term.
Word2VecSynonymProviderFactory |
Supplies a cache of Word2VecSynonymProvider instances so that multiple instances of
Word2VecSynonymFilterFactory do not instantiate multiple instances of the same SynonymProvider.
WordDelimiterFilter |
WordDelimiterFilterFactory |
WordDelimiterGraphFilter |
Splits words into subwords and performs optional transformations on subword groups, producing a
correct token graph so that e.g.
WordDelimiterGraphFilterFactory |
WordDelimiterIterator |
A BreakIterator-like API for iterating over subwords in text, according to
WordDelimiterGraphFilter rules.
WordFormGenerator |
WordnetSynonymParser |
Parser for wordnet prolog format
YiddishStemmer |
This class implements the stemming algorithm defined by a snowball script.