All Classes
Class |
Description |
AbstractEncoder |
Base class for payload encoders.
|
AbstractWordsFileFilterFactory |
Abstract parent class for analysis factories that accept a stopwords file as input.
|
AffixedWord |
An object representing the analysis result of a simple (non-compound) word
|
AffixedWord.Affix |
An object representing a prefix or a suffix applied to a word stem
|
Among |
Internal class used by Snowball stemmers
|
ApostropheFilter |
Strips all characters after an apostrophe (including the apostrophe itself).
|
ApostropheFilterFactory |
|
ArabicAnalyzer |
|
ArabicNormalizationFilter |
|
ArabicNormalizationFilterFactory |
|
ArabicNormalizer |
Normalizer for Arabic.
|
ArabicStemFilter |
|
ArabicStemFilterFactory |
|
ArabicStemmer |
Stemmer for Arabic.
|
ArabicStemmer |
This class implements the stemming algorithm defined by a snowball script.
|
ArmenianAnalyzer |
|
ArmenianStemmer |
This class implements the stemming algorithm defined by a snowball script.
|
ASCIIFoldingFilter |
This class converts alphabetic, numeric, and symbolic Unicode characters which are not in the
first 127 ASCII characters (the "Basic Latin" Unicode block) into their ASCII equivalents, if one
exists.
|
ASCIIFoldingFilterFactory |
|
BaseCharFilter |
|
BasqueAnalyzer |
|
BasqueStemmer |
This class implements the stemming algorithm defined by a snowball script.
|
BengaliAnalyzer |
Analyzer for Bengali.
|
BengaliNormalizationFilter |
|
BengaliNormalizationFilterFactory |
|
BengaliNormalizer |
Normalizer for Bengali.
|
BengaliStemFilter |
|
BengaliStemFilterFactory |
|
BengaliStemmer |
Stemmer for Bengali.
|
BrazilianAnalyzer |
Analyzer for Brazilian Portuguese language.
|
BrazilianStemFilter |
|
BrazilianStemFilterFactory |
|
BrazilianStemmer |
A stemmer for Brazilian Portuguese words.
|
BulgarianAnalyzer |
|
BulgarianStemFilter |
|
BulgarianStemFilterFactory |
|
BulgarianStemmer |
Light Stemmer for Bulgarian.
|
ByteVector |
This class implements a simple byte vector with access to the underlying array.
|
CapitalizationFilter |
A filter to apply normal capitalization rules to Tokens.
|
CapitalizationFilterFactory |
|
CatalanAnalyzer |
|
CatalanStemmer |
This class implements the stemming algorithm defined by a snowball script.
|
CharArrayIterator |
|
CharTokenizer |
An abstract base class for simple, character-oriented tokenizers.
|
CharVector |
This class implements a simple char vector with access to the underlying array.
|
CJKAnalyzer |
|
CJKBigramFilter |
Forms bigrams of CJK terms that are generated from StandardTokenizer or ICUTokenizer.
|
CJKBigramFilterFactory |
|
CJKWidthCharFilter |
A CharFilter that normalizes CJK width differences:
Folds fullwidth ASCII variants into the equivalent basic latin
Folds halfwidth Katakana variants into the equivalent kana
|
CJKWidthCharFilterFactory |
|
CJKWidthFilter |
A TokenFilter that normalizes CJK width differences:
Folds fullwidth ASCII variants into the equivalent basic latin
Folds halfwidth Katakana variants into the equivalent kana
|
CJKWidthFilterFactory |
|
ClassicAnalyzer |
|
ClassicFilter |
|
ClassicFilterFactory |
|
ClassicTokenizer |
A grammar-based tokenizer constructed with JFlex
|
ClassicTokenizerFactory |
|
CodepointCountFilter |
Removes words that are too long or too short from the stream.
|
CodepointCountFilterFactory |
|
CollatedTermAttributeImpl |
Extension of CharTermAttributeImpl that encodes the term text as a binary Unicode
collation key instead of as UTF-8 bytes.
|
CollationAttributeFactory |
Converts each token into its CollationKey, and then encodes the bytes as an
index term.
|
CollationDocValuesField |
|
CollationKeyAnalyzer |
|
CommonGramsFilter |
Construct bigrams for frequently occurring terms while indexing.
|
CommonGramsFilterFactory |
|
CommonGramsQueryFilter |
Wrap a CommonGramsFilter optimizing phrase queries by only returning single words when they are
not a member of a bigram.
|
CommonGramsQueryFilterFactory |
|
CompoundWordTokenFilterBase |
Base class for decomposition token filters.
|
ConcatenateGraphFilter |
Concatenates/Joins every incoming token with a separator into one output token for every path
through the token stream (which is a graph).
|
ConcatenateGraphFilter.BytesRefBuilderTermAttribute |
Attribute providing access to the term builder and UTF-16 conversion
|
ConcatenateGraphFilter.BytesRefBuilderTermAttributeImpl |
|
ConcatenateGraphFilterFactory |
|
ConcatenatingTokenStream |
A TokenStream that takes an array of input TokenStreams as sources, and concatenates them
together.
|
ConditionalTokenFilter |
Allows skipping TokenFilters based on the current set of attributes.
|
ConditionalTokenFilterFactory |
|
CSVUtil |
Utility class for parsing CSV text
|
CustomAnalyzer |
A general-purpose Analyzer that can be created with a builder-style API.
|
CustomAnalyzer.Builder |
|
CustomAnalyzer.ConditionBuilder |
|
CzechAnalyzer |
|
CzechStemFilter |
|
CzechStemFilterFactory |
|
CzechStemmer |
Light Stemmer for Czech.
|
DanishAnalyzer |
|
DanishStemmer |
This class implements the stemming algorithm defined by a snowball script.
|
DateRecognizerFilter |
Filters all tokens that cannot be parsed to a date, using the provided DateFormat.
|
DateRecognizerFilterFactory |
|
DecimalDigitFilter |
Folds all Unicode digits in [:General_Category=Decimal_Number:] to Basic Latin digits
(0-9).
|
DecimalDigitFilterFactory |
|
DelimitedBoostTokenFilter |
Characters before the delimiter are the "token", those after are the boost.
|
DelimitedBoostTokenFilterFactory |
|
DelimitedPayloadTokenFilter |
Characters before the delimiter are the "token", those after are the payload.
|
DelimitedPayloadTokenFilterFactory |
|
DelimitedTermFrequencyTokenFilter |
Characters before the delimiter are the "token", the textual integer after is the term frequency.
|
DelimitedTermFrequencyTokenFilterFactory |
|
DictEntries |
An object representing homonym dictionary entries.
|
DictEntry |
An object representing *.dic file entry with its word, flags and morphological data.
|
Dictionary |
In-memory structure for the dictionary (.dic) and affix (.aff) data of a hunspell dictionary.
|
DictionaryCompoundWordTokenFilter |
A TokenFilter that decomposes compound words found in many
Germanic languages.
|
DictionaryCompoundWordTokenFilterFactory |
|
Dl4jModelReader |
Dl4jModelReader reads the file generated by the Deeplearning4j library and provides a
Word2VecModel with normalized vectors.
|
DropIfFlaggedFilter |
Allows Tokens with a given combination of flags to be dropped.
|
DropIfFlaggedFilterFactory |
Provides a filter that will drop tokens matching a set of flags.
|
DutchAnalyzer |
|
DutchStemmer |
This class implements the stemming algorithm defined by a snowball script.
|
EdgeNGramFilterFactory |
|
EdgeNGramTokenFilter |
Tokenizes the given token into n-grams of given size(s).
|
EdgeNGramTokenizer |
Tokenizes the input from an edge into n-grams of given size(s).
|
EdgeNGramTokenizerFactory |
|
ElisionFilter |
|
ElisionFilterFactory |
|
EmptyTokenStream |
An always exhausted token stream.
|
EnglishAnalyzer |
|
EnglishMinimalStemFilter |
|
EnglishMinimalStemFilterFactory |
|
EnglishMinimalStemmer |
Minimal plural stemmer for English.
|
EnglishPossessiveFilter |
TokenFilter that removes possessives (trailing 's) from words.
|
EnglishPossessiveFilterFactory |
|
EnglishStemmer |
This class implements the stemming algorithm defined by a snowball script.
|
EntrySuggestion |
|
EstonianAnalyzer |
|
EstonianStemmer |
This class implements the stemming algorithm defined by a snowball script.
|
FilesystemResourceLoader |
Simple ResourceLoader that opens resource files from the local file system, optionally
resolving against a base directory.
|
FingerprintFilter |
Filter outputs a single token which is a concatenation of the sorted and de-duplicated set of
input tokens.
|
FingerprintFilterFactory |
|
FinnishAnalyzer |
|
FinnishLightStemFilter |
|
FinnishLightStemFilterFactory |
|
FinnishLightStemmer |
Light Stemmer for Finnish.
|
FinnishStemmer |
This class implements the stemming algorithm defined by a snowball script.
|
FixBrokenOffsetsFilter |
Deprecated.
|
FixBrokenOffsetsFilterFactory |
Deprecated.
|
FixedShingleFilter |
A FixedShingleFilter constructs shingles (token n-grams) from a token stream.
|
FixedShingleFilterFactory |
|
FlattenGraphFilter |
Converts an incoming graph token stream, such as one from SynonymGraphFilter, into a flat
form so that all nodes form a single linear chain with no side paths.
|
FlattenGraphFilterFactory |
|
FloatEncoder |
Encode a character array Float as a BytesRef.
|
FragmentChecker |
An oracle for quickly checking that a specific part of a word can never be a valid word.
|
FrenchAnalyzer |
|
FrenchLightStemFilter |
|
FrenchLightStemFilterFactory |
|
FrenchLightStemmer |
Light Stemmer for French.
|
FrenchMinimalStemFilter |
|
FrenchMinimalStemFilterFactory |
|
FrenchMinimalStemmer |
Light Stemmer for French.
|
FrenchStemmer |
This class implements the stemming algorithm defined by a snowball script.
|
GalicianAnalyzer |
|
GalicianMinimalStemFilter |
|
GalicianMinimalStemFilterFactory |
|
GalicianMinimalStemmer |
Minimal Stemmer for Galician
|
GalicianStemFilter |
|
GalicianStemFilterFactory |
|
GalicianStemmer |
Galician stemmer implementing "Regras do lematizador para o galego".
|
German2Stemmer |
This class implements the stemming algorithm defined by a snowball script.
|
GermanAnalyzer |
|
GermanLightStemFilter |
|
GermanLightStemFilterFactory |
|
GermanLightStemmer |
Light Stemmer for German.
|
GermanMinimalStemFilter |
|
GermanMinimalStemFilterFactory |
|
GermanMinimalStemmer |
Minimal Stemmer for German.
|
GermanNormalizationFilter |
|
GermanNormalizationFilterFactory |
|
GermanStemFilter |
|
GermanStemFilterFactory |
|
GermanStemmer |
A stemmer for German words.
|
GermanStemmer |
This class implements the stemming algorithm defined by a snowball script.
|
GreekAnalyzer |
|
GreekLowerCaseFilter |
Normalizes token text to lower case, removes some Greek diacritics, and standardizes final sigma
to sigma.
|
GreekLowerCaseFilterFactory |
|
GreekStemFilter |
|
GreekStemFilterFactory |
|
GreekStemmer |
A stemmer for Greek words, according to: Development of a Stemmer for the Greek Language.
Georgios Ntais
|
GreekStemmer |
This class implements the stemming algorithm defined by a snowball script.
|
HindiAnalyzer |
Analyzer for Hindi.
|
HindiNormalizationFilter |
|
HindiNormalizationFilterFactory |
|
HindiNormalizer |
Normalizer for Hindi.
|
HindiStemFilter |
|
HindiStemFilterFactory |
|
HindiStemmer |
Light Stemmer for Hindi.
|
HindiStemmer |
This class implements the stemming algorithm defined by a snowball script.
|
HTMLStripCharFilter |
A CharFilter that wraps another Reader and attempts to strip out HTML constructs.
|
HTMLStripCharFilterFactory |
|
HungarianAnalyzer |
|
HungarianLightStemFilter |
|
HungarianLightStemFilterFactory |
|
HungarianLightStemmer |
Light Stemmer for Hungarian.
|
HungarianStemmer |
This class implements the stemming algorithm defined by a snowball script.
|
Hunspell |
A spell checker based on Hunspell dictionaries.
|
HunspellStemFilter |
TokenFilter that uses hunspell affix rules and words to stem tokens.
|
HunspellStemFilterFactory |
|
Hyphen |
This class represents a hyphen.
|
HyphenatedWordsFilter |
When the plain text is extracted from documents, we will often have many words hyphenated and
broken into two lines.
|
HyphenatedWordsFilterFactory |
|
Hyphenation |
This class represents a hyphenated word.
|
HyphenationCompoundWordTokenFilter |
A TokenFilter that decomposes compound words found in many
Germanic languages.
|
HyphenationCompoundWordTokenFilterFactory |
|
HyphenationTree |
This tree structure stores the hyphenation patterns in an efficient way for fast lookup.
|
IdentityEncoder |
Does nothing other than convert the char array to a byte array using the specified encoding.
|
IndicNormalizationFilter |
|
IndicNormalizationFilterFactory |
|
IndicNormalizer |
Normalizes the Unicode representation of text in Indian languages.
|
IndonesianAnalyzer |
Analyzer for Indonesian (Bahasa)
|
IndonesianStemFilter |
|
IndonesianStemFilterFactory |
|
IndonesianStemmer |
Stemmer for Indonesian.
|
IndonesianStemmer |
This class implements the stemming algorithm defined by a snowball script.
|
IntegerEncoder |
Encode a character array Integer as a BytesRef.
|
IrishAnalyzer |
|
IrishLowerCaseFilter |
Normalises token text to lower case, handling t-prothesis and n-eclipsis (i.e., that 'nAthair'
should become 'n-athair')
|
IrishLowerCaseFilterFactory |
|
IrishStemmer |
This class implements the stemming algorithm defined by a snowball script.
|
ItalianAnalyzer |
|
ItalianLightStemFilter |
|
ItalianLightStemFilterFactory |
|
ItalianLightStemmer |
Light Stemmer for Italian.
|
ItalianStemmer |
This class implements the stemming algorithm defined by a snowball script.
|
KeepWordFilter |
A TokenFilter that only keeps tokens with text contained in the required words.
|
KeepWordFilterFactory |
|
KeywordAnalyzer |
"Tokenizes" the entire stream as a single token.
|
KeywordMarkerFilter |
|
KeywordMarkerFilterFactory |
|
KeywordRepeatFilter |
This TokenFilter emits each incoming token twice, once as a keyword and once as a non-keyword; in other
words, once with KeywordAttribute.setKeyword(boolean) set to true and once
set to false.
|
KeywordRepeatFilterFactory |
|
KeywordTokenizer |
Emits the entire input as a single token.
|
KeywordTokenizerFactory |
|
KpStemmer |
This class implements the stemming algorithm defined by a snowball script.
|
KStemFilter |
A high-performance kstem filter for English.
|
KStemFilterFactory |
|
KStemmer |
This class implements the Kstem algorithm
|
LatvianAnalyzer |
|
LatvianStemFilter |
|
LatvianStemFilterFactory |
|
LatvianStemmer |
Light stemmer for Latvian.
|
LengthFilter |
Removes words that are too long or too short from the stream.
|
LengthFilterFactory |
|
LetterTokenizer |
A LetterTokenizer is a tokenizer that divides text at non-letters.
|
LetterTokenizerFactory |
|
LimitTokenCountAnalyzer |
This Analyzer limits the number of tokens while indexing.
|
LimitTokenCountFilter |
This TokenFilter limits the number of tokens while indexing.
|
LimitTokenCountFilterFactory |
|
LimitTokenOffsetFilter |
Lets all tokens pass through until it sees one with a start offset greater than a configured limit,
which won't pass and ends the stream.
|
LimitTokenOffsetFilterFactory |
|
LimitTokenPositionFilter |
This TokenFilter limits its emitted tokens to those with positions that are not greater than the
configured limit.
|
LimitTokenPositionFilterFactory |
|
LithuanianAnalyzer |
|
LithuanianStemmer |
This class implements the stemming algorithm defined by a snowball script.
|
LovinsStemmer |
This class implements the stemming algorithm defined by a snowball script.
|
LowerCaseFilter |
Normalizes token text to lower case.
|
LowerCaseFilterFactory |
|
MappingCharFilter |
Simplistic CharFilter that applies the mappings contained in a NormalizeCharMap
to the character stream and corrects the resulting changes to the offsets.
|
MappingCharFilterFactory |
|
MinHashFilter |
Generate min hash tokens from an incoming stream of tokens.
|
MinHashFilterFactory |
|
NepaliAnalyzer |
Analyzer for Nepali.
|
NepaliStemmer |
This class implements the stemming algorithm defined by a snowball script.
|
NGramFilterFactory |
|
NGramFragmentChecker |
A FragmentChecker based on all character n-grams possible in a certain language, keeping
them in a relatively memory-efficient, but probabilistic data structure.
|
NGramFragmentChecker.NGramConsumer |
A callback for n-gram ranges in words
|
NGramTokenFilter |
Tokenizes the input into n-grams of the given size(s).
|
NGramTokenizer |
Tokenizes the input into n-grams of the given size(s).
|
NGramTokenizerFactory |
|
NormalizeCharMap |
|
NormalizeCharMap.Builder |
Builds a NormalizeCharMap.
|
NorwegianAnalyzer |
|
NorwegianLightStemFilter |
|
NorwegianLightStemFilterFactory |
|
NorwegianLightStemmer |
Light Stemmer for Norwegian.
|
NorwegianMinimalStemFilter |
|
NorwegianMinimalStemFilterFactory |
|
NorwegianMinimalStemmer |
Minimal Stemmer for Norwegian Bokmål (no-nb) and Nynorsk (no-nn)
|
NorwegianNormalizationFilter |
This filter normalizes the use of the interchangeable Scandinavian characters æÆäÄöÖøØ and folded
variants (ae, oe, aa) by transforming them to åÅæÆøØ.
|
NorwegianNormalizationFilterFactory |
|
NorwegianStemmer |
This class implements the stemming algorithm defined by a snowball script.
|
NumericPayloadTokenFilter |
|
NumericPayloadTokenFilterFactory |
|
OpenStringBuilder |
A StringBuilder that allows one to access the array.
|
PathHierarchyTokenizer |
Tokenizer for path-like hierarchies.
|
PathHierarchyTokenizerFactory |
|
PatternCaptureGroupFilterFactory |
|
PatternCaptureGroupTokenFilter |
CaptureGroup uses Java regexes to emit multiple tokens - one for each capture group in one or
more patterns.
|
PatternConsumer |
This interface is used to connect the XML pattern file parser to the hyphenation tree.
|
PatternKeywordMarkerFilter |
|
PatternParser |
A SAX document handler to read and parse hyphenation patterns from an XML file.
|
PatternReplaceCharFilter |
CharFilter that uses a regular expression to select the text to replace with a given string.
|
PatternReplaceCharFilterFactory |
|
PatternReplaceFilter |
A TokenFilter which applies a Pattern to each token in the stream, replacing match occurrences
with the specified replacement string.
|
PatternReplaceFilterFactory |
|
PatternTokenizer |
This tokenizer uses regex pattern matching to construct distinct tokens for the input stream.
|
PatternTokenizerFactory |
|
PatternTypingFilter |
Sets a type attribute to a parameterized value when tokens are matched by any of several regex
patterns.
|
PatternTypingFilter.PatternTypingRule |
Value holding class for pattern typing rules.
|
PatternTypingFilterFactory |
Provides a filter that will analyze tokens with the analyzer from an arbitrary field type.
|
PayloadEncoder |
Mainly for use with the DelimitedPayloadTokenFilter, converts char buffers to BytesRef.
|
PayloadHelper |
Utility methods for encoding payloads.
|
PerFieldAnalyzerWrapper |
This analyzer is used to facilitate scenarios where different fields require different analysis
techniques.
|
PersianAnalyzer |
|
PersianCharFilter |
CharFilter that replaces instances of Zero-width non-joiner with an ordinary space.
|
PersianCharFilterFactory |
|
PersianNormalizationFilter |
|
PersianNormalizationFilterFactory |
|
PersianNormalizer |
Normalizer for Persian.
|
PersianStemFilter |
|
PersianStemFilterFactory |
|
PersianStemmer |
Stemmer for Persian.
|
PorterStemFilter |
Transforms the token stream as per the Porter stemming algorithm.
|
PorterStemFilterFactory |
|
PorterStemmer |
This class implements the stemming algorithm defined by a snowball script.
|
PortugueseAnalyzer |
|
PortugueseLightStemFilter |
|
PortugueseLightStemFilterFactory |
|
PortugueseLightStemmer |
Light Stemmer for Portuguese
|
PortugueseMinimalStemFilter |
|
PortugueseMinimalStemFilterFactory |
|
PortugueseMinimalStemmer |
Minimal Stemmer for Portuguese
|
PortugueseStemFilter |
|
PortugueseStemFilterFactory |
|
PortugueseStemmer |
Portuguese stemmer implementing the RSLP (Removedor de Sufixos da Lingua Portuguesa) algorithm.
|
PortugueseStemmer |
This class implements the stemming algorithm defined by a snowball script.
|
ProtectedTermFilter |
A ConditionalTokenFilter that only applies its wrapped filters to tokens that are not contained
in a protected set.
|
ProtectedTermFilterFactory |
|
QueryAutoStopWordAnalyzer |
An Analyzer used primarily at query time to wrap another analyzer and provide a layer of
protection which prevents very common words from being passed into queries.
|
RemoveDuplicatesTokenFilter |
A TokenFilter which filters out Tokens at the same position and Term text as the previous token
in the stream.
|
RemoveDuplicatesTokenFilterFactory |
|
ReversePathHierarchyTokenizer |
Tokenizer for domain-like hierarchies.
|
ReverseStringFilter |
Reverse token string, for example "country" => "yrtnuoc".
|
ReverseStringFilterFactory |
|
RollingCharBuffer |
Acts like a forever growing char[] as you read characters into it from the provided reader, but
internally it uses a circular buffer to only hold the characters that haven't been freed yet.
|
RomanianAnalyzer |
|
RomanianStemmer |
This class implements the stemming algorithm defined by a snowball script.
|
RSLPStemmerBase |
Base class for stemmers that use a set of RSLP-like stemming steps.
|
RSLPStemmerBase.Rule |
A basic rule, with no exceptions.
|
RSLPStemmerBase.RuleWithSetExceptions |
A rule with a set of whole-word exceptions.
|
RSLPStemmerBase.RuleWithSuffixExceptions |
A rule with a set of exceptional suffixes.
|
RSLPStemmerBase.Step |
A step containing a list of rules.
|
RussianAnalyzer |
|
RussianLightStemFilter |
|
RussianLightStemFilterFactory |
|
RussianLightStemmer |
Light Stemmer for Russian.
|
RussianStemmer |
This class implements the stemming algorithm defined by a snowball script.
|
ScandinavianFoldingFilter |
This filter folds Scandinavian characters åÅäæÄÆ->a and öÖøØ->o.
|
ScandinavianFoldingFilterFactory |
|
ScandinavianNormalizationFilter |
This filter normalizes the use of the interchangeable Scandinavian characters æÆäÄöÖøØ and folded
variants (aa, ao, ae, oe and oo) by transforming them to åÅæÆøØ.
|
ScandinavianNormalizationFilterFactory |
|
ScandinavianNormalizer |
This Normalizer does the heavy lifting for a set of Scandinavian normalization filters,
normalizing use of the interchangeable Scandinavian characters æÆäÄöÖøØ and folded variants (aa,
ao, ae, oe and oo) by transforming them to åÅæÆøØ.
|
ScandinavianNormalizer.Foldings |
List of possible foldings that can be used when configuring the filter
|
SegmentingTokenizerBase |
Breaks text into sentences with a BreakIterator and allows subclasses to decompose these
sentences into words.
|
SerbianAnalyzer |
|
SerbianNormalizationFilter |
Normalizes Serbian Cyrillic and Latin characters to "bald" Latin.
|
SerbianNormalizationFilterFactory |
|
SerbianNormalizationRegularFilter |
Normalizes Serbian Cyrillic to Latin.
|
SerbianStemmer |
This class implements the stemming algorithm defined by a snowball script.
|
SetKeywordMarkerFilter |
|
ShingleAnalyzerWrapper |
|
ShingleFilter |
A ShingleFilter constructs shingles (token n-grams) from a token stream.
|
ShingleFilterFactory |
|
SimpleAnalyzer |
|
SimplePatternSplitTokenizer |
This tokenizer uses a Lucene RegExp or (expert usage) a pre-built determinized Automaton, to locate tokens.
|
SimplePatternSplitTokenizerFactory |
|
SimplePatternTokenizer |
This tokenizer uses a Lucene RegExp or (expert usage) a pre-built determinized Automaton, to locate tokens.
|
SimplePatternTokenizerFactory |
|
SnowballFilter |
A filter that stems words using a Snowball-generated stemmer.
|
SnowballPorterFilterFactory |
|
SnowballProgram |
Base class for a snowball stemmer
|
SnowballStemmer |
Parent class of all snowball stemmers, which must implement stem
|
SolrSynonymParser |
Parser for the Solr synonyms format.
|
SoraniAnalyzer |
|
SoraniNormalizationFilter |
|
SoraniNormalizationFilterFactory |
|
SoraniNormalizer |
Normalizes the Unicode representation of Sorani text.
|
SoraniStemFilter |
|
SoraniStemFilterFactory |
|
SoraniStemmer |
Light stemmer for Sorani
|
SortingStrategy |
The strategy defining how a Hunspell dictionary should be loaded, with different tradeoffs.
|
SpanishAnalyzer |
|
SpanishLightStemFilter |
|
SpanishLightStemFilterFactory |
|
SpanishLightStemmer |
Light Stemmer for Spanish
|
SpanishMinimalStemFilter |
Deprecated.
|
SpanishMinimalStemFilterFactory |
Deprecated.
|
SpanishMinimalStemmer |
Deprecated.
|
SpanishPluralStemFilter |
|
SpanishPluralStemFilterFactory |
|
SpanishPluralStemmer |
Plural Stemmer for Spanish
|
SpanishStemmer |
This class implements the stemming algorithm defined by a snowball script.
|
StemmerOverrideFilter |
Provides the ability to override any KeywordAttribute aware stemmer with custom
dictionary-based stemming.
|
StemmerOverrideFilter.Builder |
|
StemmerOverrideFilter.StemmerOverrideMap |
A read-only 4-byte FST backed map that allows fast case-insensitive key value lookups for
StemmerOverrideFilter
|
StemmerOverrideFilterFactory |
|
StemmerUtil |
Some commonly-used stemming functions
|
StopAnalyzer |
|
StopFilter |
Removes stop words from a token stream.
|
StopFilterFactory |
|
Suggester |
A generator for misspelled word corrections based on Hunspell flags.
|
SuggestionTimeoutException |
|
SwedishAnalyzer |
|
SwedishLightStemFilter |
|
SwedishLightStemFilterFactory |
|
SwedishLightStemmer |
Light Stemmer for Swedish.
|
SwedishMinimalStemFilter |
|
SwedishMinimalStemFilterFactory |
|
SwedishMinimalStemmer |
Minimal Stemmer for Swedish.
|
SwedishStemmer |
This class implements the stemming algorithm defined by a snowball script.
|
SynonymFilter |
Deprecated.
|
SynonymFilterFactory |
Deprecated.
|
SynonymGraphFilter |
Applies single- or multi-token synonyms from a SynonymMap to an incoming TokenStream, producing a fully correct graph output.
|
SynonymGraphFilterFactory |
|
SynonymMap |
A map of synonyms, keys and values are phrases.
|
SynonymMap.Builder |
Builds an FSTSynonymMap.
|
SynonymMap.Parser |
Abstraction for parsing synonym files.
|
TamilAnalyzer |
Analyzer for Tamil.
|
TamilStemmer |
This class implements the stemming algorithm defined by a snowball script.
|
TeeSinkTokenFilter |
This TokenFilter provides the ability to set aside attribute states that have already been
analyzed.
|
TeeSinkTokenFilter.SinkTokenStream |
TokenStream output from a tee.
|
TeluguAnalyzer |
Analyzer for Telugu.
|
TeluguNormalizationFilter |
|
TeluguNormalizationFilterFactory |
|
TeluguNormalizer |
Normalizer for Telugu.
|
TeluguStemFilter |
|
TeluguStemFilterFactory |
|
TeluguStemmer |
Stemmer for Telugu.
|
TermAndBoost |
Wraps a term and boost
|
TernaryTree |
Ternary Search Tree.
|
ThaiAnalyzer |
|
ThaiTokenizer |
|
ThaiTokenizerFactory |
|
TimeoutPolicy |
A strategy determining what to do when Hunspell API calls take too much time
|
TokenOffsetPayloadTokenFilter |
|
TokenOffsetPayloadTokenFilterFactory |
|
TrimFilter |
Trims leading and trailing whitespace from Tokens in the stream.
|
TrimFilterFactory |
|
TruncateTokenFilter |
A token filter for truncating the terms into a specific length.
|
TruncateTokenFilterFactory |
|
TurkishAnalyzer |
|
TurkishLowerCaseFilter |
Normalizes Turkish token text to lower case.
|
TurkishLowerCaseFilterFactory |
|
TurkishStemmer |
This class implements the stemming algorithm defined by a snowball script.
|
TypeAsPayloadTokenFilter |
|
TypeAsPayloadTokenFilterFactory |
|
TypeAsSynonymFilter |
|
TypeAsSynonymFilterFactory |
|
TypeTokenFilter |
Removes tokens whose types appear in a set of blocked types from a token stream.
|
TypeTokenFilterFactory |
|
UAX29URLEmailAnalyzer |
|
UAX29URLEmailTokenizer |
This class implements Word Break rules from the Unicode Text Segmentation algorithm, as specified
in Unicode Standard Annex #29. URLs and email
addresses are also tokenized according to the relevant RFCs.
|
UAX29URLEmailTokenizerFactory |
|
UAX29URLEmailTokenizerImpl |
This class implements Word Break rules from the Unicode Text Segmentation
algorithm, as specified in
Unicode Standard Annex #29.
URLs and email addresses are also tokenized according to the relevant RFCs.
|
UnicodeProps |
This file contains Unicode properties used by various CharTokenizers.
|
UnicodeWhitespaceAnalyzer |
|
UnicodeWhitespaceTokenizer |
A UnicodeWhitespaceTokenizer is a tokenizer that divides text at whitespace.
|
UpperCaseFilter |
Normalizes token text to UPPER CASE.
|
UpperCaseFilterFactory |
|
WhitespaceAnalyzer |
|
WhitespaceTokenizer |
|
WhitespaceTokenizerFactory |
|
WikipediaTokenizer |
Extension of StandardTokenizer that is aware of Wikipedia syntax.
|
WikipediaTokenizerFactory |
|
Word2VecModel |
Word2VecModel is a class representing the parsed Word2Vec model containing the vectors for each
word in dictionary
|
Word2VecSynonymFilter |
Applies single-token synonyms from a Word2Vec trained network to an incoming TokenStream.
|
Word2VecSynonymFilterFactory |
|
Word2VecSynonymProvider |
The Word2VecSynonymProvider generates the list of synonyms of a term.
|
Word2VecSynonymProviderFactory |
Supplies a cache of Word2VecSynonymProvider instances, so that multiple instances of
Word2VecSynonymFilterFactory do not instantiate multiple instances of the same SynonymProvider.
|
WordDelimiterFilter |
Deprecated.
|
WordDelimiterFilterFactory |
Deprecated.
|
WordDelimiterGraphFilter |
Splits words into subwords and performs optional transformations on subword groups, producing a
correct token graph.
|
WordDelimiterGraphFilterFactory |
|
WordDelimiterIterator |
A BreakIterator-like API for iterating over subwords in text, according to
WordDelimiterGraphFilter rules.
|
WordFormGenerator |
|
WordnetSynonymParser |
Parser for wordnet prolog format
|
YiddishStemmer |
This class implements the stemming algorithm defined by a snowball script.
|
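Several of the one-line descriptions above are easier to picture with concrete input and output. The following is a minimal standalone sketch in plain Java, not the library's implementation: it does not call any of the classes listed here, and the class name AnalysisSketch and its method names are hypothetical, chosen only for illustration. It mimics the behaviour stated in the summaries for ApostropheFilter (drop the apostrophe and everything after it), ShingleFilter (token n-grams), and DelimitedPayloadTokenFilter (text before the delimiter is the token, text after it is the payload). It assumes a plain ASCII apostrophe and fixed-size shingles, which may be narrower than what the real classes handle.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration class; not part of the library listed above.
final class AnalysisSketch {
    private AnalysisSketch() {}

    // Mirrors the ApostropheFilter summary: strip the apostrophe and all characters after it.
    // Assumption: only the ASCII apostrophe (') is considered here.
    public static String stripAfterApostrophe(String token) {
        int i = token.indexOf('\'');
        return i < 0 ? token : token.substring(0, i);
    }

    // Mirrors the ShingleFilter summary: build token n-grams of exactly `size` adjacent tokens.
    // Assumption: single tokens are not emitted alongside the shingles.
    public static List<String> shingles(List<String> tokens, int size) {
        if (size < 1) {
            throw new IllegalArgumentException("size must be at least 1");
        }
        List<String> out = new ArrayList<>();
        for (int i = 0; i + size <= tokens.size(); i++) {
            out.add(String.join(" ", tokens.subList(i, i + size)));
        }
        return out;
    }

    // Mirrors the DelimitedPayloadTokenFilter summary: index 0 is the token, index 1 the payload text.
    // A token without the delimiter gets an empty payload in this sketch.
    public static String[] splitAtDelimiter(String token, char delimiter) {
        int i = token.indexOf(delimiter);
        if (i < 0) {
            return new String[] {token, ""};
        }
        return new String[] {token.substring(0, i), token.substring(i + 1)};
    }

    public static void main(String[] args) {
        System.out.println(stripAfterApostrophe("O'Reilly"));                              // O
        System.out.println(shingles(List.of("please", "divide", "this", "sentence"), 2)); // [please divide, divide this, this sentence]
        String[] parts = splitAtDelimiter("fox|2.5", '|');
        System.out.println(parts[0] + " / " + parts[1]);                                   // fox / 2.5
    }
}
```

The real filters operate on a token stream and its attributes rather than on strings and lists; the sketch only shows the text transformation each summary describes.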