Package | Description |
---|---|
org.apache.lucene.analysis |
API and code to convert text into indexable/searchable tokens.
|
org.apache.lucene.analysis.ar |
Analyzer for Arabic.
|
org.apache.lucene.analysis.bg |
Analyzer for Bulgarian.
|
org.apache.lucene.analysis.br |
Analyzer for Brazilian Portuguese.
|
org.apache.lucene.analysis.cjk |
Analyzer for Chinese, Japanese, and Korean, which indexes bigrams (overlapping groups of two adjacent Han characters).
|
org.apache.lucene.analysis.cn |
Analyzer for Chinese, which indexes unigrams (individual Chinese characters).
|
org.apache.lucene.analysis.cn.smart |
Analyzer for Simplified Chinese, which indexes words.
|
org.apache.lucene.analysis.compound |
A filter that decomposes compound words, which occur in many Germanic
languages, into their word parts.
|
org.apache.lucene.analysis.cz |
Analyzer for Czech.
|
org.apache.lucene.analysis.de |
Analyzer for German.
|
org.apache.lucene.analysis.el |
Analyzer for Greek.
|
org.apache.lucene.analysis.en |
Analyzer for English.
|
org.apache.lucene.analysis.es |
Analyzer for Spanish.
|
org.apache.lucene.analysis.fa |
Analyzer for Persian.
|
org.apache.lucene.analysis.fi |
Analyzer for Finnish.
|
org.apache.lucene.analysis.fr |
Analyzer for French.
|
org.apache.lucene.analysis.ga |
Analysis for Irish.
|
org.apache.lucene.analysis.gl |
Analyzer for Galician.
|
org.apache.lucene.analysis.hi |
Analyzer for Hindi.
|
org.apache.lucene.analysis.hu |
Analyzer for Hungarian.
|
org.apache.lucene.analysis.hunspell |
Stemming TokenFilter using a Java implementation of the
Hunspell stemming algorithm.
|
org.apache.lucene.analysis.icu |
Analysis components based on ICU.
|
org.apache.lucene.analysis.id |
Analyzer for Indonesian.
|
org.apache.lucene.analysis.in |
Analysis components for Indian languages.
|
org.apache.lucene.analysis.it |
Analyzer for Italian.
|
org.apache.lucene.analysis.ja |
Analyzer for Japanese.
|
org.apache.lucene.analysis.lv |
Analyzer for Latvian.
|
org.apache.lucene.analysis.miscellaneous |
Miscellaneous TokenStreams.
|
org.apache.lucene.analysis.ngram |
Character n-gram tokenizers and filters.
|
org.apache.lucene.analysis.nl |
Analyzer for Dutch.
|
org.apache.lucene.analysis.no |
Analyzer for Norwegian.
|
org.apache.lucene.analysis.payloads |
Provides various convenience classes for creating payloads on Tokens.
|
org.apache.lucene.analysis.phonetic |
Analysis components for phonetic search.
|
org.apache.lucene.analysis.position |
Filter for assigning position increments.
|
org.apache.lucene.analysis.pt |
Analyzer for Portuguese.
|
org.apache.lucene.analysis.reverse |
Filter to reverse token text.
|
org.apache.lucene.analysis.ru |
Analyzer for Russian.
|
org.apache.lucene.analysis.shingle |
Word n-gram filters.
|
org.apache.lucene.analysis.snowball |
TokenFilter and Analyzer implementations that use Snowball
stemmers. |
org.apache.lucene.analysis.standard |
Standards-based analyzers implemented with JFlex.
|
org.apache.lucene.analysis.stempel |
Stempel: Algorithmic Stemmer
|
org.apache.lucene.analysis.sv |
Analyzer for Swedish.
|
org.apache.lucene.analysis.synonym |
Analysis components for Synonyms.
|
org.apache.lucene.analysis.th |
Analyzer for Thai.
|
org.apache.lucene.analysis.tr |
Analyzer for Turkish.
|
org.apache.lucene.collation |
CollationKeyFilter
converts each token into its binary CollationKey using the
provided Collator, and then encodes the CollationKey
as a String using
IndexableBinaryStringTools, to allow it to be
stored as an index term. |
org.apache.lucene.facet.enhancements |
Enhanced category features
Mechanisms for addition of enhanced category features.
|
org.apache.lucene.facet.enhancements.association |
Association category enhancements
A
CategoryEnhancement
for adding associations data to the index (categories with
AssociationProperty's). |
org.apache.lucene.facet.index.streaming |
Expert: attributes streaming definition for indexing facets
Streaming of facet attributes is a low-level interface to Lucene indexing.
|
org.apache.lucene.queryParser |
A simple query parser implemented with JavaCC.
|
org.apache.lucene.search.highlight |
The highlight package contains classes to provide "keyword in context" features
typically used to highlight search terms in the text of results pages.
|
Modifier and Type | Class and Description |
---|---|
class |
ASCIIFoldingFilter
This class converts alphabetic, numeric, and symbolic Unicode characters
which are not in the first 127 ASCII characters (the "Basic Latin" Unicode
block) into their ASCII equivalents, if they exist.
|
class |
CachingTokenFilter
This class can be used if the token attributes of a TokenStream
are intended to be consumed more than once.
|
class |
FilteringTokenFilter
Abstract base class for TokenFilters that may remove tokens.
|
class |
ISOLatin1AccentFilter
Deprecated.
If you build a new index, use
ASCIIFoldingFilter
which covers a superset of Latin 1.
This class is included for use with existing
indexes and will be removed in a future release (possibly Lucene 4.0). |
class |
KeywordMarkerFilter
Marks terms as keywords via the
KeywordAttribute. |
class |
LengthFilter
Removes words that are too long or too short from the stream.
|
class |
LimitTokenCountFilter
This TokenFilter limits the number of tokens while indexing.
|
class |
LookaheadTokenFilter<T extends LookaheadTokenFilter.Position>
An abstract TokenFilter to make it easier to build graph
token filters requiring some lookahead.
|
class |
LowerCaseFilter
Normalizes token text to lower case.
|
class |
MockFixedLengthPayloadFilter
TokenFilter that adds random fixed-length payloads.
|
class |
MockGraphTokenFilter
Randomly inserts overlapped (posInc=0) tokens with
posLength sometimes > 1.
|
class |
MockHoleInjectingTokenFilter |
class |
MockRandomLookaheadTokenFilter
Uses
LookaheadTokenFilter to randomly peek at future tokens. |
class |
MockVariableLengthPayloadFilter
TokenFilter that adds random variable-length payloads.
|
class |
PorterStemFilter
Transforms the token stream as per the Porter stemming algorithm.
|
class |
StopFilter
Removes stop words from a token stream.
|
class |
TeeSinkTokenFilter
This TokenFilter provides the ability to set aside attribute states
that have already been analyzed.
|
class |
TypeTokenFilter
Removes tokens whose types appear in a set of blocked types from a token stream.
|
class |
ValidatingTokenFilter
A TokenFilter that checks consistency of the tokens (e.g., that
offsets are consistent with one another).
|
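
Several of the filters listed above (LowerCaseFilter, StopFilter, LengthFilter) each apply one small transformation and are meant to be chained. The following is a minimal sketch of that chaining idea using plain Java lists rather than Lucene's TokenStream API; the class `FilterChainSketch` and its method names are hypothetical and are not part of Lucene.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Locale;
import java.util.Set;

/** Hypothetical stand-in for a chain of token filters; not Lucene code. */
final class FilterChainSketch {

    /** Lower-cases every token (the idea behind LowerCaseFilter). */
    static List<String> lowerCase(List<String> tokens) {
        List<String> out = new ArrayList<>();
        for (String token : tokens) {
            out.add(token.toLowerCase(Locale.ROOT));
        }
        return out;
    }

    /** Drops tokens found in the stop set (the idea behind StopFilter). */
    static List<String> removeStopWords(List<String> tokens, Set<String> stopWords) {
        List<String> out = new ArrayList<>();
        for (String token : tokens) {
            if (!stopWords.contains(token)) {
                out.add(token);
            }
        }
        return out;
    }

    /** Keeps tokens whose length lies in [min, max] (the idea behind LengthFilter). */
    static List<String> keepLengths(List<String> tokens, int min, int max) {
        List<String> out = new ArrayList<>();
        for (String token : tokens) {
            if (token.length() >= min && token.length() <= max) {
                out.add(token);
            }
        }
        return out;
    }

    public static void main(String[] args) {
        List<String> tokens = Arrays.asList("The", "Quick", "brown", "fox", "a");
        Set<String> stopWords = new HashSet<>(Arrays.asList("the", "a"));
        // Each stage consumes the output of the previous one, as chained filters do.
        List<String> result = keepLengths(removeStopWords(lowerCase(tokens), stopWords), 4, 10);
        System.out.println(result); // prints [quick, brown]
    }
}
```

Order matters in such a chain: lower-casing before stop-word removal lets a lower-case stop set match "The".
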
Modifier and Type | Class and Description |
---|---|
class |
ArabicNormalizationFilter
A
TokenFilter that applies ArabicNormalizer to normalize the orthography. |
class |
ArabicStemFilter
A
TokenFilter that applies ArabicStemmer to stem Arabic words. |
Modifier and Type | Class and Description |
---|---|
class |
BulgarianStemFilter
A
TokenFilter that applies BulgarianStemmer to stem Bulgarian
words. |
Modifier and Type | Class and Description |
---|---|
class |
BrazilianStemFilter
A
TokenFilter that applies BrazilianStemmer. |
Modifier and Type | Class and Description |
---|---|
class |
CJKBigramFilter
Forms bigrams of CJK terms that are generated from StandardTokenizer
or ICUTokenizer.
|
class |
CJKWidthFilter
A
TokenFilter that normalizes CJK width differences:
folds fullwidth ASCII variants into the equivalent Basic Latin characters, and
folds halfwidth Katakana variants into the equivalent kana.
NOTE: this filter can be viewed as a (practical) subset of NFKC/NFKD
Unicode normalization. |
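
The bigram approach mentioned for the CJK package (overlapping groups of two adjacent characters) can be illustrated without Lucene. This is only a sketch of the idea, not CJKBigramFilter itself: the class `BigramSketch` is hypothetical, and it ignores script detection, token types, and how a lone character would be handled.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical illustration of overlapping character bigrams; not Lucene code. */
final class BigramSketch {

    /** Returns every pair of adjacent code points, advancing one code point at a time. */
    static List<String> bigrams(String text) {
        // Work on code points so supplementary characters are not split in half.
        int[] codePoints = text.codePoints().toArray();
        List<String> out = new ArrayList<>();
        for (int i = 0; i + 1 < codePoints.length; i++) {
            out.add(new String(codePoints, i, 2));
        }
        return out;
    }
}
```

For a three-character input such as U+6771 U+4EAC U+90FD, the sketch yields two bigrams: the first and second characters, then the second and third.
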
Modifier and Type | Class and Description |
---|---|
class |
ChineseFilter
Deprecated.
Use
StopFilter instead, which has the same functionality.
This filter will be removed in Lucene 5.0 |
Modifier and Type | Class and Description |
---|---|
class |
WordTokenFilter
A
TokenFilter that breaks sentences into words. |
Modifier and Type | Class and Description |
---|---|
class |
CompoundWordTokenFilterBase
Base class for decomposition token filters.
|
class |
DictionaryCompoundWordTokenFilter
A
TokenFilter that decomposes compound words found in many Germanic languages. |
class |
HyphenationCompoundWordTokenFilter
A
TokenFilter that decomposes compound words found in many Germanic languages. |
Modifier and Type | Class and Description |
---|---|
class |
CzechStemFilter
A
TokenFilter that applies CzechStemmer to stem Czech words. |
Modifier and Type | Class and Description |
---|---|
class |
GermanLightStemFilter
A
TokenFilter that applies GermanLightStemmer to stem German
words. |
class |
GermanMinimalStemFilter
A
TokenFilter that applies GermanMinimalStemmer to stem German
words. |
class |
GermanNormalizationFilter
Normalizes German characters according to the heuristics
of the
German2 snowball algorithm.
|
class |
GermanStemFilter
A
TokenFilter that stems German words. |
Modifier and Type | Class and Description |
---|---|
class |
GreekLowerCaseFilter
Normalizes token text to lower case, removes some Greek diacritics,
and standardizes final sigma to sigma.
|
class |
GreekStemFilter
A
TokenFilter that applies GreekStemmer to stem Greek
words. |
Modifier and Type | Class and Description |
---|---|
class |
EnglishMinimalStemFilter
A
TokenFilter that applies EnglishMinimalStemmer to stem
English words. |
class |
EnglishPossessiveFilter
TokenFilter that removes possessives (trailing 's) from words.
|
class |
KStemFilter
A high-performance kstem filter for English.
|
Modifier and Type | Class and Description |
---|---|
class |
SpanishLightStemFilter
A
TokenFilter that applies SpanishLightStemmer to stem Spanish
words. |
Modifier and Type | Class and Description |
---|---|
class |
PersianNormalizationFilter
A
TokenFilter that applies PersianNormalizer to normalize the
orthography. |
Modifier and Type | Class and Description |
---|---|
class |
FinnishLightStemFilter
A
TokenFilter that applies FinnishLightStemmer to stem Finnish
words. |
Modifier and Type | Class and Description |
---|---|
class |
ElisionFilter
Removes elisions from a
TokenStream. |
class |
FrenchLightStemFilter
A
TokenFilter that applies FrenchLightStemmer to stem French
words. |
class |
FrenchMinimalStemFilter
A
TokenFilter that applies FrenchMinimalStemmer to stem French
words. |
class |
FrenchStemFilter
Deprecated.
Use
SnowballFilter with
FrenchStemmer instead, which has the
same functionality. This filter will be removed in Lucene 5.0 |
Modifier and Type | Class and Description |
---|---|
class |
IrishLowerCaseFilter
Normalises token text to lower case, handling t-prothesis
and n-eclipsis (i.e., 'nAthair' should become 'n-athair').
|
Modifier and Type | Class and Description |
---|---|
class |
GalicianMinimalStemFilter
A
TokenFilter that applies GalicianMinimalStemmer to stem
Galician words. |
class |
GalicianStemFilter
A
TokenFilter that applies GalicianStemmer to stem
Galician words. |
Modifier and Type | Class and Description |
---|---|
class |
HindiNormalizationFilter
A
TokenFilter that applies HindiNormalizer to normalize the
orthography. |
class |
HindiStemFilter
A
TokenFilter that applies HindiStemmer to stem Hindi words. |
Modifier and Type | Class and Description |
---|---|
class |
HungarianLightStemFilter
A
TokenFilter that applies HungarianLightStemmer to stem
Hungarian words. |
Modifier and Type | Class and Description |
---|---|
class |
HunspellStemFilter
TokenFilter that uses hunspell affix rules and words to stem tokens.
|
Modifier and Type | Class and Description |
---|---|
class |
ICUFoldingFilter
A TokenFilter that applies search term folding to Unicode text,
applying foldings from UTR#30 Character Foldings.
|
class |
ICUNormalizer2Filter
Normalizes token text with ICU's
Normalizer2.
With this filter, you can normalize text in the following ways:
NFKC normalization, case folding, and removal of ignorables (the default);
a standard normalization mode (NFC, NFD, NFKC, NFKD);
or rules from a custom normalization mapping. |
class |
ICUTransformFilter
A
TokenFilter that transforms text with ICU. |
Modifier and Type | Class and Description |
---|---|
class |
IndonesianStemFilter
A
TokenFilter that applies IndonesianStemmer to stem Indonesian words. |
Modifier and Type | Class and Description |
---|---|
class |
IndicNormalizationFilter
A
TokenFilter that applies IndicNormalizer to normalize text
in Indian Languages. |
Modifier and Type | Class and Description |
---|---|
class |
ItalianLightStemFilter
A
TokenFilter that applies ItalianLightStemmer to stem Italian
words. |
Modifier and Type | Class and Description |
---|---|
class |
JapaneseBaseFormFilter
Replaces term text with the
BaseFormAttribute. |
class |
JapaneseKatakanaStemFilter
A
TokenFilter that normalizes common katakana spelling variations
ending in a long sound character by removing this character (U+30FC). |
class |
JapanesePartOfSpeechStopFilter
Removes tokens that match a set of part-of-speech tags.
|
class |
JapaneseReadingFormFilter
A
TokenFilter that replaces the term
attribute with the reading of a token in either katakana or romaji form. |
Modifier and Type | Class and Description |
---|---|
class |
LatvianStemFilter
A
TokenFilter that applies LatvianStemmer to stem Latvian
words. |
Modifier and Type | Class and Description |
---|---|
class |
StemmerOverrideFilter
Provides the ability to override any
KeywordAttribute aware stemmer
with custom dictionary-based stemming. |
Modifier and Type | Class and Description |
---|---|
class |
EdgeNGramTokenFilter
Tokenizes the given token into n-grams of given size(s).
|
class |
NGramTokenFilter
Tokenizes the input into n-grams of the given size(s).
|
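
To make the difference between the two n-gram filters concrete, here is a small sketch in plain Java. It assumes grams are emitted shortest first and that edge n-grams are taken from the front of the term only; both choices, and the class `NGramSketch`, are illustrative assumptions rather than a description of Lucene's implementation.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical illustration of character n-grams; not Lucene code. */
final class NGramSketch {

    /** All substrings of length minGram..maxGram, shorter grams first. */
    static List<String> nGrams(String term, int minGram, int maxGram) {
        List<String> out = new ArrayList<>();
        for (int n = minGram; n <= maxGram; n++) {
            for (int start = 0; start + n <= term.length(); start++) {
                out.add(term.substring(start, start + n));
            }
        }
        return out;
    }

    /** Only the grams anchored at the front edge of the term. */
    static List<String> edgeNGrams(String term, int minGram, int maxGram) {
        List<String> out = new ArrayList<>();
        for (int n = minGram; n <= maxGram && n <= term.length(); n++) {
            out.add(term.substring(0, n));
        }
        return out;
    }
}
```

With these definitions, `nGrams("abc", 1, 2)` yields a, b, c, ab, bc, while `edgeNGrams("lucene", 1, 3)` yields l, lu, luc. Edge n-grams are the usual choice for prefix matching because every gram starts where the term starts.
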
Modifier and Type | Class and Description |
---|---|
class |
DutchStemFilter
Deprecated.
Use
SnowballFilter with
DutchStemmer instead, which has the
same functionality. This filter will be removed in Lucene 5.0 |
Modifier and Type | Class and Description |
---|---|
class |
NorwegianLightStemFilter
A
TokenFilter that applies NorwegianLightStemmer to stem Norwegian
words. |
class |
NorwegianMinimalStemFilter
A
TokenFilter that applies NorwegianMinimalStemmer to stem Norwegian
words. |
Modifier and Type | Class and Description |
---|---|
class |
DelimitedPayloadTokenFilter
Characters before the delimiter are the "token", those after are the payload.
|
class |
NumericPayloadTokenFilter
Assigns a payload to a token based on the
Token.type(). |
class |
TokenOffsetPayloadTokenFilter
Adds the token's start and end offsets (see
Token.setStartOffset(int)
and Token.setEndOffset(int)) as a payload.
The first 4 bytes are the start offset. |
class |
TypeAsPayloadTokenFilter
Makes the
Token.type() a payload. |
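
The DelimitedPayloadTokenFilter description above (characters before the delimiter are the token, those after are the payload) amounts to a single split. The sketch below shows that split on plain strings; the class `DelimitedPayloadSketch`, the choice of splitting at the first delimiter, and the use of `null` for a missing payload are assumptions made for illustration, and the real filter stores the payload as encoded bytes rather than a string.

```java
/** Hypothetical illustration of delimiter-based payload splitting; not Lucene code. */
final class DelimitedPayloadSketch {

    /**
     * Splits "term|payload" into {term, payload}.
     * Assumption: the first delimiter wins, and a token without a
     * delimiter has no payload (represented here as null).
     */
    static String[] split(String token, char delimiter) {
        int idx = token.indexOf(delimiter);
        if (idx < 0) {
            return new String[] { token, null };
        }
        return new String[] { token.substring(0, idx), token.substring(idx + 1) };
    }
}
```

For example, `split("foo|1.4", '|')` returns the term "foo" with the payload text "1.4".
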
Modifier and Type | Class and Description |
---|---|
class |
BeiderMorseFilter
TokenFilter for Beider-Morse phonetic encoding.
|
class |
DoubleMetaphoneFilter
Filter for DoubleMetaphone (supporting secondary codes)
|
class |
PhoneticFilter
Create tokens for phonetic matches.
|
Modifier and Type | Class and Description |
---|---|
class |
PositionFilter
Sets the positionIncrement of all tokens to the configured "positionIncrement",
except the first returned token, which retains its original positionIncrement value.
|
Modifier and Type | Class and Description |
---|---|
class |
PortugueseLightStemFilter
A
TokenFilter that applies PortugueseLightStemmer to stem
Portuguese words. |
class |
PortugueseMinimalStemFilter
A
TokenFilter that applies PortugueseMinimalStemmer to stem
Portuguese words. |
class |
PortugueseStemFilter
A
TokenFilter that applies PortugueseStemmer to stem
Portuguese words. |
Modifier and Type | Class and Description |
---|---|
class |
ReverseStringFilter
Reverse token string, for example "country" => "yrtnuoc".
|
Modifier and Type | Class and Description |
---|---|
class |
RussianLightStemFilter
A
TokenFilter that applies RussianLightStemmer to stem Russian
words. |
class |
RussianLowerCaseFilter
Deprecated.
Use
LowerCaseFilter instead, which has the same
functionality. This filter will be removed in Lucene 4.0 |
class |
RussianStemFilter
Deprecated.
Use
SnowballFilter with
RussianStemmer instead, which has the
same functionality. This filter will be removed in Lucene 4.0 |
Modifier and Type | Class and Description |
---|---|
class |
ShingleFilter
A ShingleFilter constructs shingles (token n-grams) from a token stream.
|
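
Shingles are n-grams over whole tokens rather than characters. The following sketch builds fixed-size shingles from a token list; the class `ShingleSketch` is hypothetical, and options of the real ShingleFilter such as a range of shingle sizes, the token separator, or whether the original unigrams are also emitted are deliberately left out.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical illustration of token shingles; not Lucene code. */
final class ShingleSketch {

    /** Joins every run of `size` adjacent tokens with a single space. */
    static List<String> shingles(List<String> tokens, int size) {
        List<String> out = new ArrayList<>();
        for (int i = 0; i + size <= tokens.size(); i++) {
            out.add(String.join(" ", tokens.subList(i, i + size)));
        }
        return out;
    }
}
```

Given the tokens please, divide, this, sentence and a size of 2, the sketch produces "please divide", "divide this", and "this sentence".
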
Modifier and Type | Class and Description |
---|---|
class |
SnowballFilter
A filter that stems words using a Snowball-generated stemmer.
|
Modifier and Type | Class and Description |
---|---|
class |
ClassicFilter
Normalizes tokens extracted with
ClassicTokenizer. |
class |
StandardFilter
Normalizes tokens extracted with
StandardTokenizer. |
Modifier and Type | Class and Description |
---|---|
class |
StempelFilter
Transforms the token stream as per the Stempel stemming algorithm.
|
Modifier and Type | Class and Description |
---|---|
class |
SwedishLightStemFilter
A
TokenFilter that applies SwedishLightStemmer to stem Swedish
words. |
Modifier and Type | Class and Description |
---|---|
class |
SynonymFilter
Matches single or multi word synonyms in a token stream.
|
Modifier and Type | Class and Description |
---|---|
class |
ThaiWordFilter
TokenFilter that uses BreakIterator to break each
Token that is Thai into separate Token(s) for each Thai word. |
Modifier and Type | Class and Description |
---|---|
class |
TurkishLowerCaseFilter
Normalizes Turkish token text to lower case.
|
Modifier and Type | Class and Description |
---|---|
class |
CollationKeyFilter
Converts each token into its
CollationKey, and then
encodes the CollationKey with IndexableBinaryStringTools, to allow
it to be stored as an index term. |
class |
ICUCollationKeyFilter
Converts each token into its
CollationKey, and
then encodes the CollationKey with IndexableBinaryStringTools, to
allow it to be stored as an index term. |
Modifier and Type | Class and Description |
---|---|
class |
EnhancementsCategoryTokenizer
A tokenizer which adds a payload to each category token according to the
CategoryEnhancements defined in the given
EnhancementsIndexingParams. |
Modifier and Type | Class and Description |
---|---|
class |
AssociationListTokenizer
Tokenizer for associations of a category
|
Modifier and Type | Class and Description |
---|---|
class |
CategoryListTokenizer
A base class for category list tokenizers, which add category list tokens to
category streams.
|
class |
CategoryParentsStream
This class adds parents to a
CategoryAttributesStream. |
class |
CategoryTokenizer
Basic class for setting the
CharTermAttributes and
PayloadAttributes of category tokens. |
class |
CategoryTokenizerBase
A base class for all token filters which add term and payload attributes to
tokens and are to be used in
CategoryDocumentBuilder. |
class |
CountingListTokenizer
CategoryListTokenizer for facet counting |
Modifier and Type | Class and Description |
---|---|
static class |
QueryParserTestBase.QPTestFilter
Filter which discards the token 'stop' and which expands the
token 'phrase' into 'phrase1 phrase2'
|
Modifier and Type | Class and Description |
---|---|
class |
OffsetLimitTokenFilter
This TokenFilter limits the number of tokens while indexing by adding up the
current offset.
|