Package | Description |
---|---|
org.apache.lucene.analysis | API and code to convert text into indexable/searchable tokens. |
org.apache.lucene.analysis.ar | Analyzer for Arabic. |
org.apache.lucene.analysis.bg | Analyzer for Bulgarian. |
org.apache.lucene.analysis.br | Analyzer for Brazilian Portuguese. |
org.apache.lucene.analysis.ca | Analyzer for Catalan. |
org.apache.lucene.analysis.cjk | Analyzer for Chinese, Japanese, and Korean, which indexes bigrams (overlapping groups of two adjacent Han characters). |
org.apache.lucene.analysis.cn | Analyzer for Chinese, which indexes unigrams (individual Chinese characters). |
org.apache.lucene.analysis.cz | Analyzer for Czech. |
org.apache.lucene.analysis.da | Analyzer for Danish. |
org.apache.lucene.analysis.de | Analyzer for German. |
org.apache.lucene.analysis.el | Analyzer for Greek. |
org.apache.lucene.analysis.en | Analyzer for English. |
org.apache.lucene.analysis.es | Analyzer for Spanish. |
org.apache.lucene.analysis.eu | Analyzer for Basque. |
org.apache.lucene.analysis.fa | Analyzer for Persian. |
org.apache.lucene.analysis.fi | Analyzer for Finnish. |
org.apache.lucene.analysis.fr | Analyzer for French. |
org.apache.lucene.analysis.ga | Analysis for Irish. |
org.apache.lucene.analysis.gl | Analyzer for Galician. |
org.apache.lucene.analysis.hi | Analyzer for Hindi. |
org.apache.lucene.analysis.hu | Analyzer for Hungarian. |
org.apache.lucene.analysis.hy | Analyzer for Armenian. |
org.apache.lucene.analysis.id | Analyzer for Indonesian. |
org.apache.lucene.analysis.it | Analyzer for Italian. |
org.apache.lucene.analysis.ja | Analyzer for Japanese. |
org.apache.lucene.analysis.lv | Analyzer for Latvian. |
org.apache.lucene.analysis.miscellaneous | Miscellaneous TokenStreams. |
org.apache.lucene.analysis.nl | Analyzer for Dutch. |
org.apache.lucene.analysis.no | Analyzer for Norwegian. |
org.apache.lucene.analysis.pl | Analyzer for Polish. |
org.apache.lucene.analysis.pt | Analyzer for Portuguese. |
org.apache.lucene.analysis.ro | Analyzer for Romanian. |
org.apache.lucene.analysis.ru | Analyzer for Russian. |
org.apache.lucene.analysis.standard | Standards-based analyzers implemented with JFlex. |
org.apache.lucene.analysis.sv | Analyzer for Swedish. |
org.apache.lucene.analysis.th | Analyzer for Thai. |
org.apache.lucene.analysis.tr | Analyzer for Turkish. |
Modifier and Type | Class and Description |
---|---|
class | KeywordAnalyzer: "Tokenizes" the entire stream as a single token. |
class | SimpleAnalyzer: An Analyzer that filters LetterTokenizer with LowerCaseFilter. You must specify the required Version compatibility when creating CharTokenizer: as of 3.1, LowerCaseTokenizer uses an int-based API to normalize and detect token codepoints. |
class | StopAnalyzer: Filters LetterTokenizer with LowerCaseFilter and StopFilter. |
class | StopwordAnalyzerBase: Base class for Analyzers that need to make use of stopword sets. |
class | WhitespaceAnalyzer: An Analyzer that uses WhitespaceTokenizer. |
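
To make the difference between the analyzers in the table above concrete, here is a minimal sketch in plain Java. It does not use Lucene; the class `CoreAnalyzerSketch` and its methods are hypothetical names introduced only for illustration, and the real analyzers handle many details (offsets, reuse, Version compatibility) that are omitted. It iterates over `int` code points, in the spirit of the int-based API mentioned for `LowerCaseTokenizer`.

```java
import java.util.ArrayList;
import java.util.List;

// Illustration only: plain-Java approximations, not the Lucene classes.
public class CoreAnalyzerSketch {

    /** KeywordAnalyzer-style: the entire input is emitted as one token. */
    public static List<String> keyword(String text) {
        List<String> tokens = new ArrayList<>();
        tokens.add(text);
        return tokens;
    }

    /** WhitespaceAnalyzer-style: split on whitespace, leave case untouched. */
    public static List<String> whitespace(String text) {
        return split(text, false);
    }

    /** SimpleAnalyzer-style: keep runs of letters and lower-case them. */
    public static List<String> simple(String text) {
        return split(text, true);
    }

    private static List<String> split(String text, boolean lettersOnly) {
        List<String> tokens = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        for (int i = 0; i < text.length(); ) {
            int cp = text.codePointAt(i);          // work on code points, not chars
            i += Character.charCount(cp);
            boolean tokenChar = lettersOnly
                    ? Character.isLetter(cp)
                    : !Character.isWhitespace(cp);
            if (tokenChar) {
                current.appendCodePoint(lettersOnly ? Character.toLowerCase(cp) : cp);
            } else if (current.length() > 0) {
                tokens.add(current.toString());    // a non-token char ends the token
                current.setLength(0);
            }
        }
        if (current.length() > 0) {
            tokens.add(current.toString());
        }
        return tokens;
    }

    public static void main(String[] args) {
        String text = "Lucene 3.6 Rocks";
        System.out.println(keyword(text));     // [Lucene 3.6 Rocks]
        System.out.println(whitespace(text));  // [Lucene, 3.6, Rocks]
        System.out.println(simple(text));      // [lucene, rocks]
    }
}
```

Note how the letter-based variant drops `3.6` entirely, because digits and punctuation are not letters, while the whitespace variant keeps it as a token.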
Modifier and Type | Method and Description |
---|---|
protected static CharArraySet | StopwordAnalyzerBase.loadStopwordSet(boolean ignoreCase, Class<? extends ReusableAnalyzerBase> aClass, String resource, String comment): Creates a CharArraySet from a file resource associated with a class. |
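
The parameters of `loadStopwordSet` suggest a word list with one entry per line, optional comment lines, and optional case-insensitive matching. The following is a rough sketch of that idea in plain Java, assuming this file layout; `StopwordLoaderSketch`, `loadStopwords`, and `isStopword` are hypothetical names, a `java.util.Set` stands in for `CharArraySet`, and a `Reader` stands in for the class resource.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.LinkedHashSet;
import java.util.Locale;
import java.util.Set;

// Illustration only: not the Lucene implementation.
public class StopwordLoaderSketch {

    /**
     * Reads one stopword per line, skipping blank lines and lines that start
     * with the comment prefix. In real code the Reader would typically wrap
     * aClass.getResourceAsStream(resource); here it is passed in directly.
     */
    public static Set<String> loadStopwords(boolean ignoreCase, Reader source, String comment) {
        Set<String> words = new LinkedHashSet<>();
        try (BufferedReader reader = new BufferedReader(source)) {
            String line;
            while ((line = reader.readLine()) != null) {
                String word = line.trim();
                if (word.isEmpty() || word.startsWith(comment)) {
                    continue;
                }
                // Assumption: case-insensitivity is modelled by lower-casing entries.
                words.add(ignoreCase ? word.toLowerCase(Locale.ROOT) : word);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return words;
    }

    /** Looks a term up with the same case handling used when loading. */
    public static boolean isStopword(Set<String> words, boolean ignoreCase, String term) {
        return words.contains(ignoreCase ? term.toLowerCase(Locale.ROOT) : term);
    }

    public static void main(String[] args) {
        String resource = "# sample list\nThe\n\nand\n";
        Set<String> stop = loadStopwords(true, new StringReader(resource), "#");
        System.out.println(stop);                          // [the, and]
        System.out.println(isStopword(stop, true, "THE")); // true
    }
}
```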
Modifier and Type | Class and Description |
---|---|
class | ArabicAnalyzer: Analyzer for Arabic. |
Modifier and Type | Class and Description |
---|---|
class | BulgarianAnalyzer: Analyzer for Bulgarian. |
Modifier and Type | Class and Description |
---|---|
class | BrazilianAnalyzer: Analyzer for the Brazilian Portuguese language. |
Modifier and Type | Class and Description |
---|---|
class | CatalanAnalyzer: Analyzer for Catalan. |
Modifier and Type | Class and Description |
---|---|
class | CJKAnalyzer: An Analyzer that tokenizes text with StandardTokenizer, normalizes content with CJKWidthFilter, folds case with LowerCaseFilter, forms bigrams of CJK with CJKBigramFilter, and filters stopwords with StopFilter. |
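
The bigram step is the part of this chain that is least obvious from the description. As a hedged illustration, the sketch below forms overlapping bigrams from runs of adjacent Han characters in plain Java. It is not `CJKBigramFilter`: the class `HanBigramSketch` and its method are hypothetical, only the Han script is considered, non-Han text is simply dropped, and emitting an isolated Han character as a single-character token is an assumption of this sketch.

```java
import java.util.ArrayList;
import java.util.List;

// Illustration only: a simplified stand-in for the bigram-forming step.
public class HanBigramSketch {

    /** Emits overlapping two-character tokens for each run of Han characters. */
    public static List<String> hanBigrams(String text) {
        List<String> tokens = new ArrayList<>();
        List<Integer> run = new ArrayList<>();   // code points of the current Han run
        for (int i = 0; i < text.length(); ) {
            int cp = text.codePointAt(i);
            i += Character.charCount(cp);
            if (Character.UnicodeScript.of(cp) == Character.UnicodeScript.HAN) {
                run.add(cp);
            } else {
                emit(run, tokens);               // a non-Han character ends the run
            }
        }
        emit(run, tokens);
        return tokens;
    }

    private static void emit(List<Integer> run, List<String> tokens) {
        if (run.size() == 1) {
            // Assumption of this sketch: a lone Han character becomes a unigram.
            tokens.add(new String(Character.toChars(run.get(0))));
        }
        for (int i = 0; i + 1 < run.size(); i++) {
            tokens.add(new StringBuilder()
                    .appendCodePoint(run.get(i))
                    .appendCodePoint(run.get(i + 1))
                    .toString());
        }
        run.clear();
    }

    public static void main(String[] args) {
        // Four Han characters: U+4E2D U+534E U+4EBA U+6C11
        String text = "\u4e2d\u534e\u4eba\u6c11";
        // Prints three overlapping bigrams: chars 1-2, 2-3, and 3-4.
        System.out.println(hanBigrams(text));
    }
}
```

A run of n Han characters yields n - 1 bigrams, so every adjacent pair is searchable without a dictionary-based word segmenter.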
Modifier and Type | Class and Description |
---|---|
class | ChineseAnalyzer: Deprecated. Use StandardAnalyzer instead, which has the same functionality. This analyzer will be removed in Lucene 5.0. |
Modifier and Type | Class and Description |
---|---|
class | CzechAnalyzer: Analyzer for the Czech language. |
Modifier and Type | Class and Description |
---|---|
class | DanishAnalyzer: Analyzer for Danish. |
Modifier and Type | Class and Description |
---|---|
class | GermanAnalyzer: Analyzer for the German language. |
Modifier and Type | Class and Description |
---|---|
class | GreekAnalyzer: Analyzer for the Greek language. |
Modifier and Type | Class and Description |
---|---|
class | EnglishAnalyzer: Analyzer for English. |
Modifier and Type | Class and Description |
---|---|
class | SpanishAnalyzer: Analyzer for Spanish. |
Modifier and Type | Class and Description |
---|---|
class | BasqueAnalyzer: Analyzer for Basque. |
Modifier and Type | Class and Description |
---|---|
class | PersianAnalyzer: Analyzer for Persian. |
Modifier and Type | Class and Description |
---|---|
class | FinnishAnalyzer: Analyzer for Finnish. |
Modifier and Type | Class and Description |
---|---|
class | FrenchAnalyzer: Analyzer for the French language. |
Modifier and Type | Class and Description |
---|---|
class | IrishAnalyzer: Analyzer for Irish. |
Modifier and Type | Class and Description |
---|---|
class | GalicianAnalyzer: Analyzer for Galician. |
Modifier and Type | Class and Description |
---|---|
class | HindiAnalyzer: Analyzer for Hindi. |
Modifier and Type | Class and Description |
---|---|
class | HungarianAnalyzer: Analyzer for Hungarian. |
Modifier and Type | Class and Description |
---|---|
class | ArmenianAnalyzer: Analyzer for Armenian. |
Modifier and Type | Class and Description |
---|---|
class | IndonesianAnalyzer: Analyzer for Indonesian (Bahasa). |
Modifier and Type | Class and Description |
---|---|
class | ItalianAnalyzer: Analyzer for Italian. |
Modifier and Type | Class and Description |
---|---|
class | JapaneseAnalyzer: Analyzer for Japanese that uses morphological analysis. |
Modifier and Type | Class and Description |
---|---|
class | LatvianAnalyzer: Analyzer for Latvian. |
Modifier and Type | Class and Description |
---|---|
class | PatternAnalyzer: Efficient Lucene analyzer/tokenizer that preferably operates on a String rather than a Reader, that can flexibly separate text into terms via a regular expression Pattern (with behaviour identical to String.split(String)), and that combines the functionality of LetterTokenizer, LowerCaseTokenizer, WhitespaceTokenizer, and StopFilter into a single efficient multi-purpose class. |
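
The combination described above (regex splitting, optional lower-casing, stopword removal) can be sketched in a few lines of plain Java. This is only an approximation under stated assumptions, not `PatternAnalyzer` itself: `PatternSplitSketch` and `analyze` are hypothetical names, and skipping empty strings produced by the split is a choice made in this sketch.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Locale;
import java.util.Set;
import java.util.regex.Pattern;

// Illustration only: regex split + lower-casing + stopword removal in one pass.
public class PatternSplitSketch {

    public static List<String> analyze(String text, Pattern separator,
                                       boolean toLowerCase, Set<String> stopWords) {
        List<String> terms = new ArrayList<>();
        for (String term : separator.split(text)) {
            if (term.isEmpty()) {
                continue;                          // e.g. a leading separator
            }
            if (toLowerCase) {
                term = term.toLowerCase(Locale.ROOT);
            }
            if (!stopWords.contains(term)) {
                terms.add(term);
            }
        }
        return terms;
    }

    public static void main(String[] args) {
        Pattern nonWord = Pattern.compile("\\W+");  // split on runs of non-word chars
        Set<String> stop = new HashSet<>(Arrays.asList("the"));
        System.out.println(analyze("The quick, brown fox", nonWord, true, stop));
        // [quick, brown, fox]
    }
}
```

Because the stopword check runs after lower-casing, the stop set in this sketch must contain lower-case entries whenever `toLowerCase` is true.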
Modifier and Type | Class and Description |
---|---|
class | DutchAnalyzer: Analyzer for the Dutch language. |
Modifier and Type | Class and Description |
---|---|
class | NorwegianAnalyzer: Analyzer for Norwegian. |
Modifier and Type | Class and Description |
---|---|
class | PolishAnalyzer: Analyzer for Polish. |
Modifier and Type | Class and Description |
---|---|
class | PortugueseAnalyzer: Analyzer for Portuguese. |
Modifier and Type | Class and Description |
---|---|
class | RomanianAnalyzer: Analyzer for Romanian. |
Modifier and Type | Class and Description |
---|---|
class | RussianAnalyzer: Analyzer for the Russian language. |
Modifier and Type | Class and Description |
---|---|
class | ClassicAnalyzer: Filters ClassicTokenizer with ClassicFilter, LowerCaseFilter and StopFilter, using a list of English stop words. |
class | StandardAnalyzer: Filters StandardTokenizer with StandardFilter, LowerCaseFilter and StopFilter, using a list of English stop words. |
class | UAX29URLEmailAnalyzer: Filters UAX29URLEmailTokenizer with StandardFilter, LowerCaseFilter and StopFilter, using a list of English stop words. |
Modifier and Type | Class and Description |
---|---|
class | SwedishAnalyzer: Analyzer for Swedish. |
Modifier and Type | Class and Description |
---|---|
class | ThaiAnalyzer: Analyzer for the Thai language. |
Modifier and Type | Class and Description |
---|---|
class | TurkishAnalyzer: Analyzer for Turkish. |