Package org.apache.lucene.analysis.standard
Fast, general-purpose grammar-based tokenizers.
ClassicTokenizer: this class was formerly (prior to Lucene 3.1) named StandardTokenizer. (Its tokenization rules are not based on the Unicode Text Segmentation algorithm.) ClassicAnalyzer includes ClassicTokenizer, LowerCaseFilter and StopFilter.
UAX29URLEmailTokenizer: implements the Word Break rules from the Unicode Text Segmentation algorithm, as specified in Unicode Standard Annex #29, except URLs and email addresses are also tokenized according to the relevant RFCs. UAX29URLEmailAnalyzer includes UAX29URLEmailTokenizer, LowerCaseFilter and StopFilter.
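As a minimal sketch of how UAX29URLEmailTokenizer might be driven directly, the following collects the terms it emits for a string. The class name UrlEmailTokenizerSketch, the method name tokenize, and the sample text are illustrative assumptions, not part of this package; the calls follow the usual TokenStream workflow (setReader, reset, incrementToken, end, close).

```java
import java.io.IOException;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;

import org.apache.lucene.analysis.standard.UAX29URLEmailTokenizer;
import org.apache.lucene.analysis.tokenattributes.CharTermAttribute;

// Hypothetical helper class, for illustration only.
public class UrlEmailTokenizerSketch {

    /** Returns the term text of every token produced for the given input. */
    static List<String> tokenize(String text) throws IOException {
        List<String> terms = new ArrayList<>();
        // try-with-resources closes the tokenizer after the stream is consumed.
        try (UAX29URLEmailTokenizer tokenizer = new UAX29URLEmailTokenizer()) {
            tokenizer.setReader(new StringReader(text));
            CharTermAttribute term = tokenizer.addAttribute(CharTermAttribute.class);
            tokenizer.reset();                  // required before the first incrementToken()
            while (tokenizer.incrementToken()) {
                terms.add(term.toString());     // copy: the attribute instance is reused per token
            }
            tokenizer.end();                    // finalize end-of-stream state
        }
        return terms;
    }

    public static void main(String[] args) throws IOException {
        // The email address should come back as a single token rather than being split at '@'.
        for (String t : tokenize("Mail admin@example.com today")) {
            System.out.println(t);
        }
    }
}
```

No lowercasing or stop-word removal happens here; those steps belong to the filters that UAX29URLEmailAnalyzer adds on top of the tokenizer.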
This Java package additionally contains StandardAnalyzer and StandardTokenizer, which are not visible here because they moved to Lucene Core.
The factories for those components (e.g., used in Solr) are still part of this module.
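One way the factories might be used outside Solr is through CustomAnalyzer, which assembles an analysis chain from factory classes. The sketch below assumes the analyzers-common module is on the classpath; the class name FactoryChainSketch, the helper methods, and the field name "body" are illustrative assumptions.

```java
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;

import org.apache.lucene.analysis.Analyzer;
import org.apache.lucene.analysis.TokenStream;
import org.apache.lucene.analysis.core.LowerCaseFilterFactory;
import org.apache.lucene.analysis.custom.CustomAnalyzer;
import org.apache.lucene.analysis.standard.StandardTokenizerFactory;
import org.apache.lucene.analysis.tokenattributes.CharTermAttribute;

// Hypothetical helper class, for illustration only.
public class FactoryChainSketch {

    /** Builds an analyzer from factories: StandardTokenizer followed by LowerCaseFilter. */
    static Analyzer build() throws IOException {
        return CustomAnalyzer.builder()
                .withTokenizer(StandardTokenizerFactory.class)
                .addTokenFilter(LowerCaseFilterFactory.class)
                .build();
    }

    /** Runs the analyzer over the text and collects the resulting terms. */
    static List<String> analyze(Analyzer analyzer, String text) throws IOException {
        List<String> terms = new ArrayList<>();
        // "body" is an arbitrary field name chosen for this sketch.
        try (TokenStream ts = analyzer.tokenStream("body", text)) {
            CharTermAttribute term = ts.addAttribute(CharTermAttribute.class);
            ts.reset();
            while (ts.incrementToken()) {
                terms.add(term.toString());
            }
            ts.end();
        }
        return terms;
    }

    public static void main(String[] args) throws IOException {
        try (Analyzer analyzer = build()) {
            System.out.println(analyze(analyzer, "Hello World"));
        }
    }
}
```

Referring to the factories by class rather than by SPI name avoids depending on how a particular Lucene version spells the registered names.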
Class Summary
Class                          Description
ClassicAnalyzer                Filters ClassicTokenizer with ClassicFilter, LowerCaseFilter and StopFilter, using a list of English stop words.
ClassicFilter                  Normalizes tokens extracted with ClassicTokenizer.
ClassicFilterFactory           Factory for ClassicFilter.
ClassicTokenizer               A grammar-based tokenizer constructed with JFlex.
ClassicTokenizerFactory        Factory for ClassicTokenizer.
StandardFilterFactory          Deprecated. StandardFilter is a no-op and can be removed from filter chains.
StandardTokenizerFactory       Factory for StandardTokenizer.
UAX29URLEmailAnalyzer          Filters UAX29URLEmailTokenizer with LowerCaseFilter and StopFilter, using a list of English stop words.
UAX29URLEmailTokenizer         This class implements Word Break rules from the Unicode Text Segmentation algorithm, as specified in Unicode Standard Annex #29. URLs and email addresses are also tokenized according to the relevant RFCs.
UAX29URLEmailTokenizerFactory  Factory for UAX29URLEmailTokenizer.
UAX29URLEmailTokenizerImpl     This class implements Word Break rules from the Unicode Text Segmentation algorithm, as specified in Unicode Standard Annex #29. URLs and email addresses are also tokenized according to the relevant RFCs.