Package org.apache.lucene.analysis.email
Fast, general-purpose tokenizers for URLs and email addresses.
UAX29URLEmailTokenizer: implements the Word Break rules from the Unicode Text Segmentation algorithm, as specified in Unicode Standard Annex #29, except URLs and email addresses are also tokenized according to the relevant RFCs.
UAX29URLEmailAnalyzer: includes UAX29URLEmailTokenizer, LowerCaseFilter and StopFilter.
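The three stages named above (tokenize while keeping URLs and email addresses whole, lowercase, drop stop words) can be pictured with a small plain-Java sketch. This is not Lucene code and does not use the Lucene API: the class name AnalyzerChainSketch, the regular expression, and the short stop-word set are all hypothetical stand-ins, and the pattern is a loose approximation rather than the UAX #29 rules or the RFC grammars that the real tokenizer follows.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import java.util.Set;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Illustrative stand-in for the tokenizer -> lowercase -> stop-filter chain.
// Hypothetical class; it only mimics the shape of the pipeline.
class AnalyzerChainSketch {

    // Loose, assumed pattern: try a URL first, then an email address,
    // then a plain run of letters and digits.
    private static final Pattern TOKEN = Pattern.compile(
            "https?://[^\\s<>\"]+"
            + "|[\\w.+-]+@[\\w-]+(?:\\.[\\w-]+)+"
            + "|[\\p{L}\\p{N}]+");

    // Tiny assumed stop-word set, not Lucene's English list.
    private static final Set<String> STOP_WORDS =
            Set.of("a", "an", "and", "or", "the", "to");

    static List<String> analyze(String text) {
        List<String> terms = new ArrayList<>();
        Matcher matcher = TOKEN.matcher(text);
        while (matcher.find()) {
            // Stage 2: lowercase each token.
            String term = matcher.group().toLowerCase(Locale.ROOT);
            // Stage 3: drop stop words.
            if (!STOP_WORDS.contains(term)) {
                terms.add(term);
            }
        }
        return terms;
    }

    public static void main(String[] args) {
        System.out.println(analyze(
                "Mail Bob@Example.com or visit https://lucene.apache.org/core and the docs"));
        // prints [mail, bob@example.com, visit, https://lucene.apache.org/core, docs]
    }
}
```

The point of the sketch is the ordering: the URL and email alternatives are tried before the generic word rule, so an address survives as one token instead of being split at the punctuation inside it.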
Class Summary
UAX29URLEmailAnalyzer: Filters UAX29URLEmailTokenizer with LowerCaseFilter and StopFilter, using a list of English stop words.
UAX29URLEmailTokenizer: This class implements Word Break rules from the Unicode Text Segmentation algorithm, as specified in Unicode Standard Annex #29. URLs and email addresses are also tokenized according to the relevant RFCs.
UAX29URLEmailTokenizerFactory: Factory for UAX29URLEmailTokenizer.
UAX29URLEmailTokenizerImpl: This class implements Word Break rules from the Unicode Text Segmentation algorithm, as specified in Unicode Standard Annex #29. URLs and email addresses are also tokenized according to the relevant RFCs.