Package org.apache.lucene.analysis.email
Fast, general-purpose tokenizers for URLs and email addresses.
UAX29URLEmailTokenizer: implements the Word Break rules from the Unicode Text Segmentation algorithm, as specified in Unicode Standard Annex #29, except that URLs and email addresses are also tokenized according to the relevant RFCs.
UAX29URLEmailAnalyzer: includes UAX29URLEmailTokenizer, LowerCaseFilter, and StopFilter.
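As a rough illustration of how the analyzer chain described above might be used, the sketch below runs a string through UAX29URLEmailAnalyzer and collects each term together with its token type. It assumes the Lucene core and analysis-common artifacts are on the classpath. The class name UrlEmailAnalysisSketch, the helper analyze, the field name "body", and the sample text are all hypothetical and chosen only for this example.

```java
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;

import org.apache.lucene.analysis.Analyzer;
import org.apache.lucene.analysis.TokenStream;
import org.apache.lucene.analysis.email.UAX29URLEmailAnalyzer;
import org.apache.lucene.analysis.tokenattributes.CharTermAttribute;
import org.apache.lucene.analysis.tokenattributes.TypeAttribute;

// Hypothetical demo class; not part of Lucene.
public class UrlEmailAnalysisSketch {

    // Returns one "term<TAB>type" string per token emitted by the analyzer.
    static List<String> analyze(String text) throws IOException {
        List<String> tokens = new ArrayList<>();
        // The no-argument constructor uses the analyzer's default stop word set.
        try (Analyzer analyzer = new UAX29URLEmailAnalyzer();
             TokenStream stream = analyzer.tokenStream("body", text)) { // "body" is an arbitrary field name
            CharTermAttribute term = stream.addAttribute(CharTermAttribute.class);
            TypeAttribute type = stream.addAttribute(TypeAttribute.class);
            stream.reset();                    // required before the first incrementToken()
            while (stream.incrementToken()) {
                tokens.add(term.toString() + "\t" + type.type());
            }
            stream.end();                      // finalizes end-of-stream state
        }
        return tokens;
    }

    public static void main(String[] args) throws IOException {
        String sample = "Write to Jane.Doe@example.com or visit https://lucene.apache.org/core/ for the docs";
        for (String token : analyze(sample)) {
            System.out.println(token);
        }
    }
}
```

With this setup, an email address should come out as a single lowercased term rather than being split at the "@" or the dots, and common English stop words such as "the" should be dropped by the StopFilter. The type column is a convenient way to check which tokens the tokenizer recognized as URLs or email addresses.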
Classes
UAX29URLEmailAnalyzer: Filters UAX29URLEmailTokenizer with LowerCaseFilter and StopFilter, using a list of English stop words.
UAX29URLEmailTokenizer: This class implements the Word Break rules from the Unicode Text Segmentation algorithm, as specified in Unicode Standard Annex #29. URLs and email addresses are also tokenized according to the relevant RFCs.
UAX29URLEmailTokenizerFactory: Factory for UAX29URLEmailTokenizer.
UAX29URLEmailTokenizerImpl: This class implements the Word Break rules from the Unicode Text Segmentation algorithm, as specified in Unicode Standard Annex #29. URLs and email addresses are also tokenized according to the relevant RFCs.