Package org.apache.lucene.analysis.standard
Fast, general-purpose grammar-based tokenizers.
ClassicTokenizer: this class was formerly (prior to Lucene 3.1) named StandardTokenizer. Its tokenization rules are not based on the Unicode Text Segmentation algorithm. ClassicAnalyzer includes ClassicTokenizer, LowerCaseFilter and StopFilter.

UAX29URLEmailTokenizer: implements the Word Break rules from the Unicode Text Segmentation algorithm, as specified in Unicode Standard Annex #29, except that URLs and email addresses are also tokenized according to the relevant RFCs. UAX29URLEmailAnalyzer includes UAX29URLEmailTokenizer, LowerCaseFilter and StopFilter.
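As a minimal usage sketch (assuming a recent Lucene release in which UAX29URLEmailAnalyzer has a no-argument constructor; the field name "body" and the sample text are made up for illustration), the analyzer's token stream can be consumed like this:

  import org.apache.lucene.analysis.Analyzer;
  import org.apache.lucene.analysis.TokenStream;
  import org.apache.lucene.analysis.standard.UAX29URLEmailAnalyzer;
  import org.apache.lucene.analysis.tokenattributes.CharTermAttribute;

  public class UrlEmailAnalyzerSketch {
    public static void main(String[] args) throws Exception {
      // UAX29URLEmailAnalyzer = UAX29URLEmailTokenizer + LowerCaseFilter + StopFilter.
      try (Analyzer analyzer = new UAX29URLEmailAnalyzer();
           TokenStream ts = analyzer.tokenStream("body",
               "Contact admin@example.com or see https://example.com/docs")) {
        CharTermAttribute term = ts.addAttribute(CharTermAttribute.class);
        ts.reset();
        while (ts.incrementToken()) {
          // URLs and email addresses survive as single, lowercased tokens,
          // e.g. "contact", "admin@example.com", "see", "https://example.com/docs"
          System.out.println(term.toString());
        }
        ts.end();
      }
    }
  }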
This package also contains StandardAnalyzer and StandardTokenizer, which are not listed here because they have moved to Lucene Core. The factories for those components (used, for example, by Solr) remain part of this module.
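Outside Solr, the same factories can be wired by name through CustomAnalyzer. The sketch below assumes the usual SPI names "classic" (for ClassicTokenizerFactory and ClassicFilterFactory) and "lowercase" (for LowerCaseFilterFactory):

  import org.apache.lucene.analysis.Analyzer;
  import org.apache.lucene.analysis.TokenStream;
  import org.apache.lucene.analysis.custom.CustomAnalyzer;
  import org.apache.lucene.analysis.tokenattributes.CharTermAttribute;

  public class FactoryWiringSketch {
    public static void main(String[] args) throws Exception {
      // ClassicTokenizerFactory -> ClassicFilterFactory -> LowerCaseFilterFactory,
      // looked up by their SPI names (assumed to be "classic" and "lowercase").
      Analyzer analyzer = CustomAnalyzer.builder()
          .withTokenizer("classic")
          .addTokenFilter("classic")
          .addTokenFilter("lowercase")
          .build();

      try (TokenStream ts = analyzer.tokenStream("body", "Lucene's I.B.M. example")) {
        CharTermAttribute term = ts.addAttribute(CharTermAttribute.class);
        ts.reset();
        while (ts.incrementToken()) {
          System.out.println(term.toString()); // e.g. "lucene", "ibm", "example"
        }
        ts.end();
      }
      analyzer.close();
    }
  }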
Class Summary

ClassicAnalyzer: Filters ClassicTokenizer with ClassicFilter, LowerCaseFilter and StopFilter, using a list of English stop words.
ClassicFilter: Normalizes tokens extracted with ClassicTokenizer.
ClassicFilterFactory: Factory for ClassicFilter.
ClassicTokenizer: A grammar-based tokenizer constructed with JFlex.
ClassicTokenizerFactory: Factory for ClassicTokenizer.
StandardTokenizerFactory: Factory for StandardTokenizer.
UAX29URLEmailAnalyzer: Filters UAX29URLEmailTokenizer with LowerCaseFilter and StopFilter, using a list of English stop words.
UAX29URLEmailTokenizer: Implements the Word Break rules from the Unicode Text Segmentation algorithm, as specified in Unicode Standard Annex #29; URLs and email addresses are also tokenized according to the relevant RFCs.
UAX29URLEmailTokenizerFactory: Factory for UAX29URLEmailTokenizer.
UAX29URLEmailTokenizerImpl: Implements the Word Break rules from the Unicode Text Segmentation algorithm, as specified in Unicode Standard Annex #29; URLs and email addresses are also tokenized according to the relevant RFCs.
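To make the composition described above concrete, the following sketch builds the ClassicAnalyzer chain by hand. It assumes a recent Lucene version in which LowerCaseFilter, StopFilter and CharArraySet live in core, and uses ClassicAnalyzer.STOP_WORDS_SET as the default English stop set:

  import java.io.StringReader;

  import org.apache.lucene.analysis.CharArraySet;
  import org.apache.lucene.analysis.LowerCaseFilter;
  import org.apache.lucene.analysis.StopFilter;
  import org.apache.lucene.analysis.TokenStream;
  import org.apache.lucene.analysis.standard.ClassicAnalyzer;
  import org.apache.lucene.analysis.standard.ClassicFilter;
  import org.apache.lucene.analysis.standard.ClassicTokenizer;
  import org.apache.lucene.analysis.tokenattributes.CharTermAttribute;

  public class ClassicChainSketch {
    public static void main(String[] args) throws Exception {
      // Hand-built equivalent of ClassicAnalyzer:
      // ClassicTokenizer -> ClassicFilter -> LowerCaseFilter -> StopFilter.
      ClassicTokenizer tokenizer = new ClassicTokenizer();
      tokenizer.setReader(new StringReader("The I.B.M. engineers shipped the release"));

      CharArraySet stopWords = ClassicAnalyzer.STOP_WORDS_SET; // default English stop words
      try (TokenStream ts =
          new StopFilter(new LowerCaseFilter(new ClassicFilter(tokenizer)), stopWords)) {
        CharTermAttribute term = ts.addAttribute(CharTermAttribute.class);
        ts.reset();
        while (ts.incrementToken()) {
          // ClassicFilter strips dots from acronyms, so "I.B.M." becomes "ibm";
          // expected output: "ibm", "engineers", "shipped", "release"
          System.out.println(term.toString());
        }
        ts.end();
      }
    }
  }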