org.apache.lucene.analysis.standard (Lucene 7.7.0 API)

Fast, general-purpose grammar-based tokenizer. StandardTokenizer implements the Word Break rules from the Unicode Text Segmentation algorithm, as specified in Unicode Standard Annex #29 (UAX#29). Unlike UAX29URLEmailTokenizer from the analysis module, StandardTokenizer does not emit URLs and email addresses as single tokens; it splits them into tokens according to the UAX#29 word break rules.
StandardAnalyzer includes StandardTokenizer, LowerCaseFilter and StopFilter.
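
The three-stage chain described above (tokenize, lowercase, remove stop words) can be sketched with the JDK alone. This is only an illustration of the pipeline shape, not Lucene's implementation: java.text.BreakIterator stands in for StandardTokenizer and may place word boundaries differently from the JFlex grammar, and the class name AnalyzerChainSketch, its method names, and the stop-word set passed in are all hypothetical.

```java
import java.text.BreakIterator;
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import java.util.Set;

// Hypothetical illustration class; it is not part of Lucene.
final class AnalyzerChainSketch {

    // Stage 1 (stand-in for StandardTokenizer): cut the text at the word
    // boundaries reported by the JDK's BreakIterator and keep only the
    // segments that contain at least one letter or digit, which drops
    // whitespace and punctuation-only segments.
    static List<String> tokenize(String text) {
        BreakIterator words = BreakIterator.getWordInstance(Locale.ROOT);
        words.setText(text);
        List<String> tokens = new ArrayList<>();
        int start = words.first();
        for (int end = words.next(); end != BreakIterator.DONE; start = end, end = words.next()) {
            String segment = text.substring(start, end);
            if (segment.codePoints().anyMatch(Character::isLetterOrDigit)) {
                tokens.add(segment);
            }
        }
        return tokens;
    }

    // Stages 2 and 3 (stand-ins for LowerCaseFilter and StopFilter):
    // lowercase every token, then drop those found in the caller-supplied
    // stop-word set.
    static List<String> analyze(String text, Set<String> stopWords) {
        List<String> out = new ArrayList<>();
        for (String token : tokenize(text)) {
            String lower = token.toLowerCase(Locale.ROOT);
            if (!stopWords.contains(lower)) {
                out.add(lower);
            }
        }
        return out;
    }
}
```

Calling AnalyzerChainSketch.analyze("The Quick brown Fox", stopWords) with a stop-word set containing only "the" yields the tokens quick, brown, fox. The order of the stages matters: lowercasing runs before the stop-word check, so a lowercase stop list also matches capitalized input.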

Class Summary
Class	Description
StandardAnalyzer	Filters `StandardTokenizer` with `LowerCaseFilter` and `StopFilter`, using a configurable list of stop words.
StandardFilter	Deprecated. StandardFilter is a no-op and can be removed from code.
StandardTokenizer	A grammar-based tokenizer constructed with JFlex.
StandardTokenizerImpl	This class implements Word Break rules from the Unicode Text Segmentation algorithm, as specified in Unicode Standard Annex #29.
