Package org.apache.lucene.analysis.standard


The fast, general-purpose, grammar-based tokenizer StandardTokenizer implements the Word Break rules from the Unicode Text Segmentation algorithm, as specified in Unicode Standard Annex #29 (UAX#29). Unlike UAX29URLEmailTokenizer from the analysis module, it does not keep URLs and email addresses as single tokens; they are instead split into tokens according to the UAX#29 word break rules.
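
As a concrete illustration, the sketch below feeds a string containing an email address through StandardTokenizer and prints the resulting terms. Per the UAX#29 rules described above, "foo@example.com" should split at the "@" into "foo" and "example.com". The class name TokenizeDemo and the sample text are illustrative, and the no-argument constructor assumes Lucene 5.0 or later:

    import java.io.StringReader;
    import org.apache.lucene.analysis.standard.StandardTokenizer;
    import org.apache.lucene.analysis.tokenattributes.CharTermAttribute;

    public class TokenizeDemo {
      public static void main(String[] args) throws Exception {
        // Since Lucene 5.0 the tokenizer is constructed without a Reader;
        // the input is supplied afterwards via setReader().
        StandardTokenizer tokenizer = new StandardTokenizer();
        tokenizer.setReader(new StringReader("Send mail to foo@example.com"));
        CharTermAttribute term = tokenizer.addAttribute(CharTermAttribute.class);
        tokenizer.reset();
        while (tokenizer.incrementToken()) {
          // Expected terms: Send, mail, to, foo, example.com --
          // the "@" is a word-break boundary under UAX#29, while the
          // "." between alphanumerics (MidNumLet) is not.
          System.out.println(term.toString());
        }
        tokenizer.end();
        tokenizer.close();
      }
    }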
StandardAnalyzer combines StandardTokenizer, LowerCaseFilter, and StopFilter.
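
A roughly equivalent analysis chain can be assembled by hand, which makes that composition explicit. The following is a minimal sketch, not StandardAnalyzer's actual source: the stop word list is a made-up example (StandardAnalyzer's default stop set depends on the Lucene version), and the core-package imports for LowerCaseFilter and StopFilter assume Lucene 7.0 or later (earlier releases keep them in org.apache.lucene.analysis.core):

    import java.util.Arrays;
    import org.apache.lucene.analysis.Analyzer;
    import org.apache.lucene.analysis.CharArraySet;
    import org.apache.lucene.analysis.LowerCaseFilter;
    import org.apache.lucene.analysis.StopFilter;
    import org.apache.lucene.analysis.TokenStream;
    import org.apache.lucene.analysis.standard.StandardTokenizer;
    import org.apache.lucene.analysis.tokenattributes.CharTermAttribute;

    public class CustomStandardAnalyzerDemo {
      public static void main(String[] args) throws Exception {
        // Illustrative stop word list; StandardAnalyzer's own default may differ.
        CharArraySet stopWords = new CharArraySet(Arrays.asList("to", "the"), true);
        Analyzer analyzer = new Analyzer() {
          @Override
          protected TokenStreamComponents createComponents(String fieldName) {
            // The same chain the description lists:
            // tokenize, lowercase, then remove stop words.
            StandardTokenizer source = new StandardTokenizer();
            TokenStream sink = new LowerCaseFilter(source);
            sink = new StopFilter(sink, stopWords);
            return new TokenStreamComponents(source, sink);
          }
        };
        try (TokenStream ts =
                analyzer.tokenStream("body", "Send Mail to foo@example.com")) {
          CharTermAttribute term = ts.addAttribute(CharTermAttribute.class);
          ts.reset();
          while (ts.incrementToken()) {
            // Expected terms: send, mail, foo, example.com
            // ("to" is removed by the illustrative stop word list).
            System.out.println(term.toString());
          }
          ts.end();
        }
      }
    }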
Copyright © 2000-2019 Apache Software Foundation. All Rights Reserved.