Normalizes tokens extracted with StandardTokenizer.
A grammar-based tokenizer constructed with JFlex.
This class implements Word Break rules from the Unicode Text Segmentation algorithm, as specified in Unicode Standard Annex #29.
StandardTokenizer implements the Word Break rules from the Unicode Text Segmentation algorithm, as specified in Unicode Standard Annex #29. Unlike
UAX29URLEmailTokenizer from the analysis module, URLs and email addresses are not tokenized as single tokens, but are instead split up into tokens according to the UAX#29 word break rules.
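
To get a feel for word-break segmentation, the following is a minimal sketch that uses only the JDK's java.text.BreakIterator rather than StandardTokenizer itself. The JDK's word rules are similar in spirit to UAX#29 but are not guaranteed to match it, so the exact tokens may differ from what StandardTokenizer produces. The class name WordBreakSketch and the method tokenize are hypothetical names chosen for illustration.

```java
import java.text.BreakIterator;
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

// Illustration only: approximates word-break tokenization with the JDK's
// BreakIterator. This is NOT Lucene's StandardTokenizer, and the JDK rules
// are an assumption standing in for the UAX#29 rules described above.
final class WordBreakSketch {

    // Splits text at word boundaries and keeps only the segments that
    // contain at least one letter or digit, dropping whitespace and
    // punctuation-only segments.
    static List<String> tokenize(String text) {
        BreakIterator words = BreakIterator.getWordInstance(Locale.ROOT);
        words.setText(text);

        List<String> tokens = new ArrayList<>();
        int start = words.first();
        for (int end = words.next(); end != BreakIterator.DONE; start = end, end = words.next()) {
            String segment = text.substring(start, end);
            if (segment.codePoints().anyMatch(Character::isLetterOrDigit)) {
                tokens.add(segment);
            }
        }
        return tokens;
    }

    public static void main(String[] args) {
        // A URL and an email address are segmented like any other text;
        // neither is treated specially as a single unit.
        System.out.println(tokenize("Mail jane@example.com or visit http://example.com/docs"));
    }
}
```

Running the sketch on input containing a URL or an email address shows the general effect described above: the address gets no special treatment and is segmented by the same boundary rules as the rest of the text.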
Copyright © 2000-2017 Apache Software Foundation. All Rights Reserved.