Fast, general-purpose grammar-based tokenizers.
|Class and Description|
A grammar-based tokenizer constructed with JFlex.
This class implements the Word Break rules from the Unicode Text Segmentation algorithm, as specified in Unicode Standard Annex #29. URLs and email addresses are also tokenized according to the relevant RFCs.
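To illustrate what word-break segmentation means in practice, here is a minimal sketch that uses only the JDK's `java.text.BreakIterator`. It is not the JFlex grammar used by the tokenizers in this package, and it does not recognize URLs or email addresses as single tokens; the class name `WordBreakSketch` and the `tokenize` helper are hypothetical names chosen for this example.

```java
import java.text.BreakIterator;
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

// Hypothetical illustration class; not part of this package.
public class WordBreakSketch {

    // Splits text at word boundaries and keeps only the segments that
    // contain at least one letter or digit (whitespace and punctuation
    // segments are dropped).
    static List<String> tokenize(String text) {
        BreakIterator it = BreakIterator.getWordInstance(Locale.ROOT);
        it.setText(text);

        List<String> tokens = new ArrayList<>();
        int start = it.first();
        for (int end = it.next(); end != BreakIterator.DONE; start = end, end = it.next()) {
            String segment = text.substring(start, end);
            if (segment.codePoints().anyMatch(Character::isLetterOrDigit)) {
                tokens.add(segment);
            }
        }
        return tokens;
    }

    public static void main(String[] args) {
        System.out.println(tokenize("The quick brown fox."));
    }
}
```

A grammar-based tokenizer does the same job in a single pass of a generated scanner and can also assign a type to each token, which is why it can treat a URL or an email address as one unit rather than splitting it at punctuation.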
Copyright © 2000-2021 Apache Software Foundation. All Rights Reserved.