StandardTokenizer
implements the Word Break rules from the Unicode Text Segmentation algorithm, as specified in
Unicode Standard Annex #29.
Class | Description |
---|---|
StandardAnalyzer | Filters StandardTokenizer with LowerCaseFilter and StopFilter, using a configurable list of stop words. |
StandardFilter | Deprecated. StandardFilter is a no-op and can be removed from code. |
StandardTokenizer | A grammar-based tokenizer constructed with JFlex. |
StandardTokenizerImpl | This class implements Word Break rules from the Unicode Text Segmentation algorithm, as specified in Unicode Standard Annex #29. |
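The StandardAnalyzer entry in the table describes a fixed chain: StandardTokenizer, then LowerCaseFilter, then StopFilter with a configurable stop-word list. The sketch below is one possible way to observe that chain from the outside. The class name `StandardAnalyzerSketch`, the helper `analyze`, and the field name `"body"` are illustrative assumptions, and the import of `CharArraySet` from `org.apache.lucene.analysis` assumes a Lucene 7.x or later classpath (older releases are believed to keep it in `org.apache.lucene.analysis.util`).

```java
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;

import org.apache.lucene.analysis.CharArraySet;
import org.apache.lucene.analysis.TokenStream;
import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.analysis.tokenattributes.CharTermAttribute;

// Hypothetical helper class, not part of Lucene.
public class StandardAnalyzerSketch {

    // Runs text through StandardAnalyzer with an explicit stop-word list
    // and collects the resulting terms in order.
    static List<String> analyze(String text, List<String> stopWords) throws IOException {
        // ignoreCase = true, so stop words match regardless of case.
        CharArraySet stopSet = new CharArraySet(stopWords, true);
        List<String> terms = new ArrayList<>();
        // "body" is an arbitrary field name chosen for this example.
        try (StandardAnalyzer analyzer = new StandardAnalyzer(stopSet);
             TokenStream stream = analyzer.tokenStream("body", text)) {
            CharTermAttribute term = stream.addAttribute(CharTermAttribute.class);
            stream.reset();                       // required before the first incrementToken()
            while (stream.incrementToken()) {
                terms.add(term.toString());
            }
            stream.end();                         // finalize end-of-stream state
        }
        return terms;
    }

    public static void main(String[] args) throws IOException {
        // "The" is lowercased and then dropped as a stop word.
        System.out.println(analyze("The Quick Brown Fox", List.of("the")));
    }
}
```

Passing the stop words explicitly avoids relying on the default stop set of whichever Lucene version is on the classpath.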
StandardTokenizer
implements the Word Break rules from the Unicode Text Segmentation algorithm, as specified in
Unicode Standard Annex #29.
Unlike UAX29URLEmailTokenizer
from the analysis module, URLs and email addresses are
not tokenized as single tokens, but are instead split up into
tokens according to the UAX#29 word break rules.
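To make the point about email addresses concrete, here is a minimal sketch that runs StandardTokenizer on its own and collects the tokens it emits. The class name `StandardTokenizerSketch` and the helper `tokenize` are hypothetical names introduced for illustration; the no-argument `StandardTokenizer` constructor is assumed to be available, as it is in recent Lucene releases.

```java
import java.io.IOException;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;

import org.apache.lucene.analysis.standard.StandardTokenizer;
import org.apache.lucene.analysis.tokenattributes.CharTermAttribute;

// Hypothetical helper class, not part of Lucene.
public class StandardTokenizerSketch {

    // Splits text with StandardTokenizer only: no lowercasing, no stop words.
    static List<String> tokenize(String text) throws IOException {
        List<String> tokens = new ArrayList<>();
        try (StandardTokenizer tokenizer = new StandardTokenizer()) {
            tokenizer.setReader(new StringReader(text));
            CharTermAttribute term = tokenizer.addAttribute(CharTermAttribute.class);
            tokenizer.reset();                    // required before the first incrementToken()
            while (tokenizer.incrementToken()) {
                tokens.add(term.toString());
            }
            tokenizer.end();                      // finalize end-of-stream state
        }
        return tokens;
    }

    public static void main(String[] args) throws IOException {
        // The address is split at '@' instead of surviving as one token.
        System.out.println(tokenize("Mail jane@example.com today"));
    }
}
```

If whole URLs or addresses are needed as single tokens, the same loop should work with UAX29URLEmailTokenizer substituted for StandardTokenizer, assuming the analysis module is on the classpath.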
StandardAnalyzer includes StandardTokenizer, LowerCaseFilter and StopFilter.
Copyright © 2000-2018 Apache Software Foundation. All Rights Reserved.