Class | Description |
---|---|
ClassicAnalyzer | Filters ClassicTokenizer with ClassicFilter, LowerCaseFilter and StopFilter, using a list of English stop words. |
ClassicFilter | Normalizes tokens extracted with ClassicTokenizer. |
ClassicFilterFactory | Factory for ClassicFilter. |
ClassicTokenizer | A grammar-based tokenizer constructed with JFlex. |
ClassicTokenizerFactory | Factory for ClassicTokenizer. |
StandardTokenizerFactory | Factory for StandardTokenizer. |
UAX29URLEmailAnalyzer | Filters UAX29URLEmailTokenizer with LowerCaseFilter and StopFilter, using a list of English stop words. |
UAX29URLEmailTokenizer | This class implements Word Break rules from the Unicode Text Segmentation algorithm, as specified in Unicode Standard Annex #29. URLs and email addresses are also tokenized according to the relevant RFCs. |
UAX29URLEmailTokenizerFactory | Factory for UAX29URLEmailTokenizer. |
UAX29URLEmailTokenizerImpl | This class implements Word Break rules from the Unicode Text Segmentation algorithm, as specified in Unicode Standard Annex #29. URLs and email addresses are also tokenized according to the relevant RFCs. |

ClassicTokenizer: this class was formerly (prior to Lucene 3.1) named StandardTokenizer. (Its tokenization rules are not based on the Unicode Text Segmentation algorithm.) ClassicAnalyzer includes ClassicTokenizer, ClassicFilter, LowerCaseFilter and StopFilter.

UAX29URLEmailTokenizer: implements the Word Break rules from the Unicode Text Segmentation algorithm, as specified in Unicode Standard Annex #29, except that URLs and email addresses are also tokenized according to the relevant RFCs. UAX29URLEmailAnalyzer includes UAX29URLEmailTokenizer, LowerCaseFilter and StopFilter.
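
The analyzers described above share one shape: a tokenizer, followed by lowercasing, followed by stop-word removal. The plain-Java sketch below illustrates that chain only. It does not use Lucene's classes: the class name `AnalyzerChainSketch`, the regular expressions and the short stop-word list are all assumptions made for illustration, and the patterns are crude stand-ins for the UAX #29 word-break rules and the RFC grammars that the real tokenizers implement.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import java.util.Set;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical illustration of a tokenizer -> lowercase -> stop-filter chain.
// Not Lucene code; the patterns below only approximate the real grammars.
class AnalyzerChainSketch {

    // Alternatives are tried in order, so email-like and URL-like text
    // is kept whole before the generic word pattern can split it.
    private static final Pattern TOKEN = Pattern.compile(
            "[\\w.+-]+@[\\w-]+(?:\\.[\\w-]+)+"   // email-like token (assumed pattern)
            + "|https?://\\S+"                   // URL-like token (assumed pattern)
            + "|\\w+");                          // plain word

    // Tiny illustrative stop-word list, not Lucene's English stop set.
    private static final Set<String> STOP_WORDS =
            Set.of("a", "an", "and", "at", "or", "the", "to");

    // Step 1: split the text into raw tokens.
    static List<String> tokenize(String text) {
        List<String> tokens = new ArrayList<>();
        Matcher m = TOKEN.matcher(text);
        while (m.find()) {
            tokens.add(m.group());
        }
        return tokens;
    }

    // Steps 2 and 3: lowercase each token, then drop stop words.
    static List<String> analyze(String text) {
        List<String> out = new ArrayList<>();
        for (String token : tokenize(text)) {
            String lower = token.toLowerCase(Locale.ROOT);
            if (!STOP_WORDS.contains(lower)) {
                out.add(lower);
            }
        }
        return out;
    }

    public static void main(String[] args) {
        System.out.println(
                analyze("Mail Bob@Example.com or visit https://lucene.apache.org/core"));
    }
}
```

For the sample sentence in `main`, the email address and the URL each survive as a single lowercased token, while "or" is removed by the stop filter. A tokenizer without the email and URL rules would instead break those strings apart at punctuation.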

This Java package additionally contains StandardAnalyzer and StandardTokenizer, which are not visible here because they moved to Lucene Core. The factories for those components (used, e.g., in Solr) are still part of this module.

Copyright © 2000-2021 Apache Software Foundation. All Rights Reserved.