Packages that use StandardTokenizerInterface | |
---|---|
org.apache.lucene.analysis.standard | The org.apache.lucene.analysis.standard package contains three fast grammar-based tokenizers constructed with JFlex. |
org.apache.lucene.analysis.standard.std31 |
Uses of StandardTokenizerInterface in org.apache.lucene.analysis.standard |
---|
Classes in org.apache.lucene.analysis.standard that implement StandardTokenizerInterface | |
---|---|
class |
StandardTokenizerImpl
This class implements the Word Break rules from the Unicode Text Segmentation algorithm, as specified in Unicode Standard Annex #29. Tokens produced are of the following types: <ALPHANUM>: a sequence of alphabetic and numeric characters; <NUM>: a number; <SOUTHEAST_ASIAN>: a sequence of characters from South and Southeast Asian languages, including Thai, Lao, Myanmar, and Khmer; <IDEOGRAPHIC>: a single CJKV ideographic character; <HIRAGANA>: a single hiragana character. |
class |
UAX29URLEmailTokenizerImpl
This class implements the Word Break rules from the Unicode Text Segmentation algorithm, as specified in Unicode Standard Annex #29. URLs and email addresses are also tokenized according to the relevant RFCs. |
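
The token types listed for StandardTokenizerImpl can be illustrated with a small, self-contained sketch. It is not the JFlex-generated scanner and does not implement the UAX #29 Word Break rules. It only assigns one of the type labels above to a token that has already been segmented, using a coarse check on Unicode script and digits. The class name `TokenTypeSketch` and the method `classify` are hypothetical and are not part of the Lucene API.

```java
// Illustrative only: a coarse approximation of the token type labels
// described for StandardTokenizerImpl. The real class is a JFlex-generated
// scanner; this sketch does not reproduce its grammar.
final class TokenTypeSketch {

    // Hypothetical helper: returns a type label for an already-segmented token.
    static String classify(String token) {
        if (token.isEmpty()) {
            throw new IllegalArgumentException("empty token");
        }
        int first = token.codePointAt(0);
        boolean singleCodePoint = token.codePointCount(0, token.length()) == 1;
        Character.UnicodeScript script = Character.UnicodeScript.of(first);

        // <HIRAGANA> and <IDEOGRAPHIC> are described as single characters.
        if (singleCodePoint && script == Character.UnicodeScript.HIRAGANA) {
            return "<HIRAGANA>";
        }
        if (singleCodePoint && Character.isIdeographic(first)) {
            return "<IDEOGRAPHIC>";
        }
        // Assumption: the script of the first code point decides the label.
        switch (script) {
            case THAI:
            case LAO:
            case MYANMAR:
            case KHMER:
                return "<SOUTHEAST_ASIAN>";
            default:
                break;
        }
        // Assumption: a token made only of digits counts as a number.
        if (token.codePoints().allMatch(Character::isDigit)) {
            return "<NUM>";
        }
        return "<ALPHANUM>";
    }

    public static void main(String[] args) {
        String[] samples = {"lucene31", "42", "\u0E44\u0E17\u0E22", "\u6F22", "\u3042"};
        for (String s : samples) {
            System.out.println(s + " -> " + classify(s));
        }
    }
}
```

In Lucene itself the type label of each token is exposed through the token stream's type attribute, so a sketch like this is useful only for getting a feel for the categories, not for predicting the exact output of the tokenizer.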
Uses of StandardTokenizerInterface in org.apache.lucene.analysis.standard.std31 |
---|
Classes in org.apache.lucene.analysis.standard.std31 that implement StandardTokenizerInterface | |
---|---|
class |
StandardTokenizerImpl31
Deprecated. This class is only for exact backwards compatibility. |
class |
UAX29URLEmailTokenizerImpl31
Deprecated. This class is only for exact backwards compatibility. |