Package | Description |
---|---|
org.apache.lucene.analysis.standard | Standards-based analyzers implemented with JFlex. |
org.apache.lucene.analysis.standard.std31 | Backwards-compatible implementation to match Version.LUCENE_31 |
org.apache.lucene.analysis.standard.std34 | Backwards-compatible implementation to match Version.LUCENE_34 |

Modifier and Type | Class and Description |
---|---|
class | StandardTokenizerImpl: This class implements Word Break rules from the Unicode Text Segmentation algorithm, as specified in Unicode Standard Annex #29. Tokens produced are of the following types: `<ALPHANUM>` (a sequence of alphabetic and numeric characters); `<NUM>` (a number); `<SOUTHEAST_ASIAN>` (a sequence of characters from South and Southeast Asian languages, including Thai, Lao, Myanmar, and Khmer); `<IDEOGRAPHIC>` (a single CJKV ideographic character); `<HIRAGANA>` (a single hiragana character). |
class | UAX29URLEmailTokenizerImpl: This class implements Word Break rules from the Unicode Text Segmentation algorithm, as specified in Unicode Standard Annex #29. URLs and email addresses are also tokenized according to the relevant RFCs. |

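
The token types listed for StandardTokenizerImpl can be illustrated with a small JDK-only sketch. It does not use Lucene's JFlex-generated scanners. It substitutes `java.text.BreakIterator` for word-boundary detection. The class name `WordBreakSketch`, the methods `tokens` and `typeOf`, and the classification rules are all assumptions made for illustration. The output only approximates what the Lucene classes produce, and `<SOUTHEAST_ASIAN>` is not handled at all.

```java
import java.text.BreakIterator;
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

// Illustrative only: a rough stand-in for word-break tokenization,
// NOT the Lucene implementation.
final class WordBreakSketch {

    // Splits text at word boundaries and keeps only segments that
    // contain at least one letter or digit (drops spaces, punctuation).
    static List<String> tokens(String text) {
        List<String> out = new ArrayList<>();
        BreakIterator words = BreakIterator.getWordInstance(Locale.ROOT);
        words.setText(text);
        int start = words.first();
        for (int end = words.next(); end != BreakIterator.DONE;
                start = end, end = words.next()) {
            String segment = text.substring(start, end);
            if (segment.codePoints().anyMatch(Character::isLetterOrDigit)) {
                out.add(segment);
            }
        }
        return out;
    }

    // Hypothetical classifier that assigns one of the type labels from
    // the table above; the rules are simplified assumptions.
    static String typeOf(String token) {
        int first = token.codePointAt(0);
        if (Character.UnicodeBlock.of(first) == Character.UnicodeBlock.HIRAGANA) {
            return "<HIRAGANA>";
        }
        if (Character.isIdeographic(first)) {
            return "<IDEOGRAPHIC>";
        }
        // No letters at all: treat as a number (e.g. "42").
        if (token.codePoints().noneMatch(Character::isLetter)) {
            return "<NUM>";
        }
        return "<ALPHANUM>";
    }

    public static void main(String[] args) {
        for (String t : tokens("Hello world 42")) {
            System.out.println(t + "\t" + typeOf(t));
        }
    }
}
```

Running `main` prints each token with its assumed type, one per line. To see the actual types assigned by Lucene, the tokenizer classes built on the implementations listed here would need to be used instead.
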
Modifier and Type | Class and Description |
---|---|
class | StandardTokenizerImpl31: Deprecated. This class is only for exact backwards compatibility. |
class | UAX29URLEmailTokenizerImpl31: Deprecated. This class is only for exact backwards compatibility. |

Modifier and Type | Class and Description |
---|---|
class | UAX29URLEmailTokenizerImpl34: Deprecated. This class is only for exact backwards compatibility. |