Packages that use Tokenizer

| Package | Description |
|---|---|
| org.apache.lucene.analysis | API and code to convert text into indexable/searchable tokens. |
| org.apache.lucene.analysis.ar | Analyzer for Arabic. |
| org.apache.lucene.analysis.cjk | Analyzer for Chinese, Japanese, and Korean, which indexes bigrams (overlapping groups of two adjacent Han characters). |
| org.apache.lucene.analysis.cn | Analyzer for Chinese, which indexes unigrams (individual Chinese characters). |
| org.apache.lucene.analysis.cn.smart | Analyzer for Simplified Chinese, which indexes words. |
| org.apache.lucene.analysis.ngram | Character n-gram tokenizers and filters. |
| org.apache.lucene.analysis.ru | Analyzer for Russian. |
| org.apache.lucene.analysis.sinks | Implementations of SinkTokenizer that might be useful. |
| org.apache.lucene.analysis.standard | A fast grammar-based tokenizer constructed with JFlex. |
| org.apache.lucene.wikipedia.analysis | Tokenizer that is aware of Wikipedia syntax. |
Uses of Tokenizer in org.apache.lucene.analysis

Subclasses of Tokenizer in org.apache.lucene.analysis:

| Class | Description |
|---|---|
| CharTokenizer | An abstract base class for simple, character-oriented tokenizers. |
| KeywordTokenizer | Emits the entire input as a single token. |
| LetterTokenizer | A tokenizer that divides text at non-letters. |
| LowerCaseTokenizer | Performs the function of LetterTokenizer and LowerCaseFilter together. |
| SinkTokenizer | Deprecated. Use TeeSinkTokenFilter instead. |
| WhitespaceTokenizer | A tokenizer that divides text at whitespace. |
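LetterTokenizer and LowerCaseTokenizer both build on the CharTokenizer contract: scan the input one character at a time and emit each maximal run of accepted "token characters", optionally normalizing each character on the way out. A minimal standalone sketch of that behavior (not Lucene's actual implementation, which reads from a java.io.Reader and reuses token buffers; the method name here is hypothetical):

```java
import java.util.ArrayList;
import java.util.List;

// Sketch of the CharTokenizer contract: emit maximal runs of token characters.
// LetterTokenizer accepts Character.isLetter; LowerCaseTokenizer additionally
// lowercases each character, fusing LetterTokenizer + LowerCaseFilter into one pass.
class CharTokenizerSketch {

    static List<String> letterTokenize(String input, boolean lowerCase) {
        List<String> tokens = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        for (char c : input.toCharArray()) {
            if (Character.isLetter(c)) {
                current.append(lowerCase ? Character.toLowerCase(c) : c);
            } else if (current.length() > 0) {
                // A non-letter ends the current token; digits and punctuation are dropped.
                tokens.add(current.toString());
                current.setLength(0);
            }
        }
        if (current.length() > 0) tokens.add(current.toString());
        return tokens;
    }

    public static void main(String[] args) {
        System.out.println(letterTokenize("It's 2009!", false)); // [It, s]
        System.out.println(letterTokenize("It's 2009!", true));  // [it, s]
    }
}
```

Note how the apostrophe and the digits both act as token boundaries: CharTokenizer subclasses decide token membership per character, with no lookahead.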
Uses of Tokenizer in org.apache.lucene.analysis.ar

Subclasses of Tokenizer in org.apache.lucene.analysis.ar:

| Class | Description |
|---|---|
| ArabicLetterTokenizer | Tokenizer that breaks text into runs of letters and diacritics. |
Uses of Tokenizer in org.apache.lucene.analysis.cjk

Subclasses of Tokenizer in org.apache.lucene.analysis.cjk:

| Class | Description |
|---|---|
| CJKTokenizer | Designed for the Chinese, Japanese, and Korean languages. |
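Per the package description above, CJKTokenizer indexes overlapping bigrams: each pair of adjacent Han characters becomes one token, so consecutive tokens share a character. A standalone sketch of that bigram scheme over a single run of Han text (not Lucene's implementation, which also buffers input and handles Latin runs differently):

```java
import java.util.ArrayList;
import java.util.List;

// Sketch of CJK bigram tokenization: overlapping groups of two adjacent characters.
class CjkBigramSketch {

    static List<String> bigrams(String hanRun) {
        List<String> tokens = new ArrayList<>();
        if (hanRun.length() == 1) {
            // A lone character has no pair, so it is emitted as-is.
            tokens.add(hanRun);
            return tokens;
        }
        for (int i = 0; i + 1 < hanRun.length(); i++) {
            tokens.add(hanRun.substring(i, i + 2));
        }
        return tokens;
    }

    public static void main(String[] args) {
        // Four characters yield three overlapping bigrams.
        System.out.println(bigrams("一二三四")); // [一二, 二三, 三四]
    }
}
```

Bigrams trade index size for recall: any two-character word in the text is guaranteed to appear as some token, without needing a word-segmentation dictionary.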
Uses of Tokenizer in org.apache.lucene.analysis.cn

Subclasses of Tokenizer in org.apache.lucene.analysis.cn:

| Class | Description |
|---|---|
| ChineseTokenizer | Tokenizes Chinese text as individual Chinese characters. |
Uses of Tokenizer in org.apache.lucene.analysis.cn.smart

Subclasses of Tokenizer in org.apache.lucene.analysis.cn.smart:

| Class | Description |
|---|---|
| SentenceTokenizer | Tokenizes input text into sentences. |
Uses of Tokenizer in org.apache.lucene.analysis.ngram

Subclasses of Tokenizer in org.apache.lucene.analysis.ngram:

| Class | Description |
|---|---|
| EdgeNGramTokenizer | Tokenizes the input from an edge into n-grams of given size(s). |
| NGramTokenizer | Tokenizes the input into n-grams of the given size(s). |
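NGramTokenizer slides a window of every size between a minimum and maximum gram length across the input, while EdgeNGramTokenizer keeps only the grams anchored at one edge. A standalone sketch of both behaviors for the front edge (the class and method names here are illustrative; Lucene's tokenizers take min/max gram sizes in their constructors and read from a Reader):

```java
import java.util.ArrayList;
import java.util.List;

// Sketch of character n-gram tokenization in the spirit of NGramTokenizer
// and EdgeNGramTokenizer, for gram sizes minGram..maxGram.
class NGramSketch {

    // All n-grams of each size in [minGram, maxGram], at every start offset.
    static List<String> nGrams(String input, int minGram, int maxGram) {
        List<String> grams = new ArrayList<>();
        for (int n = minGram; n <= maxGram; n++) {
            for (int i = 0; i + n <= input.length(); i++) {
                grams.add(input.substring(i, i + n));
            }
        }
        return grams;
    }

    // Only the grams anchored at the front edge of the input.
    static List<String> edgeNGrams(String input, int minGram, int maxGram) {
        List<String> grams = new ArrayList<>();
        for (int n = minGram; n <= Math.min(maxGram, input.length()); n++) {
            grams.add(input.substring(0, n));
        }
        return grams;
    }

    public static void main(String[] args) {
        System.out.println(nGrams("abcd", 2, 3));     // [ab, bc, cd, abc, bcd]
        System.out.println(edgeNGrams("abcd", 1, 3)); // [a, ab, abc]
    }
}
```

Edge n-grams are the usual building block for prefix matching (e.g. search-as-you-type), while full n-grams support substring matching at a much larger index cost.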
Uses of Tokenizer in org.apache.lucene.analysis.ru

Subclasses of Tokenizer in org.apache.lucene.analysis.ru:

| Class | Description |
|---|---|
| RussianLetterTokenizer | Extends LetterTokenizer by additionally looking up letters in a given "russian charset". |
Uses of Tokenizer in org.apache.lucene.analysis.sinks

Subclasses of Tokenizer in org.apache.lucene.analysis.sinks:

| Class | Description |
|---|---|
| DateRecognizerSinkTokenizer | Deprecated. Use DateRecognizerSinkFilter and TeeSinkTokenFilter instead. |
| TokenRangeSinkTokenizer | Deprecated. Use TokenRangeSinkFilter and TeeSinkTokenFilter instead. |
| TokenTypeSinkTokenizer | Deprecated. Use TokenTypeSinkFilter and TeeSinkTokenFilter instead. |
Uses of Tokenizer in org.apache.lucene.analysis.standard

Subclasses of Tokenizer in org.apache.lucene.analysis.standard:

| Class | Description |
|---|---|
| StandardTokenizer | A grammar-based tokenizer constructed with JFlex. |
Uses of Tokenizer in org.apache.lucene.wikipedia.analysis

Subclasses of Tokenizer in org.apache.lucene.wikipedia.analysis:

| Class | Description |
|---|---|
| WikipediaTokenizer | Extension of StandardTokenizer that is aware of Wikipedia syntax. |