Uses of Class
org.apache.lucene.analysis.Tokenizer

Packages that use Tokenizer
org.apache.lucene.analysis API and code to convert text into indexable/searchable tokens. 
org.apache.lucene.analysis.ar Analyzer for Arabic. 
org.apache.lucene.analysis.cjk Analyzer for Chinese, Japanese, and Korean, which indexes bigrams (overlapping groups of two adjacent Han characters). 
org.apache.lucene.analysis.cn Analyzer for Chinese, which indexes unigrams (individual Chinese characters). 
org.apache.lucene.analysis.cn.smart Analyzer for Simplified Chinese, which indexes words. 
org.apache.lucene.analysis.ngram Character n-gram tokenizers and filters. 
org.apache.lucene.analysis.ru Analyzer for Russian. 
org.apache.lucene.analysis.sinks Implementations of the SinkTokenizer that might be useful. 
org.apache.lucene.analysis.standard A fast grammar-based tokenizer constructed with JFlex. 
org.apache.lucene.wikipedia.analysis Tokenizer that is aware of Wikipedia syntax. 
 

Uses of Tokenizer in org.apache.lucene.analysis
 

Subclasses of Tokenizer in org.apache.lucene.analysis
 class CharTokenizer
          An abstract base class for simple, character-oriented tokenizers.
 class KeywordTokenizer
          Emits the entire input as a single token.
 class LetterTokenizer
          A LetterTokenizer is a tokenizer that divides text at non-letters.
 class LowerCaseTokenizer
          LowerCaseTokenizer performs the function of LetterTokenizer and LowerCaseFilter together.
 class SinkTokenizer
          Deprecated. Use TeeSinkTokenFilter instead.
 class WhitespaceTokenizer
          A WhitespaceTokenizer is a tokenizer that divides text at whitespace.
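
The splitting rules described for LetterTokenizer, LowerCaseTokenizer, and WhitespaceTokenizer can be compared side by side with a small plain-Java sketch. It does not call the Lucene classes and is not their implementation; the class name SplitSketch and the method split are hypothetical, and "letter" here simply means whatever java.lang.Character.isLetter accepts.

```java
import java.util.ArrayList;
import java.util.List;

// Plain-Java illustration of the splitting rules described above.
// It does not call or reproduce the Lucene classes; all names are hypothetical.
class SplitSketch {

    // lettersOnly = true:  keep letters, break at every other character.
    // lettersOnly = false: keep every character except whitespace.
    // lowerCase = true:    lowercase each kept character while collecting it.
    static List<String> split(String text, boolean lettersOnly, boolean lowerCase) {
        List<String> tokens = new ArrayList<String>();
        StringBuilder current = new StringBuilder();
        for (int i = 0; i < text.length(); i++) {
            char c = text.charAt(i);
            boolean keep = lettersOnly ? Character.isLetter(c) : !Character.isWhitespace(c);
            if (keep) {
                current.append(lowerCase ? Character.toLowerCase(c) : c);
            } else if (current.length() > 0) {
                // A separator ends the token collected so far.
                tokens.add(current.toString());
                current.setLength(0);
            }
        }
        if (current.length() > 0) {
            tokens.add(current.toString());
        }
        return tokens;
    }

    public static void main(String[] args) {
        String text = "Don't panic, Mr. Dent";
        System.out.println(split(text, true, false));  // [Don, t, panic, Mr, Dent]
        System.out.println(split(text, true, true));   // [don, t, panic, mr, dent]
        System.out.println(split(text, false, false)); // [Don't, panic,, Mr., Dent]
    }
}
```

The first call mirrors the "divides text at non-letters" rule, the second adds lowercasing in the same pass, and the third mirrors the "divides text at whitespace" rule, which is why punctuation stays attached to the tokens.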
 

Uses of Tokenizer in org.apache.lucene.analysis.ar
 

Subclasses of Tokenizer in org.apache.lucene.analysis.ar
 class ArabicLetterTokenizer
          Tokenizer that breaks text into runs of letters and diacritics.
 

Uses of Tokenizer in org.apache.lucene.analysis.cjk
 

Subclasses of Tokenizer in org.apache.lucene.analysis.cjk
 class CJKTokenizer
          CJKTokenizer is designed for Chinese, Japanese, and Korean languages.
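
The package summary describes the CJK analyzer as indexing bigrams, that is, overlapping groups of two adjacent Han characters, while the org.apache.lucene.analysis.cn analyzer indexes unigrams. The following plain-Java sketch shows the difference on a single run of four Han characters (U+4E2D U+534E U+4EBA U+6C11, written as Unicode escapes). It is an illustration only, not CJKTokenizer itself: the class and method names are hypothetical, the input is assumed to be one unbroken run of characters from the Basic Multilingual Plane, and returning nothing for a run shorter than two characters is a choice of this sketch.

```java
import java.util.ArrayList;
import java.util.List;

// Illustration of unigram vs. overlapping-bigram indexing; names are hypothetical
// and this is not the Lucene implementation.
class HanGramSketch {

    // Every pair of adjacent characters, advancing one character at a time.
    // Assumes no surrogate pairs; yields nothing for runs shorter than two.
    static List<String> bigrams(String run) {
        List<String> out = new ArrayList<String>();
        for (int i = 0; i + 2 <= run.length(); i++) {
            out.add(run.substring(i, i + 2));
        }
        return out;
    }

    // Every character as its own token.
    static List<String> unigrams(String run) {
        List<String> out = new ArrayList<String>();
        for (int i = 0; i < run.length(); i++) {
            out.add(run.substring(i, i + 1));
        }
        return out;
    }

    public static void main(String[] args) {
        String run = "\u4e2d\u534e\u4eba\u6c11";
        System.out.println(bigrams(run).size());  // 3
        System.out.println(unigrams(run).size()); // 4
    }
}
```

For a run of n characters the bigram approach yields n - 1 overlapping tokens, so each interior character appears in two tokens.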
 

Uses of Tokenizer in org.apache.lucene.analysis.cn
 

Subclasses of Tokenizer in org.apache.lucene.analysis.cn
 class ChineseTokenizer
          Tokenizes Chinese text as individual Chinese characters.
 

Uses of Tokenizer in org.apache.lucene.analysis.cn.smart
 

Subclasses of Tokenizer in org.apache.lucene.analysis.cn.smart
 class SentenceTokenizer
          Tokenizes input text into sentences.
 

Uses of Tokenizer in org.apache.lucene.analysis.ngram
 

Subclasses of Tokenizer in org.apache.lucene.analysis.ngram
 class EdgeNGramTokenizer
          Tokenizes the input from an edge into n-grams of given size(s).
 class NGramTokenizer
          Tokenizes the input into n-grams of the given size(s).
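
To make the two descriptions concrete, here is a plain-Java sketch of character n-grams and edge n-grams. It is not the Lucene implementation: the class NGramSketch, its methods, and the parameter names minGram, maxGram, and fromFront are hypothetical, minGram is assumed to be at least 1, and the output order (all grams of one size before the next size) is a choice of this sketch.

```java
import java.util.ArrayList;
import java.util.List;

// Illustration of n-gram and edge n-gram generation; names are hypothetical
// and this is not the Lucene implementation. Assumes minGram >= 1.
class NGramSketch {

    // All substrings of each size from minGram to maxGram, at every start position.
    static List<String> nGrams(String text, int minGram, int maxGram) {
        List<String> grams = new ArrayList<String>();
        for (int size = minGram; size <= maxGram; size++) {
            for (int start = 0; start + size <= text.length(); start++) {
                grams.add(text.substring(start, start + size));
            }
        }
        return grams;
    }

    // Only the substrings anchored at one edge: prefixes (front) or suffixes (back).
    static List<String> edgeNGrams(String text, int minGram, int maxGram, boolean fromFront) {
        List<String> grams = new ArrayList<String>();
        for (int size = minGram; size <= maxGram && size <= text.length(); size++) {
            grams.add(fromFront ? text.substring(0, size)
                                : text.substring(text.length() - size));
        }
        return grams;
    }

    public static void main(String[] args) {
        System.out.println(nGrams("abcd", 2, 3));            // [ab, bc, cd, abc, bcd]
        System.out.println(edgeNGrams("abcd", 1, 3, true));  // [a, ab, abc]
        System.out.println(edgeNGrams("abcd", 1, 3, false)); // [d, cd, bcd]
    }
}
```

Edge n-grams are a subset of the full n-gram set: only the grams that touch the chosen edge are kept.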
 

Uses of Tokenizer in org.apache.lucene.analysis.ru
 

Subclasses of Tokenizer in org.apache.lucene.analysis.ru
 class RussianLetterTokenizer
          A RussianLetterTokenizer is a Tokenizer that extends LetterTokenizer by additionally looking up letters in a given "russian charset".
 

Uses of Tokenizer in org.apache.lucene.analysis.sinks
 

Subclasses of Tokenizer in org.apache.lucene.analysis.sinks
 class DateRecognizerSinkTokenizer
          Deprecated. Use DateRecognizerSinkFilter and TeeSinkTokenFilter instead.
 class TokenRangeSinkTokenizer
          Deprecated. Use TokenRangeSinkFilter and TeeSinkTokenFilter instead.
 class TokenTypeSinkTokenizer
          Deprecated. Use TokenTypeSinkFilter and TeeSinkTokenFilter instead.
 

Uses of Tokenizer in org.apache.lucene.analysis.standard
 

Subclasses of Tokenizer in org.apache.lucene.analysis.standard
 class StandardTokenizer
          A grammar-based tokenizer constructed with JFlex.
 

Uses of Tokenizer in org.apache.lucene.wikipedia.analysis
 

Subclasses of Tokenizer in org.apache.lucene.wikipedia.analysis
 class WikipediaTokenizer
          Extension of StandardTokenizer that is aware of Wikipedia syntax.
 



Copyright © 2000-2010 Apache Software Foundation. All Rights Reserved.