Uses of Class
org.apache.lucene.analysis.Tokenizer

Packages that use Tokenizer
org.apache.lucene.analysis API and code to convert text into indexable/searchable tokens. 
org.apache.lucene.analysis.ar Analyzer for Arabic. 
org.apache.lucene.analysis.cjk Analyzer for Chinese, Japanese, and Korean, which indexes bigrams (overlapping groups of two adjacent Han characters). 
org.apache.lucene.analysis.cn Analyzer for Chinese, which indexes unigrams (individual Chinese characters). 
org.apache.lucene.analysis.cn.smart Analyzer for Simplified Chinese, which indexes words. 
org.apache.lucene.analysis.icu.segmentation Tokenizer that breaks text into words with the Unicode Text Segmentation algorithm. 
org.apache.lucene.analysis.in Analysis components for Indian languages. 
org.apache.lucene.analysis.ngram Character n-gram tokenizers and filters. 
org.apache.lucene.analysis.path Tokenizer for path-like hierarchies. 
org.apache.lucene.analysis.ru Analyzer for Russian. 
org.apache.lucene.analysis.standard Fast grammar-based tokenizers constructed with JFlex: ClassicTokenizer, StandardTokenizer, and UAX29URLEmailTokenizer. 
org.apache.lucene.analysis.wikipedia Tokenizer that is aware of Wikipedia syntax. 
 

Uses of Tokenizer in org.apache.lucene.analysis
 

Subclasses of Tokenizer in org.apache.lucene.analysis
 class CharTokenizer
          An abstract base class for simple, character-oriented tokenizers.
 class KeywordTokenizer
          Emits the entire input as a single token.
 class LetterTokenizer
          A LetterTokenizer is a tokenizer that divides text at non-letters.
 class LowerCaseTokenizer
          LowerCaseTokenizer performs the function of LetterTokenizer and LowerCaseFilter together.
 class WhitespaceTokenizer
          A WhitespaceTokenizer is a tokenizer that divides text at whitespace.
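
To show how these classes are consumed, here is a minimal sketch of the usual incrementToken() loop, assuming a Lucene 3.x classpath; the demo class name, the choice of Version.LUCENE_35, and the sample input are illustrative only:

import java.io.IOException;
import java.io.StringReader;
import org.apache.lucene.analysis.Tokenizer;
import org.apache.lucene.analysis.WhitespaceTokenizer;
import org.apache.lucene.analysis.tokenattributes.CharTermAttribute;
import org.apache.lucene.util.Version;

public class WhitespaceTokenizerDemo {
  public static void main(String[] args) throws IOException {
    // WhitespaceTokenizer divides at whitespace: "foo Bar baz" -> foo, Bar, baz.
    // KeywordTokenizer, by contrast, would emit the entire input as one token.
    Tokenizer tok = new WhitespaceTokenizer(Version.LUCENE_35, new StringReader("foo Bar baz"));
    CharTermAttribute term = tok.addAttribute(CharTermAttribute.class);
    tok.reset(); // part of the documented workflow; a no-op for a fresh Tokenizer
    while (tok.incrementToken()) {
      System.out.println(term.toString());
    }
    tok.end();
    tok.close();
  }
}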
 

Fields in org.apache.lucene.analysis declared as Tokenizer
protected  Tokenizer ReusableAnalyzerBase.TokenStreamComponents.source
           
 

Constructors in org.apache.lucene.analysis with parameters of type Tokenizer
ReusableAnalyzerBase.TokenStreamComponents(Tokenizer source)
          Creates a new ReusableAnalyzerBase.TokenStreamComponents instance.
ReusableAnalyzerBase.TokenStreamComponents(Tokenizer source, TokenStream result)
          Creates a new ReusableAnalyzerBase.TokenStreamComponents instance.
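
A minimal sketch of how these constructors are typically used, assuming Lucene 3.x: an Analyzer subclass builds its chain in createComponents(String, Reader). The analyzer name and the WhitespaceTokenizer/LowerCaseFilter chain below are arbitrary examples:

import java.io.Reader;
import org.apache.lucene.analysis.LowerCaseFilter;
import org.apache.lucene.analysis.ReusableAnalyzerBase;
import org.apache.lucene.analysis.Tokenizer;
import org.apache.lucene.analysis.WhitespaceTokenizer;
import org.apache.lucene.util.Version;

public final class SimpleWhitespaceAnalyzer extends ReusableAnalyzerBase {
  @Override
  protected TokenStreamComponents createComponents(String fieldName, Reader reader) {
    // The Tokenizer is the source of the chain; the second constructor
    // argument is the end of the filter chain that consumers will read from.
    Tokenizer source = new WhitespaceTokenizer(Version.LUCENE_35, reader);
    return new TokenStreamComponents(source, new LowerCaseFilter(Version.LUCENE_35, source));
  }
}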
 

Uses of Tokenizer in org.apache.lucene.analysis.ar
 

Subclasses of Tokenizer in org.apache.lucene.analysis.ar
 class ArabicLetterTokenizer
          Deprecated since 3.1. Use StandardTokenizer instead.
 

Uses of Tokenizer in org.apache.lucene.analysis.cjk
 

Subclasses of Tokenizer in org.apache.lucene.analysis.cjk
 class CJKTokenizer
          CJKTokenizer is designed for Chinese, Japanese, and Korean languages.
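
A minimal sketch of CJKTokenizer's bigram output, assuming Lucene 3.x; the demo class name and the Han sample text are illustrative:

import java.io.StringReader;
import org.apache.lucene.analysis.cjk.CJKTokenizer;
import org.apache.lucene.analysis.tokenattributes.CharTermAttribute;

public class CJKTokenizerDemo {
  public static void main(String[] args) throws Exception {
    // Adjacent Han characters become overlapping bigrams:
    // "中华人民" -> 中华, 华人, 人民.
    CJKTokenizer tok = new CJKTokenizer(new StringReader("中华人民"));
    CharTermAttribute term = tok.addAttribute(CharTermAttribute.class);
    while (tok.incrementToken()) {
      System.out.println(term.toString());
    }
    tok.end();
    tok.close();
  }
}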
 

Uses of Tokenizer in org.apache.lucene.analysis.cn
 

Subclasses of Tokenizer in org.apache.lucene.analysis.cn
 class ChineseTokenizer
          Deprecated. Use StandardTokenizer instead, which has the same functionality. This tokenizer will be removed in Lucene 5.0.
 

Uses of Tokenizer in org.apache.lucene.analysis.cn.smart
 

Subclasses of Tokenizer in org.apache.lucene.analysis.cn.smart
 class SentenceTokenizer
          Tokenizes input text into sentences.
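
A minimal sketch, assuming Lucene 3.x and SentenceTokenizer's single-Reader constructor; the sample sentences are arbitrary:

import java.io.StringReader;
import org.apache.lucene.analysis.cn.smart.SentenceTokenizer;
import org.apache.lucene.analysis.tokenattributes.CharTermAttribute;

public class SentenceTokenizerDemo {
  public static void main(String[] args) throws Exception {
    // Each emitted token is a whole sentence; SmartChineseAnalyzer later
    // segments the sentences into words with a further filter.
    SentenceTokenizer tok = new SentenceTokenizer(new StringReader("我是中国人。我爱北京。"));
    CharTermAttribute term = tok.addAttribute(CharTermAttribute.class);
    while (tok.incrementToken()) {
      System.out.println(term.toString());
    }
    tok.end();
    tok.close();
  }
}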
 

Uses of Tokenizer in org.apache.lucene.analysis.icu.segmentation
 

Subclasses of Tokenizer in org.apache.lucene.analysis.icu.segmentation
 class ICUTokenizer
          Breaks text into words according to UAX #29: Unicode Text Segmentation (http://www.unicode.org/reports/tr29/).
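
A minimal sketch, assuming the lucene-icu module of Lucene 3.x and the default-configuration ICUTokenizer(Reader) constructor; the sample input is arbitrary:

import java.io.StringReader;
import org.apache.lucene.analysis.icu.segmentation.ICUTokenizer;
import org.apache.lucene.analysis.tokenattributes.CharTermAttribute;

public class ICUTokenizerDemo {
  public static void main(String[] args) throws Exception {
    // The default configuration applies UAX #29 word-break rules,
    // with per-script customization of the break iterator.
    ICUTokenizer tok = new ICUTokenizer(new StringReader("testing 1 2 3"));
    CharTermAttribute term = tok.addAttribute(CharTermAttribute.class);
    while (tok.incrementToken()) {
      System.out.println(term.toString());
    }
    tok.end();
    tok.close();
  }
}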
 

Uses of Tokenizer in org.apache.lucene.analysis.in
 

Subclasses of Tokenizer in org.apache.lucene.analysis.in
 class IndicTokenizer
          Simple Tokenizer for text in Indian languages.
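
A minimal sketch, assuming Lucene 3.x and the (Version, Reader) constructor that CharTokenizer subclasses take since 3.1; the Hindi sample text is arbitrary:

import java.io.StringReader;
import org.apache.lucene.analysis.in.IndicTokenizer;
import org.apache.lucene.analysis.tokenattributes.CharTermAttribute;
import org.apache.lucene.util.Version;

public class IndicTokenizerDemo {
  public static void main(String[] args) throws Exception {
    // A CharTokenizer subclass: text is divided wherever a character
    // does not belong to a token, much like LetterTokenizer.
    IndicTokenizer tok = new IndicTokenizer(Version.LUCENE_35, new StringReader("मैं हिन्दी सीखता हूँ"));
    CharTermAttribute term = tok.addAttribute(CharTermAttribute.class);
    while (tok.incrementToken()) {
      System.out.println(term.toString());
    }
    tok.end();
    tok.close();
  }
}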
 

Uses of Tokenizer in org.apache.lucene.analysis.ngram
 

Subclasses of Tokenizer in org.apache.lucene.analysis.ngram
 class EdgeNGramTokenizer
          Tokenizes the input from an edge into n-grams of given size(s).
 class NGramTokenizer
          Tokenizes the input into n-grams of the given size(s).
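
A minimal sketch of both tokenizers, assuming Lucene 3.x; the gram sizes and the input "abcd" are chosen only so the output is easy to verify by hand:

import java.io.StringReader;
import org.apache.lucene.analysis.ngram.EdgeNGramTokenizer;
import org.apache.lucene.analysis.ngram.NGramTokenizer;
import org.apache.lucene.analysis.tokenattributes.CharTermAttribute;

public class NGramTokenizerDemo {
  public static void main(String[] args) throws Exception {
    // All 2-grams of "abcd": ab, bc, cd.
    NGramTokenizer ngrams = new NGramTokenizer(new StringReader("abcd"), 2, 2);
    CharTermAttribute term = ngrams.addAttribute(CharTermAttribute.class);
    while (ngrams.incrementToken()) {
      System.out.println(term.toString());
    }
    ngrams.end();
    ngrams.close();

    // Front-anchored edge n-grams of "abcd", sizes 1 to 3: a, ab, abc.
    EdgeNGramTokenizer edges = new EdgeNGramTokenizer(
        new StringReader("abcd"), EdgeNGramTokenizer.Side.FRONT, 1, 3);
    term = edges.addAttribute(CharTermAttribute.class);
    while (edges.incrementToken()) {
      System.out.println(term.toString());
    }
    edges.end();
    edges.close();
  }
}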
 

Uses of Tokenizer in org.apache.lucene.analysis.path
 

Subclasses of Tokenizer in org.apache.lucene.analysis.path
 class PathHierarchyTokenizer
          Takes a path such as /usr/local/bin and emits a token for each level of the hierarchy: /usr, /usr/local, /usr/local/bin.
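
A minimal sketch, assuming Lucene 3.x and the default '/' delimiter; the sample path is arbitrary:

import java.io.StringReader;
import org.apache.lucene.analysis.path.PathHierarchyTokenizer;
import org.apache.lucene.analysis.tokenattributes.CharTermAttribute;

public class PathHierarchyTokenizerDemo {
  public static void main(String[] args) throws Exception {
    // "/usr/local/bin" -> /usr, /usr/local, /usr/local/bin.
    PathHierarchyTokenizer tok = new PathHierarchyTokenizer(new StringReader("/usr/local/bin"));
    CharTermAttribute term = tok.addAttribute(CharTermAttribute.class);
    while (tok.incrementToken()) {
      System.out.println(term.toString());
    }
    tok.end();
    tok.close();
  }
}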
 

Uses of Tokenizer in org.apache.lucene.analysis.ru
 

Subclasses of Tokenizer in org.apache.lucene.analysis.ru
 class RussianLetterTokenizer
          Deprecated. Use StandardTokenizer instead, which has the same functionality. This tokenizer will be removed in Lucene 5.0.
 

Uses of Tokenizer in org.apache.lucene.analysis.standard
 

Subclasses of Tokenizer in org.apache.lucene.analysis.standard
 class ClassicTokenizer
          A grammar-based tokenizer constructed with JFlex.
 class StandardTokenizer
          A grammar-based tokenizer constructed with JFlex.
 class UAX29URLEmailTokenizer
          This class implements the Word Break rules from the Unicode Text Segmentation algorithm, as specified in Unicode Standard Annex #29. URLs and email addresses are also tokenized according to the relevant RFCs.
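
A minimal sketch using StandardTokenizer, assuming Lucene 3.x and Version.LUCENE_35; the other two tokenizers in this package are consumed the same way, and the sample input is arbitrary:

import java.io.StringReader;
import org.apache.lucene.analysis.standard.StandardTokenizer;
import org.apache.lucene.analysis.tokenattributes.CharTermAttribute;
import org.apache.lucene.analysis.tokenattributes.TypeAttribute;
import org.apache.lucene.util.Version;

public class StandardTokenizerDemo {
  public static void main(String[] args) throws Exception {
    StandardTokenizer tok = new StandardTokenizer(Version.LUCENE_35, new StringReader("Lucene 3.6 rocks"));
    CharTermAttribute term = tok.addAttribute(CharTermAttribute.class);
    TypeAttribute type = tok.addAttribute(TypeAttribute.class);
    while (tok.incrementToken()) {
      // Tokens carry a type, such as <ALPHANUM> or <NUM>.
      System.out.println(term.toString() + "\t" + type.type());
    }
    tok.end();
    tok.close();
  }
}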
 

Uses of Tokenizer in org.apache.lucene.analysis.wikipedia
 

Subclasses of Tokenizer in org.apache.lucene.analysis.wikipedia
 class WikipediaTokenizer
          Extension of StandardTokenizer that is aware of Wikipedia syntax.
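
A minimal sketch, assuming Lucene 3.x; the wiki-markup sample is arbitrary:

import java.io.StringReader;
import org.apache.lucene.analysis.tokenattributes.CharTermAttribute;
import org.apache.lucene.analysis.tokenattributes.TypeAttribute;
import org.apache.lucene.analysis.wikipedia.WikipediaTokenizer;

public class WikipediaTokenizerDemo {
  public static void main(String[] args) throws Exception {
    // Markup such as [[internal links]] and '''bold''' text is recognized
    // and reflected in the token type (e.g. WikipediaTokenizer.INTERNAL_LINK).
    WikipediaTokenizer tok = new WikipediaTokenizer(new StringReader("[[Apache Lucene]] is '''fast'''."));
    CharTermAttribute term = tok.addAttribute(CharTermAttribute.class);
    TypeAttribute type = tok.addAttribute(TypeAttribute.class);
    while (tok.incrementToken()) {
      System.out.println(term.toString() + "\t" + type.type());
    }
    tok.end();
    tok.close();
  }
}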
 



Copyright © 2000-2011 Apache Software Foundation. All Rights Reserved.