| Package | Description | 
|---|---|
| org.apache.lucene.analysis | API and code to convert text into indexable/searchable tokens. | 
| org.apache.lucene.analysis.ar | Analyzer for Arabic. | 
| org.apache.lucene.analysis.cjk | Analyzer for Chinese, Japanese, and Korean, which indexes bigrams (overlapping groups of two adjacent Han characters). | 
| org.apache.lucene.analysis.cn | Analyzer for Chinese, which indexes unigrams (individual Chinese characters). | 
| org.apache.lucene.analysis.cn.smart | Analyzer for Simplified Chinese, which indexes words. | 
| org.apache.lucene.analysis.icu.segmentation | Tokenizer that breaks text into words with the Unicode Text Segmentation algorithm. | 
| org.apache.lucene.analysis.in | Analysis components for Indian languages. | 
| org.apache.lucene.analysis.ja | Analyzer for Japanese. | 
| org.apache.lucene.analysis.ngram | Character n-gram tokenizers and filters. | 
| org.apache.lucene.analysis.path | Analysis components for path-like strings such as filenames. | 
| org.apache.lucene.analysis.ru | Analyzer for Russian. | 
| org.apache.lucene.analysis.standard | Standards-based analyzers implemented with JFlex. | 
| org.apache.lucene.analysis.wikipedia | Tokenizer that is aware of Wikipedia syntax. | 
| Modifier and Type | Class and Description | 
|---|---|
| class  | CharTokenizer: An abstract base class for simple, character-oriented tokenizers. | 
| class  | EmptyTokenizer: Emits no tokens. | 
| class  | KeywordTokenizer: Emits the entire input as a single token. | 
| class  | LetterTokenizer: A LetterTokenizer is a tokenizer that divides text at non-letters. | 
| class  | LowerCaseTokenizer: LowerCaseTokenizer performs the function of LetterTokenizer and LowerCaseFilter together. | 
| class  | MockTokenizer: Tokenizer for testing. | 
| class  | WhitespaceTokenizer: A WhitespaceTokenizer is a tokenizer that divides text at whitespace. | 
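
The character-oriented tokenizers in the table above all reduce to one idea: keep runs of characters that satisfy a predicate and cut wherever it fails. The following is a minimal plain-Java sketch of that idea, not Lucene code. The class `CharSplitSketch` and its methods are hypothetical names introduced for illustration, and the real tokenizers also track offsets and reuse buffers, which this sketch ignores.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.IntPredicate;

// Hypothetical illustration only; this is not part of the Lucene API.
class CharSplitSketch {

    // Collects maximal runs of code points accepted by isTokenChar.
    // When lowerCase is true, each kept code point is lowercased.
    static List<String> split(String text, IntPredicate isTokenChar, boolean lowerCase) {
        List<String> tokens = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        for (int i = 0; i < text.length(); ) {
            int cp = text.codePointAt(i);
            i += Character.charCount(cp);
            if (isTokenChar.test(cp)) {
                current.appendCodePoint(lowerCase ? Character.toLowerCase(cp) : cp);
            } else if (current.length() > 0) {
                tokens.add(current.toString());
                current.setLength(0);
            }
        }
        if (current.length() > 0) {
            tokens.add(current.toString());
        }
        return tokens;
    }

    // Divides text at non-letters (the behavior described for LetterTokenizer).
    static List<String> letters(String text) {
        return split(text, Character::isLetter, false);
    }

    // Divides at non-letters and lowercases (the behavior described for LowerCaseTokenizer).
    static List<String> lowerCaseLetters(String text) {
        return split(text, Character::isLetter, true);
    }

    // Divides text at whitespace (the behavior described for WhitespaceTokenizer).
    static List<String> whitespace(String text) {
        return split(text, cp -> !Character.isWhitespace(cp), false);
    }

    public static void main(String[] args) {
        System.out.println(letters("Hello, World-2x"));
        System.out.println(lowerCaseLetters("Hello, World-2x"));
        System.out.println(whitespace("Hello, World-2x"));
    }
}
```

Passing the predicate as a parameter keeps the three variants to one loop, which is roughly the role an abstract character-oriented base class plays for its subclasses.
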
| Modifier and Type | Field and Description | 
|---|---|
| protected Tokenizer | ReusableAnalyzerBase.TokenStreamComponents.source | 
| Constructor and Description | 
|---|
| ReusableAnalyzerBase.TokenStreamComponents(Tokenizer source): Creates a new ReusableAnalyzerBase.TokenStreamComponents instance. | 
| ReusableAnalyzerBase.TokenStreamComponents(Tokenizer source, TokenStream result): Creates a new ReusableAnalyzerBase.TokenStreamComponents instance. | 
| Modifier and Type | Class and Description | 
|---|---|
| class  | ArabicLetterTokenizer: Deprecated. (3.1) Use StandardTokenizer instead. | 
| Modifier and Type | Class and Description | 
|---|---|
| class  | CJKTokenizer: Deprecated. Use StandardTokenizer, CJKWidthFilter, CJKBigramFilter, and LowerCaseFilter instead. | 
| Modifier and Type | Class and Description | 
|---|---|
| class  | ChineseTokenizer: Deprecated. Use StandardTokenizer instead, which has the same functionality. This filter will be removed in Lucene 5.0. | 
| Modifier and Type | Class and Description | 
|---|---|
| class  | SentenceTokenizer: Tokenizes input text into sentences. | 
| Modifier and Type | Class and Description | 
|---|---|
| class  | ICUTokenizer: Breaks text into words according to UAX #29: Unicode Text Segmentation (http://www.unicode.org/reports/tr29/). | 
| Modifier and Type | Class and Description | 
|---|---|
| class  | IndicTokenizer: Deprecated. (3.6) Use StandardTokenizer instead. | 
| Modifier and Type | Class and Description | 
|---|---|
| class  | JapaneseTokenizer: Tokenizer for Japanese that uses morphological analysis. | 
| Modifier and Type | Class and Description | 
|---|---|
| class  | EdgeNGramTokenizer: Tokenizes the input from an edge into n-grams of given size(s). | 
| class  | NGramTokenizer: Tokenizes the input into n-grams of the given size(s). | 
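
To make "n-grams of the given size(s)" concrete, here is a small plain-Java sketch that does not use any Lucene classes. `NGramSketch`, `ngrams`, and `edgeNGrams` are hypothetical names, the edge variant only covers the front edge, and the emission order and offsets produced by the tokenizers listed above may differ from what this sketch returns.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration only; this is not part of the Lucene API.
class NGramSketch {

    // Every substring of text with length between minGram and maxGram,
    // grouped by length (all grams of the smallest size first).
    // Works on Java char units, so supplementary characters are not handled.
    static List<String> ngrams(String text, int minGram, int maxGram) {
        List<String> grams = new ArrayList<>();
        for (int n = minGram; n <= maxGram; n++) {
            for (int start = 0; start + n <= text.length(); start++) {
                grams.add(text.substring(start, start + n));
            }
        }
        return grams;
    }

    // Front-edge n-grams: prefixes of text with length between minGram and maxGram.
    static List<String> edgeNGrams(String text, int minGram, int maxGram) {
        List<String> grams = new ArrayList<>();
        for (int n = minGram; n <= maxGram && n <= text.length(); n++) {
            grams.add(text.substring(0, n));
        }
        return grams;
    }

    public static void main(String[] args) {
        System.out.println(ngrams("abcd", 1, 2));     // [a, b, c, d, ab, bc, cd]
        System.out.println(edgeNGrams("abcd", 1, 3)); // [a, ab, abc]
    }
}
```

Edge n-grams are the usual choice for prefix matching such as search-as-you-type, while full n-grams support matching anywhere inside a term at the cost of many more tokens.
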
| Modifier and Type | Class and Description | 
|---|---|
| class  | PathHierarchyTokenizer: Tokenizer for path-like hierarchies. | 
| class  | ReversePathHierarchyTokenizer: Tokenizer for domain-like hierarchies. | 
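
The difference between path-like and domain-like hierarchies is the end from which the hierarchy grows. The plain-Java sketch below illustrates that difference without using Lucene. `PathHierarchySketch`, `forward`, and `reverse` are hypothetical names, and the sketch assumes a single-character delimiter and ignores options such as skipping or replacing delimiters that the real tokenizers may offer.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration only; this is not part of the Lucene API.
class PathHierarchySketch {

    // Path-like: emit each ancestor prefix, shortest first, then the full path.
    static List<String> forward(String path, char delimiter) {
        List<String> tokens = new ArrayList<>();
        // Start at 1 so a leading delimiter does not produce an empty token.
        for (int i = 1; i < path.length(); i++) {
            if (path.charAt(i) == delimiter) {
                tokens.add(path.substring(0, i));
            }
        }
        if (!path.isEmpty()) {
            tokens.add(path);
        }
        return tokens;
    }

    // Domain-like: emit the full string, then each shorter suffix.
    static List<String> reverse(String name, char delimiter) {
        List<String> tokens = new ArrayList<>();
        if (name.isEmpty()) {
            return tokens;
        }
        tokens.add(name);
        // Stop before the last char so a trailing delimiter does not produce an empty token.
        for (int i = 0; i < name.length() - 1; i++) {
            if (name.charAt(i) == delimiter) {
                tokens.add(name.substring(i + 1));
            }
        }
        return tokens;
    }

    public static void main(String[] args) {
        System.out.println(forward("/a/b/c", '/'));         // [/a, /a/b, /a/b/c]
        System.out.println(reverse("www.site.co.uk", '.')); // [www.site.co.uk, site.co.uk, co.uk, uk]
    }
}
```

Indexing every ancestor this way lets a query for `/a/b` match any document filed beneath it, and a query for `co.uk` match any host in that domain.
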
| Modifier and Type | Class and Description | 
|---|---|
| class  | RussianLetterTokenizer: Deprecated. Use StandardTokenizer instead, which has the same functionality. This filter will be removed in Lucene 5.0. | 
| Modifier and Type | Class and Description | 
|---|---|
| class  | ClassicTokenizer: A grammar-based tokenizer constructed with JFlex. | 
| class  | StandardTokenizer: A grammar-based tokenizer constructed with JFlex. | 
| class  | UAX29URLEmailTokenizer: This class implements Word Break rules from the Unicode Text Segmentation algorithm, as specified in Unicode Standard Annex #29. URLs and email addresses are also tokenized according to the relevant RFCs. | 
| Modifier and Type | Class and Description | 
|---|---|
| class  | WikipediaTokenizer: Extension of StandardTokenizer that is aware of Wikipedia syntax. |