Package org.apache.lucene.analysis.standard

The org.apache.lucene.analysis.standard package contains three fast grammar-based tokenizers constructed with JFlex:


Interface Summary
StandardTokenizerInterface Internal interface for supporting versioned grammars.

Class Summary
ClassicAnalyzer Filters ClassicTokenizer with ClassicFilter, LowerCaseFilter and StopFilter, using a list of English stop words.
ClassicFilter Normalizes tokens extracted with ClassicTokenizer.
ClassicTokenizer A grammar-based tokenizer constructed with JFlex.
StandardAnalyzer Filters StandardTokenizer with StandardFilter, LowerCaseFilter and StopFilter, using a list of English stop words.
StandardFilter Normalizes tokens extracted with StandardTokenizer.
StandardTokenizer A grammar-based tokenizer constructed with JFlex.
StandardTokenizerImpl This class implements Word Break rules from the Unicode Text Segmentation algorithm, as specified in Unicode Standard Annex #29.

Tokens produced are of the following types:
<ALPHANUM>: A sequence of alphabetic and numeric characters
<NUM>: A number
<SOUTHEAST_ASIAN>: A sequence of characters from South and Southeast Asian languages, including Thai, Lao, Myanmar, and Khmer
<IDEOGRAPHIC>: A single CJKV ideographic character
<HIRAGANA>: A single hiragana character
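As an illustration of the Unicode Annex #29 word-break behavior these grammars implement, the JDK's java.text.BreakIterator applies comparable word-break rules. This sketch uses only the standard library (not Lucene) and is an analogy, not the tokenizer's actual implementation:

```java
import java.text.BreakIterator;
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

public class WordBreakDemo {
    // Collect the word-like segments reported by the JDK's UAX#29-style
    // word break iterator, skipping punctuation and whitespace segments.
    static List<String> words(String text) {
        BreakIterator it = BreakIterator.getWordInstance(Locale.ROOT);
        it.setText(text);
        List<String> out = new ArrayList<>();
        int start = it.first();
        for (int end = it.next(); end != BreakIterator.DONE; start = end, end = it.next()) {
            String segment = text.substring(start, end);
            // Keep only segments containing a letter or digit.
            if (segment.codePoints().anyMatch(Character::isLetterOrDigit)) {
                out.add(segment);
            }
        }
        return out;
    }

    public static void main(String[] args) {
        // The hyphen is a break point, but "3.14" stays one numeric token.
        System.out.println(words("foo-bar 3.14"));
    }
}
```

Note how the decimal number survives as a single token, much like the <NUM> type above, while the hyphenated pair is split into two <ALPHANUM>-style tokens.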

UAX29URLEmailTokenizer This class implements Word Break rules from the Unicode Text Segmentation algorithm, as specified in Unicode Standard Annex #29. URLs and email addresses are also tokenized according to the relevant RFCs.
UAX29URLEmailTokenizerImpl This class implements Word Break rules from the Unicode Text Segmentation algorithm, as specified in Unicode Standard Annex #29. URLs and email addresses are also tokenized according to the relevant RFCs.
 

Package org.apache.lucene.analysis.standard Description

The org.apache.lucene.analysis.standard package contains three fast grammar-based tokenizers constructed with JFlex:

ClassicTokenizer: a grammar-based tokenizer constructed with JFlex.
StandardTokenizer: implements Word Break rules from the Unicode Text Segmentation algorithm, as specified in Unicode Standard Annex #29.
UAX29URLEmailTokenizer: implements the same Word Break rules, and also tokenizes URLs and email addresses according to the relevant RFCs.
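A minimal sketch of consuming one of these tokenizers through its analyzer. This assumes Lucene 3.x on the classpath; the version constant and field name are illustrative, not prescribed by this package:

```java
import java.io.StringReader;
import org.apache.lucene.analysis.Analyzer;
import org.apache.lucene.analysis.TokenStream;
import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.analysis.tokenattributes.CharTermAttribute;
import org.apache.lucene.analysis.tokenattributes.TypeAttribute;
import org.apache.lucene.util.Version;

public class StandardAnalyzerDemo {
    public static void main(String[] args) throws Exception {
        // StandardAnalyzer = StandardTokenizer + StandardFilter
        //                    + LowerCaseFilter + StopFilter
        Analyzer analyzer = new StandardAnalyzer(Version.LUCENE_36); // illustrative version
        TokenStream ts = analyzer.tokenStream("body",
                new StringReader("The QUICK brown fox"));
        CharTermAttribute term = ts.addAttribute(CharTermAttribute.class);
        TypeAttribute type = ts.addAttribute(TypeAttribute.class);
        ts.reset();
        while (ts.incrementToken()) {
            // Stop words ("The") are removed; remaining terms are lower-cased.
            System.out.println(term.toString() + " [" + type.type() + "]");
        }
        ts.end();
        ts.close();
    }
}
```

Swapping in ClassicAnalyzer yields the pre-3.1 tokenization behavior; the token stream is consumed the same way.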
Copyright © 2000-2011 Apache Software Foundation. All Rights Reserved.