Package org.apache.lucene.analysis.standard

The org.apache.lucene.analysis.standard package contains three fast grammar-based tokenizers constructed with JFlex: ClassicTokenizer, StandardTokenizer, and UAX29URLEmailTokenizer.


Class Summary

- ClassicAnalyzer: Filters ClassicTokenizer with ClassicFilter, LowerCaseFilter, and StopFilter, using a list of English stop words.
- ClassicFilter: Normalizes tokens extracted with ClassicTokenizer.
- ClassicTokenizer: A grammar-based tokenizer constructed with JFlex.
- StandardAnalyzer: Filters StandardTokenizer with StandardFilter, LowerCaseFilter, and StopFilter, using a list of English stop words.
- StandardFilter: Normalizes tokens extracted with StandardTokenizer.
- StandardTokenizer: A grammar-based tokenizer constructed with JFlex.
- StandardTokenizerImpl: Implements the Word Break rules from the Unicode Text Segmentation algorithm, as specified in Unicode Standard Annex #29.

  Tokens produced are of the following types:
  - <ALPHANUM>: a sequence of alphabetic and numeric characters
  - <NUM>: a number
  - <SOUTHEAST_ASIAN>: a sequence of characters from South and Southeast Asian languages, including Thai, Lao, Myanmar, and Khmer
  - <IDEOGRAPHIC>: a single CJKV ideographic character
  - <HIRAGANA>: a single hiragana character

- UAX29URLEmailTokenizer: Implements the Word Break rules from the Unicode Text Segmentation algorithm, as specified in Unicode Standard Annex #29; URLs and email addresses are also tokenized according to the relevant RFCs.
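As a usage sketch of the analyzer chain described above (assuming a Lucene 3.x release, where StandardAnalyzer takes a Version argument; the class and method names below otherwise follow the Lucene API):

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;

import org.apache.lucene.analysis.TokenStream;
import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.analysis.tokenattributes.CharTermAttribute;
import org.apache.lucene.util.Version;

public class StandardAnalyzerDemo {
    // Collect the terms StandardAnalyzer emits for the given text.
    // The chain is StandardTokenizer -> StandardFilter -> LowerCaseFilter -> StopFilter.
    public static List<String> analyze(String text) throws Exception {
        StandardAnalyzer analyzer = new StandardAnalyzer(Version.LUCENE_36);
        TokenStream stream = analyzer.tokenStream("body", new StringReader(text));
        CharTermAttribute term = stream.addAttribute(CharTermAttribute.class);
        List<String> tokens = new ArrayList<String>();
        stream.reset();
        while (stream.incrementToken()) {
            tokens.add(term.toString());
        }
        stream.end();
        stream.close();
        return tokens;
    }

    public static void main(String[] args) throws Exception {
        // "The" is dropped by StopFilter (English stop word);
        // LowerCaseFilter lowercases the remaining terms.
        System.out.println(analyze("The Quick Brown Fox"));
    }
}
```

To tokenize URLs and email addresses as single tokens instead, the UAX29URLEmailTokenizer listed above can be substituted for StandardTokenizer in a custom analyzer.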
 


Copyright © 2000-2011 Apache Software Foundation. All Rights Reserved.