Package org.apache.lucene.analysis.standard

The org.apache.lucene.analysis.standard package contains three fast grammar-based tokenizers constructed with JFlex: ClassicTokenizer, StandardTokenizer, and UAX29URLEmailTokenizer.


Class Summary

- ClassicAnalyzer: Filters ClassicTokenizer with ClassicFilter, LowerCaseFilter, and StopFilter, using a list of English stop words.
- ClassicFilter: Normalizes tokens extracted with ClassicTokenizer.
- ClassicTokenizer: A grammar-based tokenizer constructed with JFlex.
- StandardAnalyzer: Filters StandardTokenizer with StandardFilter, LowerCaseFilter, and StopFilter, using a list of English stop words.
- StandardFilter: Normalizes tokens extracted with StandardTokenizer.
- StandardTokenizer: A grammar-based tokenizer constructed with JFlex.
- StandardTokenizerImpl: Implements the Word Break rules from the Unicode Text Segmentation algorithm, as specified in Unicode Standard Annex #29.

  Tokens produced are of the following types:
  - <ALPHANUM>: a sequence of alphabetic and numeric characters
  - <NUM>: a number
  - <SOUTHEAST_ASIAN>: a sequence of characters from South and Southeast Asian languages, including Thai, Lao, Myanmar, and Khmer
  - <IDEOGRAPHIC>: a single CJKV ideographic character
  - <HIRAGANA>: a single hiragana character

- UAX29URLEmailTokenizer: Implements the Word Break rules from the Unicode Text Segmentation algorithm, as specified in Unicode Standard Annex #29; URLs and email addresses are also tokenized according to the relevant RFCs.
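As a usage sketch of the analyzer chain described above (assuming a Lucene 3.x release, where StandardAnalyzer takes a Version argument; the class and method names below otherwise follow the Lucene API):

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;

import org.apache.lucene.analysis.TokenStream;
import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.analysis.tokenattributes.CharTermAttribute;
import org.apache.lucene.util.Version;

public class StandardAnalyzerDemo {
    // Collect the terms StandardAnalyzer emits for the given text.
    // The chain is StandardTokenizer -> StandardFilter -> LowerCaseFilter -> StopFilter.
    public static List<String> analyze(String text) throws Exception {
        StandardAnalyzer analyzer = new StandardAnalyzer(Version.LUCENE_36);
        TokenStream stream = analyzer.tokenStream("body", new StringReader(text));
        CharTermAttribute term = stream.addAttribute(CharTermAttribute.class);
        List<String> tokens = new ArrayList<String>();
        stream.reset();
        while (stream.incrementToken()) {
            tokens.add(term.toString());
        }
        stream.end();
        stream.close();
        return tokens;
    }

    public static void main(String[] args) throws Exception {
        // "The" is dropped by StopFilter (English stop word);
        // LowerCaseFilter lowercases the remaining terms.
        System.out.println(analyze("The Quick Brown Fox"));
    }
}
```

To tokenize URLs and email addresses as single tokens instead, the UAX29URLEmailTokenizer listed above can be substituted for StandardTokenizer in a custom analyzer.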
 


Copyright © 2000-2011 Apache Software Foundation. All Rights Reserved.