org.apache.lucene.analysis.cn.smart (Lucene 3.1.0 API)

Analyzer for Simplified Chinese, which indexes words.

Class Summary
AnalyzerProfile	Manages analysis data configuration for SmartChineseAnalyzer.
CharType	Internal SmartChineseAnalyzer character type constants.
SentenceTokenizer	Tokenizes input text into sentences.
SmartChineseAnalyzer	SmartChineseAnalyzer is an analyzer for Chinese or mixed Chinese-English text.
Utility	SmartChineseAnalyzer utility constants and methods.
WordTokenFilter	A `TokenFilter` that breaks sentences into words.
WordType	Internal SmartChineseAnalyzer token type constants.
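SentenceTokenizer and WordTokenFilter suggest a two-stage pipeline: text is first cut into sentences, then each sentence is broken into words. The plain-Java sketch below illustrates only the first stage. It does not use the Lucene API, and the delimiter set (`DELIMITERS`) and class name `SentenceSplitSketch` are assumptions for illustration, not what SentenceTokenizer actually uses.

```java
import java.util.ArrayList;
import java.util.List;

/**
 * Illustrative sketch only: splits text into "sentences" at punctuation,
 * in the spirit of SentenceTokenizer. Not the Lucene implementation.
 */
final class SentenceSplitSketch {
    // Hypothetical delimiter set: full-width Chinese punctuation only
    // (ASCII '.' is left out so numbers like "3.1" are not split).
    private static final String DELIMITERS = "。，！？；：";

    static List<String> splitSentences(String text) {
        List<String> sentences = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        for (int i = 0; i < text.length(); i++) {
            char c = text.charAt(i);
            current.append(c);
            if (DELIMITERS.indexOf(c) >= 0) {
                // The delimiter closes the current sentence and stays attached to it.
                addIfNotBlank(sentences, current);
                current.setLength(0);
            }
        }
        addIfNotBlank(sentences, current); // trailing text with no closing delimiter
        return sentences;
    }

    private static void addIfNotBlank(List<String> out, StringBuilder sb) {
        String s = sb.toString().trim();
        if (!s.isEmpty()) {
            out.add(s);
        }
    }

    public static void main(String[] args) {
        System.out.println(splitSentences("我是中国人。你好！谢谢")); // [我是中国人。, 你好！, 谢谢]
    }
}
```

In the real package, each sentence produced this way would then be handed to a word-breaking step (WordTokenFilter) rather than indexed whole.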

Package org.apache.lucene.analysis.cn.smart Description


WARNING: This API is experimental and might change in incompatible ways in the next release.

Three analyzers are provided for Chinese, each of which treats Chinese text in a different way.

StandardAnalyzer: Indexes unigrams (individual Chinese characters) as tokens.
CJKAnalyzer (in the analyzers/cjk package): Indexes bigrams (overlapping groups of two adjacent Chinese characters) as tokens.
SmartChineseAnalyzer (in this package): Indexes words (attempts to segment Chinese text into words) as tokens.
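The difference between the first two strategies can be shown without Lucene at all. The following plain-Java sketch (the class `NgramSketch` and its methods are hypothetical, not Lucene API) emits unigrams and overlapping bigrams for a run of Chinese characters. It assumes the input contains only Chinese characters; the real analyzers also handle Latin text, digits, and punctuation. Emitting a lone character as-is in `bigrams` is an assumption about edge handling.

```java
import java.util.ArrayList;
import java.util.List;

/** Illustrative sketch of unigram vs. overlapping-bigram tokenization (not Lucene code). */
final class NgramSketch {
    /** One token per character, roughly the StandardAnalyzer strategy for Chinese text. */
    static List<String> unigrams(String run) {
        List<String> tokens = new ArrayList<>();
        for (int i = 0; i < run.length(); ) {
            int cp = run.codePointAt(i);               // code points, so supplementary chars stay whole
            tokens.add(new String(Character.toChars(cp)));
            i += Character.charCount(cp);
        }
        return tokens;
    }

    /** Overlapping pairs of adjacent characters, roughly the CJKAnalyzer strategy. */
    static List<String> bigrams(String run) {
        List<String> chars = unigrams(run);
        List<String> tokens = new ArrayList<>();
        if (chars.size() == 1) {
            tokens.add(chars.get(0));                  // assumption: a lone character is kept as-is
            return tokens;
        }
        for (int i = 0; i + 1 < chars.size(); i++) {
            tokens.add(chars.get(i) + chars.get(i + 1));
        }
        return tokens;
    }

    public static void main(String[] args) {
        System.out.println(unigrams("我是中国人")); // [我, 是, 中, 国, 人]
        System.out.println(bigrams("我是中国人"));  // [我是, 是中, 中国, 国人]
    }
}
```

Bigrams trade a larger index for better phrase precision than unigrams, while neither approach needs a dictionary; word segmentation is the step that does.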

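Example phrase: "我是中国人" ("I am Chinese")

StandardAnalyzer: 我-是-中-国-人
CJKAnalyzer: 我是-是中-中国-国人
SmartChineseAnalyzer: 我-是-中国-人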
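To make "segment Chinese text into words" concrete for the example phrase, here is a toy segmenter using forward maximum matching against a tiny dictionary. This is a deliberately simple, swapped-in technique and NOT SmartChineseAnalyzer's algorithm (the real analyzer relies on its own statistical model and bundled dictionary data). The class `MaxMatchSketch`, the dictionary contents, and the `maxWordLen` parameter are all hypothetical, and the code assumes characters in the Basic Multilingual Plane.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

/** Toy forward-maximum-matching segmenter; illustrative only, not the smartcn algorithm. */
final class MaxMatchSketch {
    static List<String> segment(String text, Set<String> dictionary, int maxWordLen) {
        List<String> words = new ArrayList<>();
        int i = 0;
        while (i < text.length()) {
            int end = Math.min(text.length(), i + maxWordLen);
            // Try the longest candidate first; shrink until it is a dictionary word,
            // falling back to a single character if nothing longer matches.
            while (end > i + 1 && !dictionary.contains(text.substring(i, end))) {
                end--;
            }
            words.add(text.substring(i, end));
            i = end;
        }
        return words;
    }

    public static void main(String[] args) {
        // Hypothetical dictionary for this example only.
        Set<String> dict = new HashSet<>(Arrays.asList("我", "是", "中国", "人"));
        System.out.println(segment("我是中国人", dict, 3)); // [我, 是, 中国, 人]
    }
}
```

Greedy matching like this is sensitive to dictionary contents (adding "中国人" would make it emit that as one word), which is one reason statistical approaches are used for real segmentation.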
