org.apache.lucene.analysis.cn.smart (Lucene 2.9.4 API)

Analyzer for Simplified Chinese, which indexes words.

Class Summary
AnalyzerProfile	Manages analysis data configuration for SmartChineseAnalyzer.
CharType	Internal SmartChineseAnalyzer character type constants.
SentenceTokenizer	Tokenizes input text into sentences.
SmartChineseAnalyzer	An analyzer for Chinese or mixed Chinese-English text.
Utility	SmartChineseAnalyzer utility constants and methods.
WordTokenFilter	A `TokenFilter` that breaks sentences into words.
WordType	Internal SmartChineseAnalyzer token type constants.

Package org.apache.lucene.analysis.cn.smart Description

Analyzer for Simplified Chinese, which indexes words.

WARNING: The analyzers/smartcn package (analysis.cn.smart) is experimental. The APIs and file formats introduced here may change in future releases, in which case the current versions will no longer be supported.

Three analyzers are provided for Chinese, each of which treats Chinese text in a different way.

ChineseAnalyzer (in the analyzers/cn package): Indexes unigrams (individual Chinese characters) as tokens.
CJKAnalyzer (in the analyzers/cjk package): Indexes bigrams (overlapping groups of two adjacent Chinese characters) as tokens.
SmartChineseAnalyzer (in this package): Indexes words (attempts to segment Chinese text into words) as tokens.

Example phrase: "我是中国人"

ChineseAnalyzer: 我－是－中－国－人
CJKAnalyzer: 我是－是中－中国－国人
SmartChineseAnalyzer: 我－是－中国－人
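The three token streams above can be reproduced with a small standalone sketch. It does not use Lucene and is not how any of these analyzers is implemented: the class name `SegmentationSketch`, its methods, and the one-entry dictionary are all hypothetical, and the word step substitutes a toy greedy longest-match lookup for SmartChineseAnalyzer's actual segmentation, purely to show how the unigram, bigram, and word outputs differ.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Illustration only: mimics the *shape* of the three token streams,
// not the behavior of the real Lucene analyzers.
public class SegmentationSketch {

    // Unigrams: one token per character (code point), as in the ChineseAnalyzer row.
    public static List<String> unigrams(String text) {
        List<String> out = new ArrayList<String>();
        for (int i = 0; i < text.length(); ) {
            int n = Character.charCount(text.codePointAt(i));
            out.add(text.substring(i, i + n));
            i += n;
        }
        return out;
    }

    // Bigrams: overlapping pairs of adjacent characters, as in the CJKAnalyzer row.
    // Text shorter than two characters yields an empty list in this sketch.
    public static List<String> bigrams(String text) {
        List<String> chars = unigrams(text);
        List<String> out = new ArrayList<String>();
        for (int i = 0; i + 1 < chars.size(); i++) {
            out.add(chars.get(i) + chars.get(i + 1));
        }
        return out;
    }

    // Words: toy greedy longest-match against a caller-supplied dictionary.
    // ASSUMPTION: this stands in for real word segmentation; characters that
    // start no dictionary entry are emitted as single-character tokens.
    public static List<String> words(String text, Set<String> dictionary, int maxLen) {
        List<String> chars = unigrams(text);
        List<String> out = new ArrayList<String>();
        int i = 0;
        while (i < chars.size()) {
            String token = chars.get(i);
            int matched = 1;
            // Try the longest candidate first, down to two characters.
            for (int len = Math.min(maxLen, chars.size() - i); len > 1; len--) {
                StringBuilder candidate = new StringBuilder();
                for (int k = i; k < i + len; k++) {
                    candidate.append(chars.get(k));
                }
                if (dictionary.contains(candidate.toString())) {
                    token = candidate.toString();
                    matched = len;
                    break;
                }
            }
            out.add(token);
            i += matched;
        }
        return out;
    }

    public static void main(String[] args) {
        String phrase = "我是中国人";
        // Hypothetical one-entry dictionary, chosen to match the example above.
        Set<String> dictionary = new HashSet<String>(Arrays.asList("中国"));
        System.out.println(unigrams(phrase));             // [我, 是, 中, 国, 人]
        System.out.println(bigrams(phrase));              // [我是, 是中, 中国, 国人]
        System.out.println(words(phrase, dictionary, 2)); // [我, 是, 中国, 人]
    }
}
```

The word output depends entirely on the dictionary supplied: with a different entry list the same phrase would segment differently, which is why word-based indexing needs analysis data while the unigram and bigram approaches do not.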
