Analyzer for Simplified Chinese, which indexes words.
org.apache.lucene.analysis.cn.smart.hhmm - package org.apache.lucene.analysis.cn.smart.hhmm
SmartChineseAnalyzer Hidden Markov Model package.
OTHER -
Static variable in class org.apache.lucene.analysis.cn.smart.CharType
Other (not fitting any of the other categories)
P
- process(String) -
Method in class org.apache.lucene.analysis.cn.smart.hhmm.HHMMSegmenter
- Return a list of SegToken representing the best segmentation of a sentence
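To make "best segmentation" concrete, here is a minimal sketch of a lowest-cost path search over a toy dictionary. It is not the HHMMSegmenter implementation: the class ToySegmenter, its add method, the unigram cost model, and the fixed penalty for unknown characters are all assumptions made for illustration.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Toy best-path segmenter (hypothetical); NOT Lucene's HHMMSegmenter. */
class ToySegmenter {
    // Hypothetical dictionary: word -> count. Values are supplied by the caller.
    private final Map<String, Double> freq = new HashMap<>();
    private double total = 0;
    private int maxLen = 1;

    void add(String word, double count) {
        freq.merge(word, count, Double::sum);
        total += count;
        maxLen = Math.max(maxLen, word.length());
    }

    /** Cost is the negative log probability; unknown single chars get a fixed penalty. */
    private double cost(String w) {
        Double c = freq.get(w);
        if (c != null) {
            return -Math.log(c / total);
        }
        return w.length() == 1 ? 20.0 : Double.POSITIVE_INFINITY;
    }

    /** Returns the segmentation with the lowest total cost (indices are UTF-16 units). */
    List<String> process(String sentence) {
        int n = sentence.length();
        double[] best = new double[n + 1]; // best[i] = cheapest cost to segment the first i chars
        int[] prev = new int[n + 1];       // prev[i] = start index of the last word in that path
        Arrays.fill(best, Double.POSITIVE_INFINITY);
        best[0] = 0;
        for (int end = 1; end <= n; end++) {
            for (int start = Math.max(0, end - maxLen); start < end; start++) {
                double c = best[start] + cost(sentence.substring(start, end));
                if (c < best[end]) {
                    best[end] = c;
                    prev[end] = start;
                }
            }
        }
        // Walk the back-pointers from the end of the sentence, then reverse.
        List<String> out = new ArrayList<>();
        for (int i = n; i > 0; i = prev[i]) {
            out.add(sentence.substring(prev[i], i));
        }
        Collections.reverse(out);
        return out;
    }
}
```

With a dictionary in which two-character words are far more frequent than their single-character parts, the search prefers the two-word split over any single-character split.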
R
- reset() -
Method in class org.apache.lucene.analysis.cn.smart.SentenceTokenizer
-
- reset(Reader) -
Method in class org.apache.lucene.analysis.cn.smart.SentenceTokenizer
-
- reset() -
Method in class org.apache.lucene.analysis.cn.smart.WordTokenFilter
-
- reusableTokenStream(String, Reader) -
Method in class org.apache.lucene.analysis.cn.smart.SmartChineseAnalyzer
-
S
- SegToken - Class in org.apache.lucene.analysis.cn.smart.hhmm
- SmartChineseAnalyzer internal token
- SegToken(char[], int, int, int, int) -
Constructor for class org.apache.lucene.analysis.cn.smart.hhmm.SegToken
- Create a new SegToken from a character array.
- SegTokenFilter - Class in org.apache.lucene.analysis.cn.smart.hhmm
- Filters a SegToken by converting full-width latin to half-width, then lowercasing latin.
- SegTokenFilter() -
Constructor for class org.apache.lucene.analysis.cn.smart.hhmm.SegTokenFilter
-
- SENTENCE_BEGIN -
Static variable in class org.apache.lucene.analysis.cn.smart.WordType
- Start of a Sentence
- SENTENCE_END -
Static variable in class org.apache.lucene.analysis.cn.smart.WordType
- End of a Sentence
- SentenceTokenizer - Class in org.apache.lucene.analysis.cn.smart
- Tokenizes input text into sentences.
- SentenceTokenizer(Reader) -
Constructor for class org.apache.lucene.analysis.cn.smart.SentenceTokenizer
-
- SentenceTokenizer(AttributeSource, Reader) -
Constructor for class org.apache.lucene.analysis.cn.smart.SentenceTokenizer
-
- SentenceTokenizer(AttributeSource.AttributeFactory, Reader) -
Constructor for class org.apache.lucene.analysis.cn.smart.SentenceTokenizer
-
- SmartChineseAnalyzer - Class in org.apache.lucene.analysis.cn.smart
- SmartChineseAnalyzer is an analyzer for Chinese or mixed Chinese-English text.
- SmartChineseAnalyzer() -
Constructor for class org.apache.lucene.analysis.cn.smart.SmartChineseAnalyzer
- Deprecated. Use SmartChineseAnalyzer.SmartChineseAnalyzer(Version) instead
- SmartChineseAnalyzer(Version) -
Constructor for class org.apache.lucene.analysis.cn.smart.SmartChineseAnalyzer
- Create a new SmartChineseAnalyzer, using the default stopword list.
- SmartChineseAnalyzer(boolean) -
Constructor for class org.apache.lucene.analysis.cn.smart.SmartChineseAnalyzer
- Deprecated. Use SmartChineseAnalyzer.SmartChineseAnalyzer(Version, boolean) instead
- SmartChineseAnalyzer(Version, boolean) -
Constructor for class org.apache.lucene.analysis.cn.smart.SmartChineseAnalyzer
- Create a new SmartChineseAnalyzer, optionally using the default stopword list.
- SmartChineseAnalyzer(Set) -
Constructor for class org.apache.lucene.analysis.cn.smart.SmartChineseAnalyzer
- Deprecated. Use SmartChineseAnalyzer.SmartChineseAnalyzer(Version, Set) instead
- SmartChineseAnalyzer(Version, Set) -
Constructor for class org.apache.lucene.analysis.cn.smart.SmartChineseAnalyzer
- Create a new SmartChineseAnalyzer, using the provided Set of stopwords.
- SPACE_LIKE -
Static variable in class org.apache.lucene.analysis.cn.smart.CharType
- Characters that act as a space
- SPACES -
Static variable in class org.apache.lucene.analysis.cn.smart.Utility
- Space-like characters that need to be skipped: such as space, tab, newline, carriage return.
- START_CHAR_ARRAY -
Static variable in class org.apache.lucene.analysis.cn.smart.Utility
-
- startOffset -
Variable in class org.apache.lucene.analysis.cn.smart.hhmm.SegToken
- start offset into charArray
- STRING -
Static variable in class org.apache.lucene.analysis.cn.smart.WordType
- ASCII String
- STRING_CHAR_ARRAY -
Static variable in class org.apache.lucene.analysis.cn.smart.Utility
-
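The SegTokenFilter entry above describes a two-step normalization: full-width Latin to half-width, then lowercasing. The sketch below shows one way such a step could look. WidthNormalizer and normalize are hypothetical names, the code is not taken from the Lucene source, and it assumes "full-width" means the Unicode range U+FF01 to U+FF5E, which mirrors ASCII U+0021 to U+007E at a fixed offset of 0xFEE0.

```java
/** Illustrative normalizer (hypothetical); not the Lucene SegTokenFilter source. */
final class WidthNormalizer {
    private WidthNormalizer() {}

    static String normalize(String text) {
        StringBuilder sb = new StringBuilder(text.length());
        for (int i = 0; i < text.length(); i++) {
            char c = text.charAt(i);
            // Step 1: map full-width ASCII variants (U+FF01..U+FF5E) to ASCII.
            if (c >= 0xFF01 && c <= 0xFF5E) {
                c = (char) (c - 0xFEE0);
            }
            // Step 2: lowercase ASCII letters only; other characters pass through.
            if (c >= 'A' && c <= 'Z') {
                c = (char) (c + ('a' - 'A'));
            }
            sb.append(c);
        }
        return sb.toString();
    }
}
```

Doing the width conversion before lowercasing means a full-width capital letter and its half-width lowercase form end up as the same indexed term.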
T
- tokenStream(String, Reader) -
Method in class org.apache.lucene.analysis.cn.smart.SmartChineseAnalyzer
-
U
- Utility - Class in org.apache.lucene.analysis.cn.smart
- SmartChineseAnalyzer utility constants and methods
- Utility() -
Constructor for class org.apache.lucene.analysis.cn.smart.Utility
-
W
- weight -
Variable in class org.apache.lucene.analysis.cn.smart.hhmm.SegToken
- word frequency
- WordTokenFilter - Class in org.apache.lucene.analysis.cn.smart
- A TokenFilter that breaks sentences into words.
- WordTokenFilter(TokenStream) -
Constructor for class org.apache.lucene.analysis.cn.smart.WordTokenFilter
- Construct a new WordTokenFilter.
- wordType -
Variable in class org.apache.lucene.analysis.cn.smart.hhmm.SegToken
- WordType of the text
- WordType - Class in org.apache.lucene.analysis.cn.smart
- Internal SmartChineseAnalyzer token type constants
- WordType() -
Constructor for class org.apache.lucene.analysis.cn.smart.WordType
-
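WordTokenFilter works on whole sentences, which SentenceTokenizer produces from the raw input. As a rough illustration of that first stage, the sketch below splits text at sentence-ending punctuation. SentenceSplitter, split, and the delimiter set are assumptions chosen for this example and do not reflect the actual SentenceTokenizer logic or the constants in Utility.

```java
import java.util.ArrayList;
import java.util.List;

/** Illustrative sentence splitter (hypothetical); not Lucene's SentenceTokenizer. */
final class SentenceSplitter {
    // Assumed delimiters: ideographic full stop, full-width ! ? ;, and ASCII ! ? ;
    private static final String DELIMS = "\u3002\uFF01\uFF1F\uFF1B!?;";

    private SentenceSplitter() {}

    static List<String> split(String text) {
        List<String> out = new ArrayList<>();
        StringBuilder cur = new StringBuilder();
        for (int i = 0; i < text.length(); i++) {
            char c = text.charAt(i);
            // Skip space-like characters at the start of a sentence.
            if (cur.length() == 0 && Character.isWhitespace(c)) {
                continue;
            }
            cur.append(c);
            // A delimiter closes the current sentence and stays attached to it.
            if (DELIMS.indexOf(c) >= 0) {
                out.add(cur.toString());
                cur.setLength(0);
            }
        }
        // Emit any trailing text that had no closing delimiter.
        if (cur.length() > 0) {
            out.add(cur.toString().trim());
        }
        return out;
    }
}
```

The ASCII period is left out of the delimiter set on purpose so that values such as version numbers are not split; a real tokenizer would also need to report character offsets for each sentence.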
Copyright © 2000-2010 Apache Software Foundation. All Rights Reserved.