Index
All Classes and Interfaces|All Packages|Constant Field Values
A
- ANALYSIS_DATA_DIR - Static variable in class org.apache.lucene.analysis.cn.smart.AnalyzerProfile
-
Global indicating the configured analysis data directory
- AnalyzerProfile - Class in org.apache.lucene.analysis.cn.smart
-
Manages analysis data configuration for SmartChineseAnalyzer
- AnalyzerProfile() - Constructor for class org.apache.lucene.analysis.cn.smart.AnalyzerProfile
C
- charArray - Variable in class org.apache.lucene.analysis.cn.smart.hhmm.SegToken
-
Character array containing token text
- CharType - Class in org.apache.lucene.analysis.cn.smart
-
Internal SmartChineseAnalyzer character type constants.
- CharType() - Constructor for class org.apache.lucene.analysis.cn.smart.CharType
- CHINESE_WORD - Static variable in class org.apache.lucene.analysis.cn.smart.WordType
-
Chinese Word
- COMMON_DELIMITER - Static variable in class org.apache.lucene.analysis.cn.smart.Utility
-
Delimiters will be filtered to this character by SegTokenFilter
- compareArray(char[], int, char[], int) - Static method in class org.apache.lucene.analysis.cn.smart.Utility
-
Compare two arrays, starting at the specified offsets.
- compareArrayByPrefix(char[], int, char[], int) - Static method in class org.apache.lucene.analysis.cn.smart.Utility
-
Compare two arrays, starting at the specified offsets, but treating shortArray as a prefix to longArray.
- create(AttributeFactory) - Method in class org.apache.lucene.analysis.cn.smart.HMMChineseTokenizerFactory
- createComponents(String) - Method in class org.apache.lucene.analysis.cn.smart.SmartChineseAnalyzer
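
The compareArrayByPrefix entry above describes a comparison in which the shorter array counts as equal whenever it is a prefix of the longer one. The following is a minimal sketch of that idea, not the Lucene source: the class name PrefixCompareSketch, the method name compareByPrefix, and the exact return values (-1, 0, 1) are assumptions made for illustration.

```java
// Illustrative sketch only; names and return values are assumptions,
// not the behaviour of org.apache.lucene.analysis.cn.smart.Utility.
final class PrefixCompareSketch {

    /**
     * Returns 0 when shortArray (read from shortIndex) is a prefix of
     * longArray (read from longIndex). Otherwise returns -1 or 1
     * depending on how the first mismatch orders the two arrays.
     */
    static int compareByPrefix(char[] shortArray, int shortIndex,
                               char[] longArray, int longIndex) {
        int si = shortIndex;
        int li = longIndex;
        while (si < shortArray.length && li < longArray.length) {
            if (shortArray[si] != longArray[li]) {
                return shortArray[si] < longArray[li] ? -1 : 1;
            }
            si++;
            li++;
        }
        // All of shortArray matched: it is a prefix, so report equality.
        // Otherwise longArray ended first, so shortArray sorts after it.
        return si == shortArray.length ? 0 : 1;
    }

    public static void main(String[] args) {
        char[] entry = "lucene".toCharArray();
        System.out.println(compareByPrefix("luc".toCharArray(), 0, entry, 0));
        System.out.println(compareByPrefix("lux".toCharArray(), 0, entry, 0));
    }
}
```

A prefix-tolerant comparison of this kind is useful when probing a sorted dictionary for any entry that starts with the characters read so far.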
D
- DELIMITER - Static variable in class org.apache.lucene.analysis.cn.smart.CharType
-
Punctuation Characters
- DELIMITER - Static variable in class org.apache.lucene.analysis.cn.smart.WordType
-
Punctuation Symbol
- DIGIT - Static variable in class org.apache.lucene.analysis.cn.smart.CharType
-
Numeric Digits
E
- END_CHAR_ARRAY - Static variable in class org.apache.lucene.analysis.cn.smart.Utility
- endOffset - Variable in class org.apache.lucene.analysis.cn.smart.hhmm.SegToken
-
End offset into the original sentence
- equals(Object) - Method in class org.apache.lucene.analysis.cn.smart.hhmm.SegToken
F
- filter(SegToken) - Method in class org.apache.lucene.analysis.cn.smart.hhmm.SegTokenFilter
-
Filter an input SegToken
- FULLWIDTH_DIGIT - Static variable in class org.apache.lucene.analysis.cn.smart.CharType
-
Full-Width alphanumeric characters
- FULLWIDTH_LETTER - Static variable in class org.apache.lucene.analysis.cn.smart.CharType
-
Full-Width letters
- FULLWIDTH_NUMBER - Static variable in class org.apache.lucene.analysis.cn.smart.WordType
-
Full-Width Alphanumeric
- FULLWIDTH_STRING - Static variable in class org.apache.lucene.analysis.cn.smart.WordType
-
Full-Width String
G
- getCharType(char) - Static method in class org.apache.lucene.analysis.cn.smart.Utility
-
Return the internal CharType constant of a given character.
- getDefaultStopSet() - Static method in class org.apache.lucene.analysis.cn.smart.SmartChineseAnalyzer
-
Returns an unmodifiable instance of the default stop-words set.
H
- HANZI - Static variable in class org.apache.lucene.analysis.cn.smart.CharType
-
Han Ideographs
- hashCode() - Method in class org.apache.lucene.analysis.cn.smart.hhmm.SegToken
- HHMMSegmenter - Class in org.apache.lucene.analysis.cn.smart.hhmm
-
Finds the optimal segmentation of a sentence into Chinese words
- HHMMSegmenter() - Constructor for class org.apache.lucene.analysis.cn.smart.hhmm.HHMMSegmenter
- HMMChineseTokenizer - Class in org.apache.lucene.analysis.cn.smart
-
Tokenizer for Chinese or mixed Chinese-English text.
- HMMChineseTokenizer() - Constructor for class org.apache.lucene.analysis.cn.smart.HMMChineseTokenizer
-
Creates a new HMMChineseTokenizer
- HMMChineseTokenizer(AttributeFactory) - Constructor for class org.apache.lucene.analysis.cn.smart.HMMChineseTokenizer
-
Creates a new HMMChineseTokenizer, supplying the AttributeFactory
- HMMChineseTokenizerFactory - Class in org.apache.lucene.analysis.cn.smart
-
Factory for HMMChineseTokenizer
- HMMChineseTokenizerFactory() - Constructor for class org.apache.lucene.analysis.cn.smart.HMMChineseTokenizerFactory
-
Default constructor for compatibility with SPI
- HMMChineseTokenizerFactory(Map<String, String>) - Constructor for class org.apache.lucene.analysis.cn.smart.HMMChineseTokenizerFactory
-
Creates a new HMMChineseTokenizerFactory
I
- incrementWord() - Method in class org.apache.lucene.analysis.cn.smart.HMMChineseTokenizer
- index - Variable in class org.apache.lucene.analysis.cn.smart.hhmm.SegToken
-
During segmentation, this stores the index of the token in the token list table
L
M
- MAX_FREQUENCE - Static variable in class org.apache.lucene.analysis.cn.smart.Utility
-
Maximum bigram frequency (used in the smoothing function).
N
- NAME - Static variable in class org.apache.lucene.analysis.cn.smart.HMMChineseTokenizerFactory
-
SPI name
- normalize(String, TokenStream) - Method in class org.apache.lucene.analysis.cn.smart.SmartChineseAnalyzer
- NUMBER - Static variable in class org.apache.lucene.analysis.cn.smart.WordType
-
ASCII Alphanumeric
- NUMBER_CHAR_ARRAY - Static variable in class org.apache.lucene.analysis.cn.smart.Utility
O
- org.apache.lucene.analysis.cn.smart - package org.apache.lucene.analysis.cn.smart
-
Analyzer for Simplified Chinese, which indexes words.
- org.apache.lucene.analysis.cn.smart.hhmm - package org.apache.lucene.analysis.cn.smart.hhmm
-
SmartChineseAnalyzer Hidden Markov Model package.
- OTHER - Static variable in class org.apache.lucene.analysis.cn.smart.CharType
-
Other (not fitting any of the other categories)
P
- process(String) - Method in class org.apache.lucene.analysis.cn.smart.hhmm.HHMMSegmenter
-
Return a list of SegToken representing the best segmentation of a sentence
R
- reset() - Method in class org.apache.lucene.analysis.cn.smart.HMMChineseTokenizer
S
- SegToken - Class in org.apache.lucene.analysis.cn.smart.hhmm
-
SmartChineseAnalyzer internal token
- SegToken(char[], int, int, int, int) - Constructor for class org.apache.lucene.analysis.cn.smart.hhmm.SegToken
-
Create a new SegToken from a character array.
- SegTokenFilter - Class in org.apache.lucene.analysis.cn.smart.hhmm
-
Filters a SegToken by converting full-width Latin to half-width, then lowercasing Latin.
- SegTokenFilter() - Constructor for class org.apache.lucene.analysis.cn.smart.hhmm.SegTokenFilter
- SENTENCE_BEGIN - Static variable in class org.apache.lucene.analysis.cn.smart.WordType
-
Start of a Sentence
- SENTENCE_END - Static variable in class org.apache.lucene.analysis.cn.smart.WordType
-
End of a Sentence
- setNextSentence(int, int) - Method in class org.apache.lucene.analysis.cn.smart.HMMChineseTokenizer
- SmartChineseAnalyzer - Class in org.apache.lucene.analysis.cn.smart
-
SmartChineseAnalyzer is an analyzer for Chinese or mixed Chinese-English text.
- SmartChineseAnalyzer() - Constructor for class org.apache.lucene.analysis.cn.smart.SmartChineseAnalyzer
-
Create a new SmartChineseAnalyzer, using the default stopword list.
- SmartChineseAnalyzer(boolean) - Constructor for class org.apache.lucene.analysis.cn.smart.SmartChineseAnalyzer
-
Create a new SmartChineseAnalyzer, optionally using the default stopword list.
- SmartChineseAnalyzer(CharArraySet) - Constructor for class org.apache.lucene.analysis.cn.smart.SmartChineseAnalyzer
-
Create a new SmartChineseAnalyzer, using the provided Set of stopwords.
- SPACE_LIKE - Static variable in class org.apache.lucene.analysis.cn.smart.CharType
-
Characters that act as a space
- SPACES - Static variable in class org.apache.lucene.analysis.cn.smart.Utility
-
Space-like characters that need to be skipped, such as space, tab, newline, and carriage return.
- START_CHAR_ARRAY - Static variable in class org.apache.lucene.analysis.cn.smart.Utility
- startOffset - Variable in class org.apache.lucene.analysis.cn.smart.hhmm.SegToken
-
Start offset into the original sentence
- STRING - Static variable in class org.apache.lucene.analysis.cn.smart.WordType
-
ASCII String
- STRING_CHAR_ARRAY - Static variable in class org.apache.lucene.analysis.cn.smart.Utility
- SURROGATE - Static variable in class org.apache.lucene.analysis.cn.smart.CharType
-
Surrogate character
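
The SegTokenFilter entry above mentions converting full-width Latin to half-width and then lowercasing it. The following is a minimal sketch of that normalization using only the Java standard library. It is not the Lucene implementation: the class name WidthFoldSketch and the method name foldAndLowercase are hypothetical, and the sketch assumes that every character in the Unicode full-width ASCII range (U+FF01 to U+FF5E) is folded, whereas the real filter may restrict the conversion to particular token types.

```java
// Illustrative sketch only; not the code of
// org.apache.lucene.analysis.cn.smart.hhmm.SegTokenFilter.
final class WidthFoldSketch {

    // Full-width forms U+FF01..U+FF5E mirror ASCII U+0021..U+007E
    // at a fixed distance of 0xFEE0 code points.
    private static final int FULLWIDTH_OFFSET = 0xFEE0;

    static String foldAndLowercase(String text) {
        StringBuilder out = new StringBuilder(text.length());
        for (int i = 0; i < text.length(); i++) {
            char c = text.charAt(i);
            // Step 1: map a full-width character to its half-width twin.
            if (c >= '\uFF01' && c <= '\uFF5E') {
                c = (char) (c - FULLWIDTH_OFFSET);
            }
            // Step 2: lowercase ASCII Latin letters only.
            if (c >= 'A' && c <= 'Z') {
                c = (char) (c + ('a' - 'A'));
            }
            out.append(c);
        }
        return out.toString();
    }

    public static void main(String[] args) {
        // Full-width "ABc1" written with Unicode escapes.
        System.out.println(foldAndLowercase("\uFF21\uFF22\uFF43\uFF11"));
    }
}
```

Folding width before lowercasing means a single ASCII-only lowercase step covers both the full-width and the half-width spelling of a letter.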
U
- Utility - Class in org.apache.lucene.analysis.cn.smart
-
SmartChineseAnalyzer utility constants and methods
- Utility() - Constructor for class org.apache.lucene.analysis.cn.smart.Utility
W
- weight - Variable in class org.apache.lucene.analysis.cn.smart.hhmm.SegToken
-
Word frequency
- wordType - Variable in class org.apache.lucene.analysis.cn.smart.hhmm.SegToken
-
WordType of the text
- WordType - Class in org.apache.lucene.analysis.cn.smart
-
Internal SmartChineseAnalyzer token type constants
- WordType() - Constructor for class org.apache.lucene.analysis.cn.smart.WordType