Index

A C D E F G H I L M N O P R S U W 
All Classes and Interfaces|All Packages|Constant Field Values

A

ANALYSIS_DATA_DIR - Static variable in class org.apache.lucene.analysis.cn.smart.AnalyzerProfile
Global indicating the configured analysis data directory
AnalyzerProfile - Class in org.apache.lucene.analysis.cn.smart
Manages analysis data configuration for SmartChineseAnalyzer
AnalyzerProfile() - Constructor for class org.apache.lucene.analysis.cn.smart.AnalyzerProfile
 

C

charArray - Variable in class org.apache.lucene.analysis.cn.smart.hhmm.SegToken
Character array containing token text
CharType - Class in org.apache.lucene.analysis.cn.smart
Internal SmartChineseAnalyzer character type constants.
CharType() - Constructor for class org.apache.lucene.analysis.cn.smart.CharType
 
CHINESE_WORD - Static variable in class org.apache.lucene.analysis.cn.smart.WordType
Chinese Word
COMMON_DELIMITER - Static variable in class org.apache.lucene.analysis.cn.smart.Utility
Delimiters will be filtered to this character by SegTokenFilter
compareArray(char[], int, char[], int) - Static method in class org.apache.lucene.analysis.cn.smart.Utility
Compare two arrays, starting at the specified offsets.
compareArrayByPrefix(char[], int, char[], int) - Static method in class org.apache.lucene.analysis.cn.smart.Utility
Compare two arrays, starting at the specified offsets, but treating shortArray as a prefix to longArray.
create(AttributeFactory) - Method in class org.apache.lucene.analysis.cn.smart.HMMChineseTokenizerFactory
 
createComponents(String) - Method in class org.apache.lucene.analysis.cn.smart.SmartChineseAnalyzer
 

D

DELIMITER - Static variable in class org.apache.lucene.analysis.cn.smart.CharType
Punctuation Characters
DELIMITER - Static variable in class org.apache.lucene.analysis.cn.smart.WordType
Punctuation Symbol
DIGIT - Static variable in class org.apache.lucene.analysis.cn.smart.CharType
Numeric Digits

E

END_CHAR_ARRAY - Static variable in class org.apache.lucene.analysis.cn.smart.Utility
 
endOffset - Variable in class org.apache.lucene.analysis.cn.smart.hhmm.SegToken
End offset into the original sentence
equals(Object) - Method in class org.apache.lucene.analysis.cn.smart.hhmm.SegToken
 

F

filter(SegToken) - Method in class org.apache.lucene.analysis.cn.smart.hhmm.SegTokenFilter
Filter an input SegToken
FULLWIDTH_DIGIT - Static variable in class org.apache.lucene.analysis.cn.smart.CharType
Full-Width digits
FULLWIDTH_LETTER - Static variable in class org.apache.lucene.analysis.cn.smart.CharType
Full-Width letters
FULLWIDTH_NUMBER - Static variable in class org.apache.lucene.analysis.cn.smart.WordType
Full-Width Alphanumeric
FULLWIDTH_STRING - Static variable in class org.apache.lucene.analysis.cn.smart.WordType
Full-Width String
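
As a rough illustration of what the full-width entries above are for: this index describes SegTokenFilter as converting full-width Latin to half-width and then lowercasing Latin. The sketch below shows that idea on a plain String using only the JDK. The class name FullWidthFoldingSketch, the method fold, and the exact code point range handled are assumptions made for this example; this is not the library's implementation and it does not operate on SegToken.

```java
// Hypothetical, JDK-only sketch; not part of org.apache.lucene.analysis.cn.smart.
final class FullWidthFoldingSketch {
    // Distance between a full-width ASCII variant (U+FF01..U+FF5E)
    // and its half-width counterpart (U+0021..U+007E).
    private static final int FULLWIDTH_OFFSET = 0xFEE0;

    /** Folds full-width ASCII variants to half-width, then lowercases ASCII letters. */
    static String fold(String text) {
        char[] chars = text.toCharArray();
        for (int i = 0; i < chars.length; i++) {
            char c = chars[i];
            if (c >= 0xFF01 && c <= 0xFF5E) {
                c = (char) (c - FULLWIDTH_OFFSET); // full-width -> half-width
            }
            if (c >= 'A' && c <= 'Z') {
                c = (char) (c + ('a' - 'A'));      // uppercase -> lowercase
            }
            chars[i] = c;
        }
        return new String(chars);
    }

    public static void main(String[] args) {
        // "\uFF21\uFF22\uFF23" is full-width "ABC"; "\uFF11\uFF12\uFF13" is full-width "123".
        System.out.println(fold("\uFF21\uFF22\uFF23\uFF11\uFF12\uFF13 Lucene"));
    }
}
```

Characters outside the handled range, including Han ideographs, pass through unchanged in this sketch.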

G

getCharType(char) - Static method in class org.apache.lucene.analysis.cn.smart.Utility
Return the internal CharType constant of a given character.
getDefaultStopSet() - Static method in class org.apache.lucene.analysis.cn.smart.SmartChineseAnalyzer
Returns an unmodifiable instance of the default stop-words set.
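
To make the idea behind getCharType(char) concrete, here is a simplified, JDK-only sketch of classifying a character into categories like those named by the CharType entries in this index. The enum Kind, the class CharKindSketch, and the code point ranges are assumptions chosen for illustration; the library's actual constants are int fields and its ranges may differ. Punctuation (DELIMITER) is deliberately left out here and falls through to OTHER.

```java
// Hypothetical, JDK-only sketch; not the library's Utility.getCharType.
final class CharKindSketch {
    // Illustrative categories that echo names in this index; not the CharType constants.
    enum Kind { SURROGATE, HANZI, LETTER, DIGIT, SPACE_LIKE, FULLWIDTH_LETTER, FULLWIDTH_DIGIT, OTHER }

    static Kind classify(char ch) {
        if (Character.isSurrogate(ch)) return Kind.SURROGATE;
        // Assumed range: the CJK Unified Ideographs block.
        if (ch >= 0x4E00 && ch <= 0x9FFF) return Kind.HANZI;
        if ((ch >= 'A' && ch <= 'Z') || (ch >= 'a' && ch <= 'z')) return Kind.LETTER;
        if (ch >= '0' && ch <= '9') return Kind.DIGIT;
        // Space, tab, carriage return, newline, and ideographic space (U+3000).
        if (ch == ' ' || ch == '\t' || ch == '\r' || ch == '\n' || ch == '\u3000') return Kind.SPACE_LIKE;
        // Full-width Latin letters and digits.
        if ((ch >= 0xFF21 && ch <= 0xFF3A) || (ch >= 0xFF41 && ch <= 0xFF5A)) return Kind.FULLWIDTH_LETTER;
        if (ch >= 0xFF10 && ch <= 0xFF19) return Kind.FULLWIDTH_DIGIT;
        return Kind.OTHER; // includes punctuation in this simplified sketch
    }

    public static void main(String[] args) {
        System.out.println(classify('\u4E2D')); // a Han ideograph
        System.out.println(classify('a'));
        System.out.println(classify('\uFF17')); // full-width digit seven
    }
}
```

The order of the checks matters only where ranges could overlap; in this sketch they do not, so the surrogate test simply comes first as the cheapest early exit.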

H

HANZI - Static variable in class org.apache.lucene.analysis.cn.smart.CharType
Han Ideographs
hashCode() - Method in class org.apache.lucene.analysis.cn.smart.hhmm.SegToken
 
HHMMSegmenter - Class in org.apache.lucene.analysis.cn.smart.hhmm
Finds the optimal segmentation of a sentence into Chinese words
HHMMSegmenter() - Constructor for class org.apache.lucene.analysis.cn.smart.hhmm.HHMMSegmenter
 
HMMChineseTokenizer - Class in org.apache.lucene.analysis.cn.smart
Tokenizer for Chinese or mixed Chinese-English text.
HMMChineseTokenizer() - Constructor for class org.apache.lucene.analysis.cn.smart.HMMChineseTokenizer
Creates a new HMMChineseTokenizer
HMMChineseTokenizer(AttributeFactory) - Constructor for class org.apache.lucene.analysis.cn.smart.HMMChineseTokenizer
Creates a new HMMChineseTokenizer, supplying the AttributeFactory
HMMChineseTokenizerFactory - Class in org.apache.lucene.analysis.cn.smart
Factory for HMMChineseTokenizer
HMMChineseTokenizerFactory() - Constructor for class org.apache.lucene.analysis.cn.smart.HMMChineseTokenizerFactory
Default constructor for compatibility with SPI
HMMChineseTokenizerFactory(Map<String, String>) - Constructor for class org.apache.lucene.analysis.cn.smart.HMMChineseTokenizerFactory
Creates a new HMMChineseTokenizerFactory

I

incrementWord() - Method in class org.apache.lucene.analysis.cn.smart.HMMChineseTokenizer
 
index - Variable in class org.apache.lucene.analysis.cn.smart.hhmm.SegToken
During segmentation, stores the index of the token in the token list table

L

LETTER - Static variable in class org.apache.lucene.analysis.cn.smart.CharType
Letters

M

MAX_FREQUENCE - Static variable in class org.apache.lucene.analysis.cn.smart.Utility
Maximum bigram frequency (used in the smoothing function).

N

NAME - Static variable in class org.apache.lucene.analysis.cn.smart.HMMChineseTokenizerFactory
SPI name
normalize(String, TokenStream) - Method in class org.apache.lucene.analysis.cn.smart.SmartChineseAnalyzer
 
NUMBER - Static variable in class org.apache.lucene.analysis.cn.smart.WordType
ASCII Alphanumeric
NUMBER_CHAR_ARRAY - Static variable in class org.apache.lucene.analysis.cn.smart.Utility
 

O

org.apache.lucene.analysis.cn.smart - package org.apache.lucene.analysis.cn.smart
Analyzer for Simplified Chinese, which indexes words.
org.apache.lucene.analysis.cn.smart.hhmm - package org.apache.lucene.analysis.cn.smart.hhmm
SmartChineseAnalyzer Hidden Markov Model package.
OTHER - Static variable in class org.apache.lucene.analysis.cn.smart.CharType
Other (not fitting any of the other categories)

P

process(String) - Method in class org.apache.lucene.analysis.cn.smart.hhmm.HHMMSegmenter
Return a list of SegToken representing the best segmentation of a sentence

R

reset() - Method in class org.apache.lucene.analysis.cn.smart.HMMChineseTokenizer
 

S

SegToken - Class in org.apache.lucene.analysis.cn.smart.hhmm
SmartChineseAnalyzer internal token
SegToken(char[], int, int, int, int) - Constructor for class org.apache.lucene.analysis.cn.smart.hhmm.SegToken
Create a new SegToken from a character array.
SegTokenFilter - Class in org.apache.lucene.analysis.cn.smart.hhmm
Filters a SegToken by converting full-width Latin to half-width, then lowercasing Latin.
SegTokenFilter() - Constructor for class org.apache.lucene.analysis.cn.smart.hhmm.SegTokenFilter
 
SENTENCE_BEGIN - Static variable in class org.apache.lucene.analysis.cn.smart.WordType
Start of a Sentence
SENTENCE_END - Static variable in class org.apache.lucene.analysis.cn.smart.WordType
End of a Sentence
setNextSentence(int, int) - Method in class org.apache.lucene.analysis.cn.smart.HMMChineseTokenizer
 
SmartChineseAnalyzer - Class in org.apache.lucene.analysis.cn.smart
SmartChineseAnalyzer is an analyzer for Chinese or mixed Chinese-English text.
SmartChineseAnalyzer() - Constructor for class org.apache.lucene.analysis.cn.smart.SmartChineseAnalyzer
Create a new SmartChineseAnalyzer, using the default stopword list.
SmartChineseAnalyzer(boolean) - Constructor for class org.apache.lucene.analysis.cn.smart.SmartChineseAnalyzer
Create a new SmartChineseAnalyzer, optionally using the default stopword list.
SmartChineseAnalyzer(CharArraySet) - Constructor for class org.apache.lucene.analysis.cn.smart.SmartChineseAnalyzer
Create a new SmartChineseAnalyzer, using the provided Set of stopwords.
SPACE_LIKE - Static variable in class org.apache.lucene.analysis.cn.smart.CharType
Characters that act as a space
SPACES - Static variable in class org.apache.lucene.analysis.cn.smart.Utility
Space-like characters that need to be skipped: such as space, tab, newline, carriage return.
START_CHAR_ARRAY - Static variable in class org.apache.lucene.analysis.cn.smart.Utility
 
startOffset - Variable in class org.apache.lucene.analysis.cn.smart.hhmm.SegToken
Start offset into the original sentence
STRING - Static variable in class org.apache.lucene.analysis.cn.smart.WordType
ASCII String
STRING_CHAR_ARRAY - Static variable in class org.apache.lucene.analysis.cn.smart.Utility
 
SURROGATE - Static variable in class org.apache.lucene.analysis.cn.smart.CharType
Surrogate character

U

Utility - Class in org.apache.lucene.analysis.cn.smart
SmartChineseAnalyzer utility constants and methods
Utility() - Constructor for class org.apache.lucene.analysis.cn.smart.Utility
 

W

weight - Variable in class org.apache.lucene.analysis.cn.smart.hhmm.SegToken
Word frequency
wordType - Variable in class org.apache.lucene.analysis.cn.smart.hhmm.SegToken
WordType of the text
WordType - Class in org.apache.lucene.analysis.cn.smart
Internal SmartChineseAnalyzer token type constants
WordType() - Constructor for class org.apache.lucene.analysis.cn.smart.WordType
 