A C D E F G H I L M N O P R S T U W 

A

ANALYSIS_DATA_DIR - Static variable in class org.apache.lucene.analysis.cn.smart.AnalyzerProfile
Global indicating the configured analysis data directory
AnalyzerProfile - Class in org.apache.lucene.analysis.cn.smart
Manages analysis data configuration for SmartChineseAnalyzer. SmartChineseAnalyzer has a built-in dictionary and stopword list out of the box.
AnalyzerProfile() - Constructor for class org.apache.lucene.analysis.cn.smart.AnalyzerProfile
 

C

charArray - Variable in class org.apache.lucene.analysis.cn.smart.hhmm.SegToken
Character array containing token text
CharType - Class in org.apache.lucene.analysis.cn.smart
Internal SmartChineseAnalyzer character type constants.
CharType() - Constructor for class org.apache.lucene.analysis.cn.smart.CharType
 
CHINESE_WORD - Static variable in class org.apache.lucene.analysis.cn.smart.WordType
Chinese Word
COMMON_DELIMITER - Static variable in class org.apache.lucene.analysis.cn.smart.Utility
Delimiters will be filtered to this character by SegTokenFilter
compareArray(char[], int, char[], int) - Static method in class org.apache.lucene.analysis.cn.smart.Utility
Compare two arrays, starting at the specified offsets.
compareArrayByPrefix(char[], int, char[], int) - Static method in class org.apache.lucene.analysis.cn.smart.Utility
Compare two arrays, starting at the specified offsets, but treating shortArray as a prefix to longArray.

D

DELIMITER - Static variable in class org.apache.lucene.analysis.cn.smart.CharType
Punctuation Characters
DELIMITER - Static variable in class org.apache.lucene.analysis.cn.smart.WordType
Punctuation Symbol
DIGIT - Static variable in class org.apache.lucene.analysis.cn.smart.CharType
Numeric Digits

E

end() - Method in class org.apache.lucene.analysis.cn.smart.SentenceTokenizer
 
END_CHAR_ARRAY - Static variable in class org.apache.lucene.analysis.cn.smart.Utility
 
endOffset - Variable in class org.apache.lucene.analysis.cn.smart.hhmm.SegToken
End offset into the original sentence
equals(Object) - Method in class org.apache.lucene.analysis.cn.smart.hhmm.SegToken
 

F

filter(SegToken) - Method in class org.apache.lucene.analysis.cn.smart.hhmm.SegTokenFilter
Filter an input SegToken. Full-width Latin will be converted to half-width, then all Latin will be lowercased.
FULLWIDTH_DIGIT - Static variable in class org.apache.lucene.analysis.cn.smart.CharType
Full-Width alphanumeric characters
FULLWIDTH_LETTER - Static variable in class org.apache.lucene.analysis.cn.smart.CharType
Full-Width letters
FULLWIDTH_NUMBER - Static variable in class org.apache.lucene.analysis.cn.smart.WordType
Full-Width Alphanumeric
FULLWIDTH_STRING - Static variable in class org.apache.lucene.analysis.cn.smart.WordType
Full-Width String

G

getCharType(char) - Static method in class org.apache.lucene.analysis.cn.smart.Utility
Return the internal CharType constant of a given character.
getDefaultStopSet() - Static method in class org.apache.lucene.analysis.cn.smart.SmartChineseAnalyzer
Returns an unmodifiable instance of the default stop-words set.

H

HANZI - Static variable in class org.apache.lucene.analysis.cn.smart.CharType
Han Ideographs
hashCode() - Method in class org.apache.lucene.analysis.cn.smart.hhmm.SegToken
 
HHMMSegmenter - Class in org.apache.lucene.analysis.cn.smart.hhmm
Finds the optimal segmentation of a sentence into Chinese words
HHMMSegmenter() - Constructor for class org.apache.lucene.analysis.cn.smart.hhmm.HHMMSegmenter
 

I

incrementToken() - Method in class org.apache.lucene.analysis.cn.smart.SentenceTokenizer
 
incrementToken() - Method in class org.apache.lucene.analysis.cn.smart.WordTokenFilter
 
index - Variable in class org.apache.lucene.analysis.cn.smart.hhmm.SegToken
During segmentation, this stores the index of the token in the token list table

L

LETTER - Static variable in class org.apache.lucene.analysis.cn.smart.CharType
Letters

M

MAX_FREQUENCE - Static variable in class org.apache.lucene.analysis.cn.smart.Utility
Maximum bigram frequency (used in the smoothing function).

N

NUMBER - Static variable in class org.apache.lucene.analysis.cn.smart.WordType
ASCII Alphanumeric
NUMBER_CHAR_ARRAY - Static variable in class org.apache.lucene.analysis.cn.smart.Utility
 

O

org.apache.lucene.analysis.cn.smart - package org.apache.lucene.analysis.cn.smart
Analyzer for Simplified Chinese, which indexes words.
org.apache.lucene.analysis.cn.smart.hhmm - package org.apache.lucene.analysis.cn.smart.hhmm
SmartChineseAnalyzer Hidden Markov Model package.
OTHER - Static variable in class org.apache.lucene.analysis.cn.smart.CharType
Other (not fitting any of the other categories)

P

process(String) - Method in class org.apache.lucene.analysis.cn.smart.hhmm.HHMMSegmenter
Return a list of SegToken representing the best segmentation of a sentence

R

reset() - Method in class org.apache.lucene.analysis.cn.smart.SentenceTokenizer
 
reset(Reader) - Method in class org.apache.lucene.analysis.cn.smart.SentenceTokenizer
 
reset() - Method in class org.apache.lucene.analysis.cn.smart.WordTokenFilter
 
reusableTokenStream(String, Reader) - Method in class org.apache.lucene.analysis.cn.smart.SmartChineseAnalyzer
 

S

SegToken - Class in org.apache.lucene.analysis.cn.smart.hhmm
SmartChineseAnalyzer internal token
SegToken(char[], int, int, int, int) - Constructor for class org.apache.lucene.analysis.cn.smart.hhmm.SegToken
Create a new SegToken from a character array.
SegTokenFilter - Class in org.apache.lucene.analysis.cn.smart.hhmm
Filters a SegToken by converting full-width Latin to half-width, then lowercasing Latin.
SegTokenFilter() - Constructor for class org.apache.lucene.analysis.cn.smart.hhmm.SegTokenFilter
 
SENTENCE_BEGIN - Static variable in class org.apache.lucene.analysis.cn.smart.WordType
Start of a Sentence
SENTENCE_END - Static variable in class org.apache.lucene.analysis.cn.smart.WordType
End of a Sentence
SentenceTokenizer - Class in org.apache.lucene.analysis.cn.smart
Tokenizes input text into sentences.
SentenceTokenizer(Reader) - Constructor for class org.apache.lucene.analysis.cn.smart.SentenceTokenizer
 
SentenceTokenizer(AttributeSource, Reader) - Constructor for class org.apache.lucene.analysis.cn.smart.SentenceTokenizer
 
SentenceTokenizer(AttributeSource.AttributeFactory, Reader) - Constructor for class org.apache.lucene.analysis.cn.smart.SentenceTokenizer
 
SmartChineseAnalyzer - Class in org.apache.lucene.analysis.cn.smart
SmartChineseAnalyzer is an analyzer for Chinese or mixed Chinese-English text.
SmartChineseAnalyzer(Version) - Constructor for class org.apache.lucene.analysis.cn.smart.SmartChineseAnalyzer
Create a new SmartChineseAnalyzer, using the default stopword list.
SmartChineseAnalyzer(Version, boolean) - Constructor for class org.apache.lucene.analysis.cn.smart.SmartChineseAnalyzer
Create a new SmartChineseAnalyzer, optionally using the default stopword list.
SmartChineseAnalyzer(Version, Set) - Constructor for class org.apache.lucene.analysis.cn.smart.SmartChineseAnalyzer
Create a new SmartChineseAnalyzer, using the provided Set of stopwords.
SPACE_LIKE - Static variable in class org.apache.lucene.analysis.cn.smart.CharType
Characters that act as a space
SPACES - Static variable in class org.apache.lucene.analysis.cn.smart.Utility
Space-like characters that need to be skipped: such as space, tab, newline, carriage return.
START_CHAR_ARRAY - Static variable in class org.apache.lucene.analysis.cn.smart.Utility
 
startOffset - Variable in class org.apache.lucene.analysis.cn.smart.hhmm.SegToken
Start offset into the original sentence
STRING - Static variable in class org.apache.lucene.analysis.cn.smart.WordType
ASCII String
STRING_CHAR_ARRAY - Static variable in class org.apache.lucene.analysis.cn.smart.Utility
 

T

tokenStream(String, Reader) - Method in class org.apache.lucene.analysis.cn.smart.SmartChineseAnalyzer
 

U

Utility - Class in org.apache.lucene.analysis.cn.smart
SmartChineseAnalyzer utility constants and methods
Utility() - Constructor for class org.apache.lucene.analysis.cn.smart.Utility
 

W

weight - Variable in class org.apache.lucene.analysis.cn.smart.hhmm.SegToken
Word frequency
WordTokenFilter - Class in org.apache.lucene.analysis.cn.smart
A TokenFilter that breaks sentences into words.
WordTokenFilter(TokenStream) - Constructor for class org.apache.lucene.analysis.cn.smart.WordTokenFilter
Construct a new WordTokenFilter.
wordType - Variable in class org.apache.lucene.analysis.cn.smart.hhmm.SegToken
WordType of the text
WordType - Class in org.apache.lucene.analysis.cn.smart
Internal SmartChineseAnalyzer token type constants
WordType() - Constructor for class org.apache.lucene.analysis.cn.smart.WordType
 