Index (Lucene 3.6.1 API)

A C D E F G H I L M N O P R S T U W

A

ANALYSIS_DATA_DIR - Static variable in class org.apache.lucene.analysis.cn.smart.AnalyzerProfile: Global indicating the configured analysis data directory
AnalyzerProfile - Class in org.apache.lucene.analysis.cn.smart: Manages analysis data configuration for SmartChineseAnalyzer. SmartChineseAnalyzer has a built-in dictionary and stopword list out of the box.
AnalyzerProfile() - Constructor for class org.apache.lucene.analysis.cn.smart.AnalyzerProfile

C

charArray - Variable in class org.apache.lucene.analysis.cn.smart.hhmm.SegToken: Character array containing token text
CharType - Class in org.apache.lucene.analysis.cn.smart: Internal SmartChineseAnalyzer character type constants.
CharType() - Constructor for class org.apache.lucene.analysis.cn.smart.CharType
CHINESE_WORD - Static variable in class org.apache.lucene.analysis.cn.smart.WordType: Chinese Word
COMMON_DELIMITER - Static variable in class org.apache.lucene.analysis.cn.smart.Utility: Delimiters will be filtered to this character by SegTokenFilter
compareArray(char[], int, char[], int) - Static method in class org.apache.lucene.analysis.cn.smart.Utility: Compare two arrays, starting at the specified offsets.
compareArrayByPrefix(char[], int, char[], int) - Static method in class org.apache.lucene.analysis.cn.smart.Utility: Compare two arrays, starting at the specified offsets, but treating shortArray as a prefix to longArray.
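
The prefix comparison described for compareArrayByPrefix can be illustrated with a small standalone sketch. This is not the Lucene source: the class name PrefixCompareSketch, the method name compareByPrefix, and the exact return values (-1, 0, 1) are assumptions made for illustration. The only behavior taken from the entry above is that the shorter array is treated as a prefix of the longer one, with both arrays read from the given offsets.

```java
// Hypothetical sketch of a prefix-aware array comparison; not the Lucene Utility code.
final class PrefixCompareSketch {

    /**
     * Compares shortArray (from shortIndex) with longArray (from longIndex).
     * Returns 0 when the remainder of shortArray is a prefix of the remainder
     * of longArray; otherwise -1 or 1 (assumed convention, see lead-in).
     */
    static int compareByPrefix(char[] shortArray, int shortIndex,
                               char[] longArray, int longIndex) {
        int si = shortIndex;
        int li = longIndex;
        while (si < shortArray.length && li < longArray.length) {
            if (shortArray[si] != longArray[li]) {
                // First differing character decides the ordering.
                return shortArray[si] < longArray[li] ? -1 : 1;
            }
            si++;
            li++;
        }
        // All of shortArray was consumed: it is a prefix (or an exact match).
        if (si == shortArray.length) {
            return 0;
        }
        // longArray ran out first, so shortArray is the longer sequence.
        return 1;
    }

    public static void main(String[] args) {
        char[] entry = "lucene".toCharArray();
        System.out.println(compareByPrefix("luc".toCharArray(), 0, entry, 0)); // prints 0
        System.out.println(compareByPrefix("lux".toCharArray(), 0, entry, 0)); // prints 1
    }
}
```

A comparison of this shape is handy for prefix lookups in a sorted word list, where any entry that begins with the query should count as a match.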

D

DELIMITER - Static variable in class org.apache.lucene.analysis.cn.smart.CharType: Punctuation Characters
DELIMITER - Static variable in class org.apache.lucene.analysis.cn.smart.WordType: Punctuation Symbol
DIGIT - Static variable in class org.apache.lucene.analysis.cn.smart.CharType: Numeric Digits

E

end() - Method in class org.apache.lucene.analysis.cn.smart.SentenceTokenizer
END_CHAR_ARRAY - Static variable in class org.apache.lucene.analysis.cn.smart.Utility
endOffset - Variable in class org.apache.lucene.analysis.cn.smart.hhmm.SegToken: end offset into original sentence
equals(Object) - Method in class org.apache.lucene.analysis.cn.smart.hhmm.SegToken

F

filter(SegToken) - Method in class org.apache.lucene.analysis.cn.smart.hhmm.SegTokenFilter: Filter an input SegToken. Full-width latin will be converted to half-width, then all latin will be lowercased.
FULLWIDTH_DIGIT - Static variable in class org.apache.lucene.analysis.cn.smart.CharType: Full-Width alphanumeric characters
FULLWIDTH_LETTER - Static variable in class org.apache.lucene.analysis.cn.smart.CharType: Full-Width letters
FULLWIDTH_NUMBER - Static variable in class org.apache.lucene.analysis.cn.smart.WordType: Full-Width Alphanumeric
FULLWIDTH_STRING - Static variable in class org.apache.lucene.analysis.cn.smart.WordType: Full-Width String
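
The normalization described for SegTokenFilter.filter (full-width latin to half-width, then lowercasing) can be sketched without Lucene. The class WidthFoldSketch and its fold methods are hypothetical names, and the range U+FF01 to U+FF5E is an assumption about what "full-width latin" covers here; the actual filter may handle a different set of characters.

```java
// Hypothetical sketch of full-width folding plus lowercasing; not the SegTokenFilter code.
final class WidthFoldSketch {

    /** Folds one character: full-width ASCII variant to half-width, then A-Z to a-z. */
    static char fold(char c) {
        // Full-width forms U+FF01..U+FF5E sit at a fixed offset of 0xFEE0
        // from their ASCII counterparts U+0021..U+007E.
        if (c >= 0xFF01 && c <= 0xFF5E) {
            c = (char) (c - 0xFEE0);
        }
        // Lowercase plain ASCII letters only; other characters pass through.
        if (c >= 'A' && c <= 'Z') {
            c = (char) (c + ('a' - 'A'));
        }
        return c;
    }

    /** Applies fold(char) to every character of the input. */
    static String fold(String text) {
        char[] chars = text.toCharArray();
        for (int i = 0; i < chars.length; i++) {
            chars[i] = fold(chars[i]);
        }
        return new String(chars);
    }

    public static void main(String[] args) {
        // Input is full-width "ABC1" followed by half-width " Xyz".
        System.out.println(fold("\uFF21\uFF22\uFF23\uFF11 Xyz")); // prints "abc1 xyz"
    }
}
```

Folding the width first means the lowercasing step only has to deal with plain ASCII letters.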

G

getCharType(char) - Static method in class org.apache.lucene.analysis.cn.smart.Utility: Return the internal CharType constant of a given character.
getDefaultStopSet() - Static method in class org.apache.lucene.analysis.cn.smart.SmartChineseAnalyzer: Returns an unmodifiable instance of the default stop-words set.

H

HANZI - Static variable in class org.apache.lucene.analysis.cn.smart.CharType: Han Ideographs
hashCode() - Method in class org.apache.lucene.analysis.cn.smart.hhmm.SegToken
HHMMSegmenter - Class in org.apache.lucene.analysis.cn.smart.hhmm: Finds the optimal segmentation of a sentence into Chinese words
HHMMSegmenter() - Constructor for class org.apache.lucene.analysis.cn.smart.hhmm.HHMMSegmenter

I

incrementToken() - Method in class org.apache.lucene.analysis.cn.smart.SentenceTokenizer
incrementToken() - Method in class org.apache.lucene.analysis.cn.smart.WordTokenFilter
index - Variable in class org.apache.lucene.analysis.cn.smart.hhmm.SegToken: Index of the token in the token list table, used during segmentation

L

LETTER - Static variable in class org.apache.lucene.analysis.cn.smart.CharType: Letters

M

MAX_FREQUENCE - Static variable in class org.apache.lucene.analysis.cn.smart.Utility: Maximum bigram frequency (used in the smoothing function).

N

NUMBER - Static variable in class org.apache.lucene.analysis.cn.smart.WordType: ASCII Alphanumeric
NUMBER_CHAR_ARRAY - Static variable in class org.apache.lucene.analysis.cn.smart.Utility

O

org.apache.lucene.analysis.cn.smart - package org.apache.lucene.analysis.cn.smart: Analyzer for Simplified Chinese, which indexes words.
org.apache.lucene.analysis.cn.smart.hhmm - package org.apache.lucene.analysis.cn.smart.hhmm: SmartChineseAnalyzer Hidden Markov Model package.
OTHER - Static variable in class org.apache.lucene.analysis.cn.smart.CharType: Other (not fitting any of the other categories)

P

process(String) - Method in class org.apache.lucene.analysis.cn.smart.hhmm.HHMMSegmenter: Return a list of SegToken representing the best segmentation of a sentence

R

reset() - Method in class org.apache.lucene.analysis.cn.smart.SentenceTokenizer
reset(Reader) - Method in class org.apache.lucene.analysis.cn.smart.SentenceTokenizer
reset() - Method in class org.apache.lucene.analysis.cn.smart.WordTokenFilter
reusableTokenStream(String, Reader) - Method in class org.apache.lucene.analysis.cn.smart.SmartChineseAnalyzer

S

SegToken - Class in org.apache.lucene.analysis.cn.smart.hhmm: SmartChineseAnalyzer internal token
SegToken(char[], int, int, int, int) - Constructor for class org.apache.lucene.analysis.cn.smart.hhmm.SegToken: Create a new SegToken from a character array.
SegTokenFilter - Class in org.apache.lucene.analysis.cn.smart.hhmm: Filters a SegToken by converting full-width latin to half-width, then lowercasing latin.
SegTokenFilter() - Constructor for class org.apache.lucene.analysis.cn.smart.hhmm.SegTokenFilter
SENTENCE_BEGIN - Static variable in class org.apache.lucene.analysis.cn.smart.WordType: Start of a Sentence
SENTENCE_END - Static variable in class org.apache.lucene.analysis.cn.smart.WordType: End of a Sentence
SentenceTokenizer - Class in org.apache.lucene.analysis.cn.smart: Tokenizes input text into sentences.
SentenceTokenizer(Reader) - Constructor for class org.apache.lucene.analysis.cn.smart.SentenceTokenizer
SentenceTokenizer(AttributeSource, Reader) - Constructor for class org.apache.lucene.analysis.cn.smart.SentenceTokenizer
SentenceTokenizer(AttributeSource.AttributeFactory, Reader) - Constructor for class org.apache.lucene.analysis.cn.smart.SentenceTokenizer
SmartChineseAnalyzer - Class in org.apache.lucene.analysis.cn.smart: SmartChineseAnalyzer is an analyzer for Chinese or mixed Chinese-English text.
SmartChineseAnalyzer(Version) - Constructor for class org.apache.lucene.analysis.cn.smart.SmartChineseAnalyzer: Create a new SmartChineseAnalyzer, using the default stopword list.
SmartChineseAnalyzer(Version, boolean) - Constructor for class org.apache.lucene.analysis.cn.smart.SmartChineseAnalyzer: Create a new SmartChineseAnalyzer, optionally using the default stopword list.
SmartChineseAnalyzer(Version, Set) - Constructor for class org.apache.lucene.analysis.cn.smart.SmartChineseAnalyzer: Create a new SmartChineseAnalyzer, using the provided Set of stopwords.
SPACE_LIKE - Static variable in class org.apache.lucene.analysis.cn.smart.CharType: Characters that act as a space
SPACES - Static variable in class org.apache.lucene.analysis.cn.smart.Utility: Space-like characters that need to be skipped: such as space, tab, newline, carriage return.
START_CHAR_ARRAY - Static variable in class org.apache.lucene.analysis.cn.smart.Utility
startOffset - Variable in class org.apache.lucene.analysis.cn.smart.hhmm.SegToken: start offset into original sentence
STRING - Static variable in class org.apache.lucene.analysis.cn.smart.WordType: ASCII String
STRING_CHAR_ARRAY - Static variable in class org.apache.lucene.analysis.cn.smart.Utility

T

tokenStream(String, Reader) - Method in class org.apache.lucene.analysis.cn.smart.SmartChineseAnalyzer

U

Utility - Class in org.apache.lucene.analysis.cn.smart: SmartChineseAnalyzer utility constants and methods
Utility() - Constructor for class org.apache.lucene.analysis.cn.smart.Utility

W

weight - Variable in class org.apache.lucene.analysis.cn.smart.hhmm.SegToken: word frequency
WordTokenFilter - Class in org.apache.lucene.analysis.cn.smart: A TokenFilter that breaks sentences into words.
WordTokenFilter(TokenStream) - Constructor for class org.apache.lucene.analysis.cn.smart.WordTokenFilter: Construct a new WordTokenFilter.
wordType - Variable in class org.apache.lucene.analysis.cn.smart.hhmm.SegToken: WordType of the text
WordType - Class in org.apache.lucene.analysis.cn.smart: Internal SmartChineseAnalyzer token type constants
WordType() - Constructor for class org.apache.lucene.analysis.cn.smart.WordType
