Index (Lucene 10.1.0 smartcn API)

A C D E F G H I L M N O P R S U W
All Classes and Interfaces|All Packages|Constant Field Values

A

ANALYSIS_DATA_DIR - Static variable in class org.apache.lucene.analysis.cn.smart.AnalyzerProfile: Global indicating the configured analysis data directory
AnalyzerProfile - Class in org.apache.lucene.analysis.cn.smart: Manages analysis data configuration for SmartChineseAnalyzer
AnalyzerProfile() - Constructor for class org.apache.lucene.analysis.cn.smart.AnalyzerProfile

C

charArray - Variable in class org.apache.lucene.analysis.cn.smart.hhmm.SegToken: Character array containing token text
CharType - Class in org.apache.lucene.analysis.cn.smart: Internal SmartChineseAnalyzer character type constants.
CharType() - Constructor for class org.apache.lucene.analysis.cn.smart.CharType
CHINESE_WORD - Static variable in class org.apache.lucene.analysis.cn.smart.WordType: Chinese Word
COMMON_DELIMITER - Static variable in class org.apache.lucene.analysis.cn.smart.Utility: Delimiters will be filtered to this character by SegTokenFilter
compareArray(char[], int, char[], int) - Static method in class org.apache.lucene.analysis.cn.smart.Utility: Compare two arrays, starting at the specified offsets.
compareArrayByPrefix(char[], int, char[], int) - Static method in class org.apache.lucene.analysis.cn.smart.Utility: Compare two arrays, starting at the specified offsets, but treating shortArray as a prefix to longArray.
create(AttributeFactory) - Method in class org.apache.lucene.analysis.cn.smart.HMMChineseTokenizerFactory
createComponents(String) - Method in class org.apache.lucene.analysis.cn.smart.SmartChineseAnalyzer
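
The two Utility comparison entries above differ only in how they treat a shorter first array. The following is a minimal sketch of that difference, not the Lucene source. The class name ArrayCompareSketch, the parameter names, and the -1/0/1 return convention are assumptions made for illustration; the index only states the signatures and one-line summaries.

```java
// Hypothetical sketch: illustrates the documented contrast between
// compareArray and compareArrayByPrefix. Not the library implementation.
final class ArrayCompareSketch {

    /**
     * Compares a[aStart..] with b[bStart..] character by character.
     * Assumed convention: -1 if a sorts first, 1 if b sorts first, 0 if equal.
     */
    static int compareArray(char[] a, int aStart, char[] b, int bStart) {
        int i = aStart;
        int j = bStart;
        while (i < a.length && j < b.length) {
            if (a[i] != b[j]) {
                return a[i] < b[j] ? -1 : 1;
            }
            i++;
            j++;
        }
        if (i < a.length) {
            return 1;   // b ran out first, so a is longer
        }
        if (j < b.length) {
            return -1;  // a ran out first, so a is shorter
        }
        return 0;
    }

    /**
     * Same comparison, except that a shortArray which is fully consumed
     * without a mismatch counts as a match (it is a prefix of longArray).
     */
    static int compareArrayByPrefix(char[] shortArray, int shortStart,
                                    char[] longArray, int longStart) {
        int i = shortStart;
        int j = longStart;
        while (i < shortArray.length && j < longArray.length) {
            if (shortArray[i] != longArray[j]) {
                return shortArray[i] < longArray[j] ? -1 : 1;
            }
            i++;
            j++;
        }
        // shortArray exhausted: it is a prefix, so report a match.
        // Otherwise longArray ran out first and shortArray is the greater one.
        return i >= shortArray.length ? 0 : 1;
    }

    public static void main(String[] args) {
        char[] ab = "ab".toCharArray();
        char[] abc = "abc".toCharArray();
        System.out.println(compareArray(ab, 0, abc, 0));
        System.out.println(compareArrayByPrefix(ab, 0, abc, 0));
    }
}
```

Under these assumptions, a dictionary lookup can use the prefix variant to find every entry that begins with a given key, while the plain variant finds exact matches only.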

D

DELIMITER - Static variable in class org.apache.lucene.analysis.cn.smart.CharType: Punctuation Characters
DELIMITER - Static variable in class org.apache.lucene.analysis.cn.smart.WordType: Punctuation Symbol
DIGIT - Static variable in class org.apache.lucene.analysis.cn.smart.CharType: Numeric Digits

E

END_CHAR_ARRAY - Static variable in class org.apache.lucene.analysis.cn.smart.Utility
endOffset - Variable in class org.apache.lucene.analysis.cn.smart.hhmm.SegToken: End offset into the original sentence
equals(Object) - Method in class org.apache.lucene.analysis.cn.smart.hhmm.SegToken

F

filter(SegToken) - Method in class org.apache.lucene.analysis.cn.smart.hhmm.SegTokenFilter: Filter an input SegToken
FULLWIDTH_DIGIT - Static variable in class org.apache.lucene.analysis.cn.smart.CharType: Full-Width alphanumeric characters
FULLWIDTH_LETTER - Static variable in class org.apache.lucene.analysis.cn.smart.CharType: Full-Width letters
FULLWIDTH_NUMBER - Static variable in class org.apache.lucene.analysis.cn.smart.WordType: Full-Width Alphanumeric
FULLWIDTH_STRING - Static variable in class org.apache.lucene.analysis.cn.smart.WordType: Full-Width String

G

getCharType(char) - Static method in class org.apache.lucene.analysis.cn.smart.Utility: Return the internal CharType constant of a given character.
getDefaultStopSet() - Static method in class org.apache.lucene.analysis.cn.smart.SmartChineseAnalyzer: Returns an unmodifiable instance of the default stop-words set.
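
The getCharType entry above maps a single char to one of the CharType categories listed in this index (DELIMITER, DIGIT, FULLWIDTH_DIGIT, FULLWIDTH_LETTER, HANZI, LETTER, OTHER, SPACE_LIKE, SURROGATE). A rough sketch of such a classifier follows. The enum Kind, the class name CharTypeSketch, and every code point range below are assumptions chosen for illustration; the actual ranges and constant values used by Utility.getCharType are not given in this index.

```java
// Hypothetical sketch of a per-character classifier in the spirit of
// Utility.getCharType. Ranges are illustrative assumptions, not Lucene's.
final class CharTypeSketch {

    enum Kind {
        DELIMITER, LETTER, DIGIT, HANZI, SPACE_LIKE,
        FULLWIDTH_LETTER, FULLWIDTH_DIGIT, SURROGATE, OTHER
    }

    static Kind classify(char ch) {
        if (Character.isSurrogate(ch)) {
            return Kind.SURROGATE;            // half of a supplementary code point
        }
        if (ch >= 0x4E00 && ch <= 0x9FFF) {
            return Kind.HANZI;                // CJK Unified Ideographs block
        }
        if ((ch >= 'a' && ch <= 'z') || (ch >= 'A' && ch <= 'Z')) {
            return Kind.LETTER;
        }
        if (ch >= '0' && ch <= '9') {
            return Kind.DIGIT;
        }
        if (ch == ' ' || ch == '\t' || ch == '\r' || ch == '\n' || ch == 0x3000) {
            return Kind.SPACE_LIKE;           // includes the ideographic space
        }
        if ((ch >= 0xFF21 && ch <= 0xFF3A) || (ch >= 0xFF41 && ch <= 0xFF5A)) {
            return Kind.FULLWIDTH_LETTER;
        }
        if (ch >= 0xFF10 && ch <= 0xFF19) {
            return Kind.FULLWIDTH_DIGIT;
        }
        boolean asciiPunct = (ch >= 0x21 && ch <= 0x2F) || (ch >= 0x3A && ch <= 0x40)
                || (ch >= 0x5B && ch <= 0x60) || (ch >= 0x7B && ch <= 0x7E);
        boolean cjkPunct = ch >= 0x3001 && ch <= 0x303F;
        boolean fullwidthPunct = ch >= 0xFF01 && ch <= 0xFF5E; // letters and digits handled above
        if (asciiPunct || cjkPunct || fullwidthPunct) {
            return Kind.DELIMITER;
        }
        return Kind.OTHER;
    }

    public static void main(String[] args) {
        for (char ch : new char[] {'a', '7', ',', ' ', (char) 0x4E2D, (char) 0xFF21}) {
            System.out.println(Integer.toHexString(ch) + " -> " + classify(ch));
        }
    }
}
```

The order of the checks matters in this sketch: surrogates are tested first so that half of a supplementary character is never mistaken for another category, and the full-width letter and digit ranges are tested before the broader full-width punctuation range.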

H

HANZI - Static variable in class org.apache.lucene.analysis.cn.smart.CharType: Han Ideographs
hashCode() - Method in class org.apache.lucene.analysis.cn.smart.hhmm.SegToken
HHMMSegmenter - Class in org.apache.lucene.analysis.cn.smart.hhmm: Finds the optimal segmentation of a sentence into Chinese words
HHMMSegmenter() - Constructor for class org.apache.lucene.analysis.cn.smart.hhmm.HHMMSegmenter
HMMChineseTokenizer - Class in org.apache.lucene.analysis.cn.smart: Tokenizer for Chinese or mixed Chinese-English text.
HMMChineseTokenizer() - Constructor for class org.apache.lucene.analysis.cn.smart.HMMChineseTokenizer: Creates a new HMMChineseTokenizer
HMMChineseTokenizer(AttributeFactory) - Constructor for class org.apache.lucene.analysis.cn.smart.HMMChineseTokenizer: Creates a new HMMChineseTokenizer, supplying the AttributeFactory
HMMChineseTokenizerFactory - Class in org.apache.lucene.analysis.cn.smart: Factory for HMMChineseTokenizer
HMMChineseTokenizerFactory() - Constructor for class org.apache.lucene.analysis.cn.smart.HMMChineseTokenizerFactory: Default ctor for compatibility with SPI
HMMChineseTokenizerFactory(Map<String, String>) - Constructor for class org.apache.lucene.analysis.cn.smart.HMMChineseTokenizerFactory: Creates a new HMMChineseTokenizerFactory

I

incrementWord() - Method in class org.apache.lucene.analysis.cn.smart.HMMChineseTokenizer
index - Variable in class org.apache.lucene.analysis.cn.smart.hhmm.SegToken: Used during segmentation to store the index of the token in the token list table

L

LETTER - Static variable in class org.apache.lucene.analysis.cn.smart.CharType: Letters

M

MAX_FREQUENCE - Static variable in class org.apache.lucene.analysis.cn.smart.Utility: Maximum bigram frequency (used in the smoothing function).

N

NAME - Static variable in class org.apache.lucene.analysis.cn.smart.HMMChineseTokenizerFactory: SPI name
normalize(String, TokenStream) - Method in class org.apache.lucene.analysis.cn.smart.SmartChineseAnalyzer
NUMBER - Static variable in class org.apache.lucene.analysis.cn.smart.WordType: ASCII Alphanumeric
NUMBER_CHAR_ARRAY - Static variable in class org.apache.lucene.analysis.cn.smart.Utility

O

org.apache.lucene.analysis.cn.smart - package org.apache.lucene.analysis.cn.smart: Analyzer for Simplified Chinese, which indexes words.
org.apache.lucene.analysis.cn.smart.hhmm - package org.apache.lucene.analysis.cn.smart.hhmm: SmartChineseAnalyzer Hidden Markov Model package.
OTHER - Static variable in class org.apache.lucene.analysis.cn.smart.CharType: Other (not fitting any of the other categories)

P

process(String) - Method in class org.apache.lucene.analysis.cn.smart.hhmm.HHMMSegmenter: Return a list of SegToken representing the best segmentation of a sentence

R

reset() - Method in class org.apache.lucene.analysis.cn.smart.HMMChineseTokenizer

S

SegToken - Class in org.apache.lucene.analysis.cn.smart.hhmm: SmartChineseAnalyzer internal token
SegToken(char[], int, int, int, int) - Constructor for class org.apache.lucene.analysis.cn.smart.hhmm.SegToken: Create a new SegToken from a character array.
SegTokenFilter - Class in org.apache.lucene.analysis.cn.smart.hhmm: Filters a SegToken by converting full-width latin to half-width, then lowercasing latin.
SegTokenFilter() - Constructor for class org.apache.lucene.analysis.cn.smart.hhmm.SegTokenFilter
SENTENCE_BEGIN - Static variable in class org.apache.lucene.analysis.cn.smart.WordType: Start of a Sentence
SENTENCE_END - Static variable in class org.apache.lucene.analysis.cn.smart.WordType: End of a Sentence
setNextSentence(int, int) - Method in class org.apache.lucene.analysis.cn.smart.HMMChineseTokenizer
SmartChineseAnalyzer - Class in org.apache.lucene.analysis.cn.smart: SmartChineseAnalyzer is an analyzer for Chinese or mixed Chinese-English text.
SmartChineseAnalyzer() - Constructor for class org.apache.lucene.analysis.cn.smart.SmartChineseAnalyzer: Create a new SmartChineseAnalyzer, using the default stopword list.
SmartChineseAnalyzer(boolean) - Constructor for class org.apache.lucene.analysis.cn.smart.SmartChineseAnalyzer: Create a new SmartChineseAnalyzer, optionally using the default stopword list.
SmartChineseAnalyzer(CharArraySet) - Constructor for class org.apache.lucene.analysis.cn.smart.SmartChineseAnalyzer: Create a new SmartChineseAnalyzer, using the provided Set of stopwords.
SPACE_LIKE - Static variable in class org.apache.lucene.analysis.cn.smart.CharType: Characters that act as a space
SPACES - Static variable in class org.apache.lucene.analysis.cn.smart.Utility: Space-like characters that need to be skipped: such as space, tab, newline, carriage return.
START_CHAR_ARRAY - Static variable in class org.apache.lucene.analysis.cn.smart.Utility
startOffset - Variable in class org.apache.lucene.analysis.cn.smart.hhmm.SegToken: Start offset into the original sentence
STRING - Static variable in class org.apache.lucene.analysis.cn.smart.WordType: ASCII String
STRING_CHAR_ARRAY - Static variable in class org.apache.lucene.analysis.cn.smart.Utility
SURROGATE - Static variable in class org.apache.lucene.analysis.cn.smart.CharType: Surrogate character
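
The SegTokenFilter entry above describes two normalization steps: full-width Latin is converted to half-width, then Latin is lowercased. The sketch below shows one way those two steps can be carried out on plain strings. It is not the Lucene implementation: the class name WidthFoldSketch and its methods are hypothetical, it works on String rather than SegToken, and it leaves out the replacement of delimiters by Utility.COMMON_DELIMITER. It relies on the fact that the full-width forms U+FF01 to U+FF5E sit exactly 0xFEE0 above their ASCII counterparts U+0021 to U+007E.

```java
// Hypothetical sketch of the two normalization steps named in the
// SegTokenFilter summary. Not the library implementation.
final class WidthFoldSketch {

    /** Folds one char: full-width ASCII form to half-width, then A-Z to a-z. */
    static char foldChar(char c) {
        if (c >= 0xFF01 && c <= 0xFF5E) {
            c = (char) (c - 0xFEE0);          // full-width form to ASCII
        }
        if (c >= 'A' && c <= 'Z') {
            c = (char) (c + ('a' - 'A'));     // lowercase basic Latin only
        }
        return c;
    }

    /** Applies foldChar to every char; other characters pass through unchanged. */
    static String fold(String text) {
        char[] chars = text.toCharArray();
        for (int i = 0; i < chars.length; i++) {
            chars[i] = foldChar(chars[i]);
        }
        return new String(chars);
    }

    public static void main(String[] args) {
        // Full-width "LUCENE" written with escapes so the file stays ASCII-only.
        String fullWidth = "\uFF2C\uFF35\uFF23\uFF25\uFF2E\uFF25";
        System.out.println(fold(fullWidth));
    }
}
```

Doing the width conversion before the lowercasing means a single lowercase step covers both the original half-width letters and the converted full-width ones.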

U

Utility - Class in org.apache.lucene.analysis.cn.smart: SmartChineseAnalyzer utility constants and methods
Utility() - Constructor for class org.apache.lucene.analysis.cn.smart.Utility

W

weight - Variable in class org.apache.lucene.analysis.cn.smart.hhmm.SegToken: Word frequency
wordType - Variable in class org.apache.lucene.analysis.cn.smart.hhmm.SegToken: WordType of the text
WordType - Class in org.apache.lucene.analysis.cn.smart: Internal SmartChineseAnalyzer token type constants
WordType() - Constructor for class org.apache.lucene.analysis.cn.smart.WordType
