HMMChineseTokenizer (Lucene 5.5.0 API)

java.lang.Object
- org.apache.lucene.util.AttributeSource
  - org.apache.lucene.analysis.TokenStream
    - org.apache.lucene.analysis.Tokenizer
      - org.apache.lucene.analysis.util.SegmentingTokenizerBase
        - org.apache.lucene.analysis.cn.smart.HMMChineseTokenizer

All Implemented Interfaces: Closeable, AutoCloseable

```
public class HMMChineseTokenizer
extends SegmentingTokenizerBase
```
Tokenizer for Chinese or mixed Chinese-English text.
The tokenizer uses probabilistic knowledge (a hidden Markov model, as the class name indicates) to find the optimal word segmentation for Simplified Chinese text. The text is first broken into sentences, then each sentence is segmented into words.
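
To illustrate what finding an optimal segmentation from probabilistic knowledge can look like, here is a minimal, self-contained sketch of maximum-probability segmentation by dynamic programming. It is not Lucene's implementation: the class `ToySegmenter`, the `logProb` dictionary, the `MAX_WORD_LEN` limit and the unknown-character penalty are all hypothetical, and the sketch scores each word independently (a unigram model), which is simpler than what the real tokenizer does.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;
import java.util.Map;

// Hypothetical toy segmenter; illustrates the idea only, not Lucene code.
final class ToySegmenter {
    // Assumed penalty for a single character that is missing from the dictionary.
    private static final double UNKNOWN_LOG_PROB = -20.0;
    // Assumed upper bound on word length, in UTF-16 code units.
    private static final int MAX_WORD_LEN = 4;

    /** Returns the split of {@code sentence} with the highest total log-probability. */
    static List<String> segment(String sentence, Map<String, Double> logProb) {
        int n = sentence.length();
        double[] best = new double[n + 1]; // best[i]: best score for the prefix of length i
        int[] prev = new int[n + 1];       // prev[i]: start index of the last word in that split
        Arrays.fill(best, Double.NEGATIVE_INFINITY);
        best[0] = 0.0;

        for (int end = 1; end <= n; end++) {
            for (int start = Math.max(0, end - MAX_WORD_LEN); start < end; start++) {
                Double lp = logProb.get(sentence.substring(start, end));
                if (lp == null) {
                    if (end - start > 1) {
                        continue; // unknown multi-character strings are not candidates
                    }
                    lp = UNKNOWN_LOG_PROB; // unknown single characters stay reachable
                }
                double score = best[start] + lp;
                if (score > best[end]) {
                    best[end] = score;
                    prev[end] = start;
                }
            }
        }

        // Walk the back-pointers from the end of the sentence to recover the words.
        List<String> words = new ArrayList<>();
        for (int end = n; end > 0; end = prev[end]) {
            words.add(sentence.substring(prev[end], end));
        }
        Collections.reverse(words);
        return words;
    }
}
```

For instance, with hypothetical log-probabilities 我 = -2.0, 爱 = -2.5, 北京 = -3.0, 大学 = -3.5 and 北京大学 = -4.0, calling `ToySegmenter.segment("我爱北京大学", logProb)` returns `[我, 爱, 北京大学]`, because the single entry 北京大学 (-4.0) scores higher than the split 北京 + 大学 (-6.5).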

Nested Class Summary
- Nested classes/interfaces inherited from class org.apache.lucene.util.AttributeSource
  AttributeSource.State

Field Summary
- Fields inherited from class org.apache.lucene.analysis.util.SegmentingTokenizerBase
  buffer, BUFFERMAX, offset
- Fields inherited from class org.apache.lucene.analysis.Tokenizer
  input
- Fields inherited from class org.apache.lucene.analysis.TokenStream
  DEFAULT_TOKEN_ATTRIBUTE_FACTORY

Constructor Summary

| Constructor | Description |
| --- | --- |
| `HMMChineseTokenizer()` | Creates a new HMMChineseTokenizer |
| `HMMChineseTokenizer(AttributeFactory factory)` | Creates a new HMMChineseTokenizer, supplying the AttributeFactory |

Method Summary

| Modifier and Type | Method |
| --- | --- |
| `protected boolean` | `incrementWord()` |
| `void` | `reset()` |
| `protected void` | `setNextSentence(int sentenceStart, int sentenceEnd)` |

- Methods inherited from class org.apache.lucene.analysis.util.SegmentingTokenizerBase
  end, incrementToken, isSafeEnd
- Methods inherited from class org.apache.lucene.analysis.Tokenizer
  close, correctOffset, setReader
- Methods inherited from class org.apache.lucene.util.AttributeSource
  addAttribute, addAttributeImpl, captureState, clearAttributes, cloneAttributes, copyTo, equals, getAttribute, getAttributeClassesIterator, getAttributeFactory, getAttributeImplsIterator, hasAttribute, hasAttributes, hashCode, reflectAsString, reflectWith, removeAllAttributes, restoreState, toString
- Methods inherited from class java.lang.Object
  clone, finalize, getClass, notify, notifyAll, wait, wait, wait

- Constructor Detail
  - HMMChineseTokenizer
```
public HMMChineseTokenizer()
```
    Creates a new HMMChineseTokenizer
  - HMMChineseTokenizer
```
public HMMChineseTokenizer(AttributeFactory factory)
```
    Creates a new HMMChineseTokenizer, supplying the AttributeFactory
- Method Detail
  - setNextSentence
```
protected void setNextSentence(int sentenceStart,
                   int sentenceEnd)
```
    Specified by: `setNextSentence` in class `SegmentingTokenizerBase`
  - incrementWord
```
protected boolean incrementWord()
```
    Specified by: `incrementWord` in class `SegmentingTokenizerBase`
  - reset
```
public void reset()
           throws IOException
```
    Overrides: `reset` in class `SegmentingTokenizerBase`

    Throws: `IOException`
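
The two protected hooks, `setNextSentence` and `incrementWord`, reflect the two-stage design described in the class summary: sentences first, then words within each sentence. The sketch below mimics that contract with the JDK's `java.text.BreakIterator` only. It is an assumption-laden illustration, not Lucene code: the class `TwoStageDemo` and its `tokenize` method are hypothetical, and the word stage simply splits on whitespace instead of running a statistical model.

```java
import java.text.BreakIterator;
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

// Hypothetical stand-in for the sentence/word hook contract; does not use Lucene.
final class TwoStageDemo {
    private final String text;
    private final List<String> tokens = new ArrayList<>();
    private int pos;         // cursor inside the current sentence
    private int sentenceEnd; // exclusive end of the current sentence

    private TwoStageDemo(String text) {
        this.text = text;
    }

    // Stage 1 hook: receive the bounds of the next sentence.
    void setNextSentence(int sentenceStart, int sentenceEnd) {
        this.pos = sentenceStart;
        this.sentenceEnd = sentenceEnd;
    }

    // Stage 2 hook: emit the next word of the current sentence,
    // or return false once the sentence is exhausted.
    boolean incrementWord() {
        while (pos < sentenceEnd && Character.isWhitespace(text.charAt(pos))) {
            pos++;
        }
        if (pos >= sentenceEnd) {
            return false;
        }
        int start = pos;
        while (pos < sentenceEnd && !Character.isWhitespace(text.charAt(pos))) {
            pos++;
        }
        tokens.add(text.substring(start, pos));
        return true;
    }

    // Driver: find sentence boundaries, then drain the words of each sentence.
    static List<String> tokenize(String text) {
        TwoStageDemo demo = new TwoStageDemo(text);
        BreakIterator sentences = BreakIterator.getSentenceInstance(Locale.ROOT);
        sentences.setText(text);
        int start = sentences.first();
        for (int end = sentences.next(); end != BreakIterator.DONE;
                start = end, end = sentences.next()) {
            demo.setNextSentence(start, end);
            while (demo.incrementWord()) {
                // each successful call appended one word to demo.tokens
            }
        }
        return demo.tokens;
    }
}
```

Keeping sentence detection in the driver and word extraction in the hooks means a different word-level strategy can be swapped in without touching the sentence loop, which is presumably why the base class separates the two steps.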

Copyright © 2000-2016 Apache Software Foundation. All Rights Reserved.