ICUTokenizer (Lucene 3.6.1 API)

java.lang.Object
- org.apache.lucene.util.AttributeSource
  - org.apache.lucene.analysis.TokenStream
    - org.apache.lucene.analysis.Tokenizer
      - org.apache.lucene.analysis.icu.segmentation.ICUTokenizer

All Implemented Interfaces:

Closeable
```
public final class ICUTokenizer
extends org.apache.lucene.analysis.Tokenizer
```
Breaks text into words according to UAX #29: Unicode Text Segmentation (http://www.unicode.org/reports/tr29/).
Text is first split at script boundaries; each single-script run is then segmented using the BreakIterator, and its tokens are assigned types, as provided by the ICUTokenizerConfig.

See Also:
ICUTokenizerConfig
WARNING: This API is experimental and might change in incompatible ways in the next release.

Nested Class Summary
- Nested classes/interfaces inherited from class org.apache.lucene.util.AttributeSource
  org.apache.lucene.util.AttributeSource.AttributeFactory, org.apache.lucene.util.AttributeSource.State

Field Summary
- Fields inherited from class org.apache.lucene.analysis.Tokenizer
  input

Constructor Summary

Constructors
Constructor and Description
`ICUTokenizer(Reader input)` Construct a new ICUTokenizer that breaks text into words from the given Reader.
`ICUTokenizer(Reader input, ICUTokenizerConfig config)` Construct a new ICUTokenizer that breaks text into words from the given Reader, using a tailored BreakIterator configuration.

Method Summary

Methods
Modifier and Type	Method and Description
`void`	`end()`
`boolean`	`incrementToken()`
`void`	`reset()`
`void`	`reset(Reader input)`
- Methods inherited from class org.apache.lucene.analysis.Tokenizer
  close, correctOffset
- Methods inherited from class org.apache.lucene.util.AttributeSource
  addAttribute, addAttributeImpl, captureState, clearAttributes, cloneAttributes, copyTo, equals, getAttribute, getAttributeClassesIterator, getAttributeFactory, getAttributeImplsIterator, hasAttribute, hasAttributes, hashCode, reflectAsString, reflectWith, restoreState, toString
- Methods inherited from class java.lang.Object
  clone, finalize, getClass, notify, notifyAll, wait, wait, wait

- Constructor Detail
  - ICUTokenizer
```
public ICUTokenizer(Reader input)
```
    Construct a new ICUTokenizer that breaks text into words from the given Reader.
    The default script-specific handling is used.
    
    Parameters:
    input - Reader containing text to tokenize.
    See Also:
    DefaultICUTokenizerConfig
  - ICUTokenizer
```
public ICUTokenizer(Reader input,
            ICUTokenizerConfig config)
```
    Construct a new ICUTokenizer that breaks text into words from the given Reader, using a tailored BreakIterator configuration.
    
    Parameters:
    input - Reader containing text to tokenize.
    config - Tailored BreakIterator configuration
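
To give a feel for what a tailored configuration might change, the following sketch models a per-script configuration with JDK classes only. Everything here is an assumption made for illustration: `ScriptConfig`, `DefaultConfig`, `HanPerCharacterConfig` and `segment` are hypothetical names, the type strings are placeholders, and the interface is only loosely modelled on the idea of supplying a BreakIterator and a token type per script. It is not the ICUTokenizerConfig API.

```java
import java.text.BreakIterator;
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

// Hypothetical illustration; not part of Lucene.
class TailoredConfigSketch {

    /** Stand-in for a configuration that picks a BreakIterator and a token type per script. */
    interface ScriptConfig {
        BreakIterator breakIteratorFor(Character.UnicodeScript script);
        String typeFor(Character.UnicodeScript script, String token);
    }

    /** Default behaviour in this sketch: word-level breaking for every script. */
    static class DefaultConfig implements ScriptConfig {
        @Override
        public BreakIterator breakIteratorFor(Character.UnicodeScript script) {
            return BreakIterator.getWordInstance(Locale.ROOT);
        }

        @Override
        public String typeFor(Character.UnicodeScript script, String token) {
            // Placeholder type labels chosen for this example.
            return Character.isDigit(token.codePointAt(0)) ? "<NUM>" : "<ALPHANUM>";
        }
    }

    /** Tailoring: Han runs are broken per character and given their own type. */
    static class HanPerCharacterConfig extends DefaultConfig {
        @Override
        public BreakIterator breakIteratorFor(Character.UnicodeScript script) {
            if (script == Character.UnicodeScript.HAN) {
                return BreakIterator.getCharacterInstance(Locale.ROOT);
            }
            return super.breakIteratorFor(script);
        }

        @Override
        public String typeFor(Character.UnicodeScript script, String token) {
            if (script == Character.UnicodeScript.HAN) {
                return "<IDEOGRAPHIC>";
            }
            return super.typeFor(script, token);
        }
    }

    /** Segments one single-script run and returns "token/type" strings. */
    public static List<String> segment(String run, Character.UnicodeScript script,
                                       ScriptConfig config) {
        BreakIterator it = config.breakIteratorFor(script);
        it.setText(run);
        List<String> out = new ArrayList<>();
        int from = it.first();
        for (int to = it.next(); to != BreakIterator.DONE; from = to, to = it.next()) {
            String piece = run.substring(from, to);
            // Skip pieces that start with whitespace or punctuation.
            if (Character.isLetterOrDigit(piece.codePointAt(0))) {
                out.add(piece + "/" + config.typeFor(script, piece));
            }
        }
        return out;
    }

    public static void main(String[] args) {
        System.out.println(segment("abc 42", Character.UnicodeScript.LATIN, new DefaultConfig()));
        System.out.println(segment("中文", Character.UnicodeScript.HAN, new HanPerCharacterConfig()));
    }
}
```

The design point is that the tokenizer itself stays unchanged; only the object answering "which break rules and which type for this script?" is swapped.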
- Method Detail
  - incrementToken
```
public boolean incrementToken()
                       throws IOException
```
    Specified by:
    
    incrementToken in class org.apache.lucene.analysis.TokenStream
    
    Throws:
    
    IOException
  - reset
```
public void reset()
           throws IOException
```
    Overrides:
    
    reset in class org.apache.lucene.analysis.TokenStream
    
    Throws:
    
    IOException
  - reset
```
public void reset(Reader input)
           throws IOException
```
    Overrides:
    
    reset in class org.apache.lucene.analysis.Tokenizer
    
    Throws:
    
    IOException
  - end
```
public void end()
         throws IOException
```
    Overrides:
    
    end in class org.apache.lucene.analysis.TokenStream
    
    Throws:
    
    IOException
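
The four methods above are normally called in a fixed order by the consumer: `reset()`, then `incrementToken()` until it returns false, then `end()`, with `reset(Reader)` used to point an existing instance at new input. The sketch below mimics that calling sequence with a self-contained stand-in. `MiniTokenizer` is a hypothetical whitespace tokenizer written for this example; it shares only the method names with ICUTokenizer, and exposes the current term through a plain accessor rather than Lucene attributes.

```java
import java.io.Closeable;
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration of the consumer workflow; not part of Lucene.
class LifecycleSketch {

    /** Whitespace tokenizer with the same method names as the class documented here. */
    static final class MiniTokenizer implements Closeable {
        private Reader input;
        private final StringBuilder term = new StringBuilder();
        private int offset;       // characters consumed so far
        private int finalOffset;  // recorded by end()

        MiniTokenizer(Reader input) {
            this.input = input;
        }

        /** Clears per-stream state before a stream is consumed. */
        void reset() {
            term.setLength(0);
            offset = 0;
            finalOffset = 0;
        }

        /** Switches to a new input and clears state, so the instance can be reused. */
        void reset(Reader newInput) {
            this.input = newInput;
            reset();
        }

        /** Advances to the next token; returns false once the input is exhausted. */
        boolean incrementToken() throws IOException {
            term.setLength(0);
            int c;
            while ((c = input.read()) != -1) {
                offset++;
                if (Character.isWhitespace(c)) {
                    if (term.length() > 0) {
                        return true;
                    }
                } else {
                    term.append((char) c);
                }
            }
            return term.length() > 0;
        }

        /** Called once after the last token; records the end-of-stream offset. */
        void end() {
            finalOffset = offset;
        }

        String term() {
            return term.toString();
        }

        int finalOffset() {
            return finalOffset;
        }

        @Override
        public void close() throws IOException {
            input.close();
        }
    }

    /** Consumes each input in turn, reusing one tokenizer via reset(Reader). */
    public static List<String> consumeAll(String... inputs) {
        List<String> out = new ArrayList<>();
        if (inputs.length == 0) {
            return out;
        }
        try (MiniTokenizer tokenizer = new MiniTokenizer(new StringReader(inputs[0]))) {
            for (int i = 0; i < inputs.length; i++) {
                if (i > 0) {
                    tokenizer.reset(new StringReader(inputs[i]));
                }
                tokenizer.reset();
                while (tokenizer.incrementToken()) {
                    out.add(tokenizer.term());
                }
                tokenizer.end();
            }
        } catch (IOException e) {
            throw new IllegalStateException(e);
        }
        return out;
    }

    public static void main(String[] args) {
        System.out.println(consumeAll("one two", "three"));
    }
}
```

With the real class, the term text and offsets would be read from attributes registered on the stream rather than from accessors like `term()`; the order of calls is the part this sketch is meant to show.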
