java.lang.Object
- org.apache.lucene.util.AttributeSource
- - org.apache.lucene.analysis.TokenStream
  - - org.apache.lucene.analysis.Tokenizer
    - - org.apache.lucene.analysis.icu.segmentation.ICUTokenizer

All Implemented Interfaces:

Closeable, AutoCloseable
```
public final class ICUTokenizer
extends Tokenizer
```
Breaks text into words according to UAX #29: Unicode Text Segmentation (http://www.unicode.org/reports/tr29/).
Words are first broken across script boundaries, then each run is segmented according to the BreakIterator and token typing provided by the ICUTokenizerConfig.
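
The two-stage process described above can be sketched with JDK classes only. This is an illustration under stated assumptions, not the ICUTokenizer implementation: `java.text.BreakIterator` and `Character.UnicodeScript` stand in for the ICU break iterator and script detection, and the class and method names (`ScriptRunSketch`, `scriptRuns`, `words`, `tokenize`) are hypothetical.

```java
import java.text.BreakIterator;
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

/** Hypothetical sketch: split on script boundaries first, then word-break each run. */
class ScriptRunSketch {

    /** Splits text into maximal runs whose code points share one Unicode script. */
    static List<String> scriptRuns(String text) {
        List<String> runs = new ArrayList<>();
        int start = 0;
        Character.UnicodeScript current = null;
        for (int i = 0; i < text.length(); ) {
            int cp = text.codePointAt(i);
            Character.UnicodeScript script = Character.UnicodeScript.of(cp);
            // Assumption: COMMON and INHERITED code points (spaces, digits,
            // punctuation, combining marks) never start a new run.
            boolean neutral = script == Character.UnicodeScript.COMMON
                    || script == Character.UnicodeScript.INHERITED;
            if (!neutral) {
                if (current != null && script != current) {
                    runs.add(text.substring(start, i));
                    start = i;
                }
                current = script;
            }
            i += Character.charCount(cp);
        }
        if (start < text.length()) {
            runs.add(text.substring(start));
        }
        return runs;
    }

    /** Word-breaks one run with the JDK word BreakIterator (a stand-in for ICU). */
    static List<String> words(String run) {
        List<String> out = new ArrayList<>();
        BreakIterator it = BreakIterator.getWordInstance(Locale.ROOT);
        it.setText(run);
        int start = it.first();
        for (int end = it.next(); end != BreakIterator.DONE; start = end, end = it.next()) {
            String piece = run.substring(start, end);
            // Keep pieces that contain a letter or digit; drop spaces and punctuation.
            if (piece.codePoints().anyMatch(Character::isLetterOrDigit)) {
                out.add(piece);
            }
        }
        return out;
    }

    /** Stage 1: script runs. Stage 2: word segmentation inside each run. */
    static List<String> tokenize(String text) {
        List<String> tokens = new ArrayList<>();
        for (String run : scriptRuns(text)) {
            tokens.addAll(words(run));
        }
        return tokens;
    }

    public static void main(String[] args) {
        // Latin letters followed directly by Cyrillic letters, with no space between.
        System.out.println(tokenize("abc\u0430\u0431\u0432"));
    }
}
```

In this sketch the script split is what separates adjacent Latin and Cyrillic letters; the word breaker alone would see them as one unbroken word. The real tokenizer additionally assigns a type to each token through the ICUTokenizerConfig, which this sketch does not model.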

See Also:

ICUTokenizerConfig

WARNING: This API is experimental and might change in incompatible ways in the next release.

Nested Class Summary
- Nested classes/interfaces inherited from class org.apache.lucene.util.AttributeSource
  AttributeSource.State

Field Summary
- Fields inherited from class org.apache.lucene.analysis.Tokenizer
  input
- Fields inherited from class org.apache.lucene.analysis.TokenStream
  DEFAULT_TOKEN_ATTRIBUTE_FACTORY

Constructor Summary

Constructors
Constructor	Description
`ICUTokenizer()`	Constructs a new ICUTokenizer that breaks text into words. The input Reader is supplied afterwards through setReader.
`ICUTokenizer(ICUTokenizerConfig config)`	Constructs a new ICUTokenizer that breaks text into words, using a tailored BreakIterator configuration. The input Reader is supplied afterwards through setReader.
`ICUTokenizer(AttributeFactory factory, ICUTokenizerConfig config)`	Constructs a new ICUTokenizer that breaks text into words, using the given AttributeFactory and a tailored BreakIterator configuration. The input Reader is supplied afterwards through setReader.

Method Summary

Modifier and Type	Method	Description
`void`	`end()`
`boolean`	`incrementToken()`
`void`	`reset()`
- Methods inherited from class org.apache.lucene.analysis.Tokenizer
  close, correctOffset, setReader
- Methods inherited from class org.apache.lucene.util.AttributeSource
  addAttribute, addAttributeImpl, captureState, clearAttributes, cloneAttributes, copyTo, endAttributes, equals, getAttribute, getAttributeClassesIterator, getAttributeFactory, getAttributeImplsIterator, hasAttribute, hasAttributes, hashCode, reflectAsString, reflectWith, removeAllAttributes, restoreState, toString
- Methods inherited from class java.lang.Object
  clone, finalize, getClass, notify, notifyAll, wait, wait, wait

- Constructor Detail
  - ICUTokenizer
```
public ICUTokenizer()
```
    Constructs a new ICUTokenizer that breaks text into words. The input Reader is supplied afterwards through setReader.
    The default script-specific handling is used.
    The default attribute factory is used.
    
    See Also:
    
    DefaultICUTokenizerConfig
  - ICUTokenizer
```
public ICUTokenizer(ICUTokenizerConfig config)
```
    Constructs a new ICUTokenizer that breaks text into words, using a tailored BreakIterator configuration. The input Reader is supplied afterwards through setReader.
    The default attribute factory is used.
    
    Parameters:
    
    config - Tailored BreakIterator configuration
  - ICUTokenizer
```
public ICUTokenizer(AttributeFactory factory,
                    ICUTokenizerConfig config)
```
    Constructs a new ICUTokenizer that breaks text into words, using a tailored BreakIterator configuration. The input Reader is supplied afterwards through setReader.
    
    Parameters:
    
    factory - AttributeFactory to use
    
    config - Tailored BreakIterator configuration
- Method Detail
  - incrementToken
```
public boolean incrementToken()
                       throws IOException
```
    Specified by:
    
    incrementToken in class TokenStream
    
    Throws:
    
    IOException
  - reset
```
public void reset()
           throws IOException
```
    Overrides:
    
    reset in class Tokenizer
    
    Throws:
    
    IOException
  - end
```
public void end()
         throws IOException
```
    Overrides:
    
    end in class TokenStream
    
    Throws:
    
    IOException
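
The methods listed here (setReader, reset, incrementToken, end, close) are meant to be called in a fixed order by the consumer. The sketch below mirrors that order with a hypothetical stand-in class, `LifecycleSketch`, built only on the JDK; it is not Lucene code, and its fields, the `term()` accessor, and the `consume` helper are assumptions made for illustration. With the real class, the same sequence of calls is made on an ICUTokenizer instance and the token text is read through an attribute rather than an accessor.

```java
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.text.BreakIterator;
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

/** Hypothetical stand-in that mirrors the tokenizer call order; not Lucene code. */
class LifecycleSketch implements AutoCloseable {
    private Reader input;          // set by setReader, read in reset
    private String text;           // buffered input
    private BreakIterator breaker; // JDK word breaker standing in for ICU
    private int start;             // start offset of the next candidate piece
    private String term;           // text of the current token
    private int finalOffset;       // set by end()

    void setReader(Reader reader) {
        this.input = reader;
    }

    void reset() throws IOException {
        if (input == null) {
            throw new IllegalStateException("call setReader before reset");
        }
        StringBuilder sb = new StringBuilder();
        char[] buf = new char[1024];
        for (int n; (n = input.read(buf)) != -1; ) {
            sb.append(buf, 0, n);
        }
        text = sb.toString();
        breaker = BreakIterator.getWordInstance(Locale.ROOT);
        breaker.setText(text);
        start = breaker.first();
    }

    /** Advances to the next token; returns false once the input is exhausted. */
    boolean incrementToken() {
        if (breaker == null) {
            throw new IllegalStateException("call reset before incrementToken");
        }
        for (int end = breaker.next(); end != BreakIterator.DONE; start = end, end = breaker.next()) {
            String piece = text.substring(start, end);
            if (piece.codePoints().anyMatch(Character::isLetterOrDigit)) {
                term = piece;
                start = end;
                return true;
            }
        }
        return false;
    }

    String term() {
        return term;
    }

    /** Records end-of-stream state after the last token. */
    void end() {
        finalOffset = (text == null) ? 0 : text.length();
    }

    int finalOffset() {
        return finalOffset;
    }

    @Override
    public void close() throws IOException {
        if (input != null) {
            input.close();
        }
        input = null;
        breaker = null;
    }

    /** Consumer workflow: setReader, reset, incrementToken loop, end, close. */
    static List<String> consume(String s) {
        List<String> out = new ArrayList<>();
        try (LifecycleSketch ts = new LifecycleSketch()) {
            ts.setReader(new StringReader(s));
            ts.reset();
            while (ts.incrementToken()) {
                out.add(ts.term());
            }
            ts.end();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return out;
    }

    public static void main(String[] args) {
        System.out.println(consume("ICU breaks text"));
    }
}
```

The try-with-resources block guarantees the final close call even when reading fails, which matters because the tokenizer implements Closeable and AutoCloseable.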
