ICUTokenizerFactory (Lucene 7.7.0 API)

java.lang.Object
- org.apache.lucene.analysis.util.AbstractAnalysisFactory
  - org.apache.lucene.analysis.util.TokenizerFactory
    - org.apache.lucene.analysis.icu.segmentation.ICUTokenizerFactory

All Implemented Interfaces:

ResourceLoaderAware
```
public class ICUTokenizerFactory
extends TokenizerFactory
implements ResourceLoaderAware
```
Factory for ICUTokenizer. Text is first divided into runs at script boundaries, so a word never spans two scripts. Each run is then segmented according to the BreakIterator and token typing provided by the DefaultICUTokenizerConfig.
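To make the first step concrete, here is a simplified, standard-library-only sketch of dividing text into single-script runs with `java.lang.Character.UnicodeScript`. It is an illustration, not ICU's script iteration: the rule that COMMON and INHERITED characters (spaces, digits, punctuation, combining marks) simply join the current run is a simplifying assumption, and the class `ScriptRuns` is hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

public class ScriptRuns {
    /**
     * Hypothetical, simplified sketch: splits text into runs of a single Unicode script.
     * COMMON and INHERITED code points are attached to the current run instead of
     * starting a new one (an assumption; ICU's own script handling is more elaborate).
     */
    public static List<String> split(String text) {
        List<String> runs = new ArrayList<>();
        int runStart = 0;
        Character.UnicodeScript runScript = null; // null until a non-neutral script is seen
        int i = 0;
        while (i < text.length()) {
            int cp = text.codePointAt(i);
            Character.UnicodeScript sc = Character.UnicodeScript.of(cp);
            boolean neutral = sc == Character.UnicodeScript.COMMON
                    || sc == Character.UnicodeScript.INHERITED;
            if (!neutral) {
                if (runScript != null && sc != runScript) {
                    // Script changed: close the current run just before this code point
                    runs.add(text.substring(runStart, i));
                    runStart = i;
                }
                runScript = sc;
            }
            i += Character.charCount(cp); // step over surrogate pairs correctly
        }
        if (runStart < text.length()) {
            runs.add(text.substring(runStart));
        }
        return runs;
    }

    public static void main(String[] args) {
        // "Hello " (Latin, with the trailing space attached) followed by a Cyrillic word
        System.out.println(split("Hello \u041c\u0438\u0440"));
    }
}
```

Within each run, the real tokenizer then applies the break rules for that run's script.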
To use the default set of per-script rules:
```
 <fieldType name="text_icu" class="solr.TextField" positionIncrementGap="100">
   <analyzer>
     <tokenizer class="solr.ICUTokenizerFactory"/>
   </analyzer>
 </fieldType>
```
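A rough programmatic counterpart of the `text_icu` field type, assuming plain Lucene 7.7 (no Solr) with the `lucene-analyzers-common` and `lucene-analyzers-icu` modules on the classpath. `CustomAnalyzer` is used here because it instantiates the factory and calls `inform()` itself. Only the tokenizer is reproduced; Solr-specific settings such as `positionIncrementGap` are not. The class name and the field name `"body"` are illustrative.

```java
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;

import org.apache.lucene.analysis.Analyzer;
import org.apache.lucene.analysis.TokenStream;
import org.apache.lucene.analysis.custom.CustomAnalyzer;
import org.apache.lucene.analysis.icu.segmentation.ICUTokenizerFactory;
import org.apache.lucene.analysis.tokenattributes.CharTermAttribute;

public class DefaultIcuAnalyzerExample {
    // ICUTokenizerFactory with no extra arguments, i.e. the default per-script rules
    static Analyzer buildAnalyzer() throws IOException {
        return CustomAnalyzer.builder()
                .withTokenizer(ICUTokenizerFactory.class) // the builder calls inform() for us
                .build();
    }

    static List<String> tokens(Analyzer analyzer, String text) throws IOException {
        List<String> out = new ArrayList<>();
        try (TokenStream ts = analyzer.tokenStream("body", text)) {
            CharTermAttribute term = ts.addAttribute(CharTermAttribute.class);
            ts.reset(); // required before the first incrementToken()
            while (ts.incrementToken()) {
                out.add(term.toString());
            }
            ts.end();
        }
        return out;
    }

    public static void main(String[] args) throws IOException {
        try (Analyzer analyzer = buildAnalyzer()) {
            // Mixed Latin, Cyrillic and digits
            System.out.println(tokens(analyzer, "Hello \u041c\u0438\u0440 42"));
        }
    }
}
```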
You can customize this tokenizer's behavior by specifying per-script rule files, which are compiled by the ICU RuleBasedBreakIterator. See the ICU RuleBasedBreakIterator syntax reference.
To add per-script rules, add a "rulefiles" argument containing a comma-separated list of code:rulefile pairs. Each pair is a four-letter ISO 15924 script code, followed by a colon, then a resource path. For example, to specify rules for Latin (script code "Latn") and Cyrillic (script code "Cyrl"):
```
 <fieldType name="text_icu_custom" class="solr.TextField" positionIncrementGap="100">
   <analyzer>
     <tokenizer class="solr.ICUTokenizerFactory" cjkAsWords="true"
                rulefiles="Latn:my.Latin.rules.rbbi,Cyrl:my.Cyrillic.rules.rbbi"/>
   </analyzer>
 </fieldType>
```
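A hypothetical helper that parses a `rulefiles` value in the format described above, to illustrate how each pair maps a script to a resource path. It resolves the ISO 15924 codes with `java.lang.Character.UnicodeScript.forName`, which is a stand-in and not what `ICUTokenizerFactory` itself uses; the plain comma split is also a simplification that handles no escaping.

```java
import java.util.LinkedHashMap;
import java.util.Map;

public class RuleFilesArg {
    /**
     * Hypothetical helper: parses e.g. "Latn:my.Latin.rules.rbbi,Cyrl:my.Cyrillic.rules.rbbi"
     * into an ordered map of script -> resource path.
     */
    public static Map<Character.UnicodeScript, String> parse(String rulefiles) {
        Map<Character.UnicodeScript, String> result = new LinkedHashMap<>();
        for (String pair : rulefiles.split(",")) {
            int colon = pair.indexOf(':');
            if (colon < 0) {
                throw new IllegalArgumentException("Expected code:rulefile, got: " + pair);
            }
            String code = pair.substring(0, colon).trim();
            String path = pair.substring(colon + 1).trim();
            // forName accepts Unicode script names and their aliases, which include
            // the four-letter ISO 15924 codes such as "Latn" and "Cyrl" (case-insensitive)
            result.put(Character.UnicodeScript.forName(code), path);
        }
        return result;
    }

    public static void main(String[] args) {
        System.out.println(parse("Latn:my.Latin.rules.rbbi,Cyrl:my.Cyrillic.rules.rbbi"));
        // prints {LATIN=my.Latin.rules.rbbi, CYRILLIC=my.Cyrillic.rules.rbbi}
    }
}
```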
Since:

3.1

- Field Summary
  - Fields inherited from class org.apache.lucene.analysis.util.AbstractAnalysisFactory
    LUCENE_MATCH_VERSION_PARAM, luceneMatchVersion
- Constructor Summary
  - ICUTokenizerFactory(Map<String,String> args): Creates a new ICUTokenizerFactory
- Method Summary
  - ICUTokenizer create(AttributeFactory factory)
  - void inform(ResourceLoader loader)
  - Methods inherited from class org.apache.lucene.analysis.util.TokenizerFactory
    availableTokenizers, create, forName, lookupClass, reloadTokenizers
  - Methods inherited from class org.apache.lucene.analysis.util.AbstractAnalysisFactory
    get, get, get, get, get, getBoolean, getChar, getClassArg, getFloat, getInt, getLines, getLuceneMatchVersion, getOriginalArgs, getPattern, getSet, getSnowballWordSet, getWordSet, isExplicitLuceneMatchVersion, require, require, require, requireBoolean, requireChar, requireFloat, requireInt, setExplicitLuceneMatchVersion, splitAt, splitFileNames
  - Methods inherited from class java.lang.Object
    clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait

- Constructor Detail
  - ICUTokenizerFactory
```
public ICUTokenizerFactory(Map<String,String> args)
```
    Creates a new ICUTokenizerFactory
- Method Detail
  - inform
```
public void inform(ResourceLoader loader)
            throws IOException
```
    Specified by:
    
    inform in interface ResourceLoaderAware
    
    Throws:
    
    IOException
  - create
```
public ICUTokenizer create(AttributeFactory factory)
```
    Specified by:
    
    create in class TokenizerFactory
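The factory can also be driven by hand. A minimal sketch of the lifecycle implied by the constructor and method details above, assuming Lucene 7.7 with `lucene-analyzers-icu` and `lucene-analyzers-common` on the classpath: build the factory from a mutable argument map, call `inform()` with a `ResourceLoader` (`ClasspathResourceLoader` here, an arbitrary choice), and only then call `create()`. The class name `IcuTokenizerLifecycle` is hypothetical.

```java
import java.io.IOException;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

import org.apache.lucene.analysis.TokenStream;
import org.apache.lucene.analysis.icu.segmentation.ICUTokenizer;
import org.apache.lucene.analysis.icu.segmentation.ICUTokenizerFactory;
import org.apache.lucene.analysis.tokenattributes.CharTermAttribute;
import org.apache.lucene.analysis.util.ClasspathResourceLoader;

public class IcuTokenizerLifecycle {
    static List<String> tokenize(String text) throws IOException {
        // 1. Constructor: use a mutable map, since Lucene factories consume the keys they read.
        //    Empty here, so the default per-script rules apply.
        Map<String, String> factoryArgs = new HashMap<>();
        ICUTokenizerFactory factory = new ICUTokenizerFactory(factoryArgs);

        // 2. inform(): gives the factory a ResourceLoader for any "rulefiles" resources.
        //    Call it before create().
        factory.inform(new ClasspathResourceLoader(IcuTokenizerLifecycle.class));

        // 3. create(AttributeFactory): returns a new ICUTokenizer
        List<String> out = new ArrayList<>();
        try (ICUTokenizer tokenizer = factory.create(TokenStream.DEFAULT_TOKEN_ATTRIBUTE_FACTORY)) {
            tokenizer.setReader(new StringReader(text));
            CharTermAttribute term = tokenizer.addAttribute(CharTermAttribute.class);
            tokenizer.reset(); // required before the first incrementToken()
            while (tokenizer.incrementToken()) {
                out.add(term.toString());
            }
            tokenizer.end();
        }
        return out;
    }

    public static void main(String[] args) throws IOException {
        System.out.println(tokenize("Hello \u041c\u0438\u0440 42"));
    }
}
```

Frameworks such as Solr or `CustomAnalyzer` normally perform steps 1 and 2 themselves; calling them manually is mainly useful in tests or standalone tools.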