ICUTokenizerFactory (Lucene 9.4.2 icu API)

java.lang.Object
- org.apache.lucene.analysis.AbstractAnalysisFactory
  - org.apache.lucene.analysis.TokenizerFactory
    - org.apache.lucene.analysis.icu.segmentation.ICUTokenizerFactory

All Implemented Interfaces:

ResourceLoaderAware
```
public class ICUTokenizerFactory
extends TokenizerFactory
implements ResourceLoaderAware
```
Factory for ICUTokenizer. Words are broken across script boundaries, then segmented according to the BreakIterator and typing provided by the DefaultICUTokenizerConfig.
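The first step described above, breaking text at script boundaries, can be illustrated with a minimal standalone sketch. It is an approximation built on the JDK's `Character.UnicodeScript`, not the logic ICUTokenizer actually runs (that relies on ICU's own script data and has additional handling omitted here). The class name `ScriptRuns` and the rule that COMMON and INHERITED code points stay with the current run are assumptions made for illustration.
```java
import java.lang.Character.UnicodeScript;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper: splits text into maximal runs that share one Unicode script.
final class ScriptRuns {
    static List<String> split(String text) {
        List<String> runs = new ArrayList<>();
        int start = 0;
        UnicodeScript current = null; // script of the run being built; null until a non-neutral script is seen
        for (int i = 0; i < text.length(); ) {
            int cp = text.codePointAt(i);
            UnicodeScript script = UnicodeScript.of(cp);
            // Assumption: spaces, digits, punctuation (COMMON) and combining marks (INHERITED)
            // never start a new run; they stay attached to the run in progress.
            boolean neutral = script == UnicodeScript.COMMON || script == UnicodeScript.INHERITED;
            if (!neutral) {
                if (current != null && script != current) {
                    runs.add(text.substring(start, i)); // script changed: close the previous run
                    start = i;
                }
                current = script;
            }
            i += Character.charCount(cp); // advance by code point, not by char
        }
        if (start < text.length()) {
            runs.add(text.substring(start)); // trailing run
        }
        return runs;
    }
}
```
Calling `ScriptRuns.split` on a Latin word immediately followed by a Cyrillic word returns two runs, one per script. In the real tokenizer each such run is then segmented further by the BreakIterator selected for its script.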
To use the default set of per-script rules:
```
 <fieldType name="text_icu" class="solr.TextField" positionIncrementGap="100">
   <analyzer>
     <tokenizer class="solr.ICUTokenizerFactory"/>
   </analyzer>
 </fieldType>
```
You can customize this tokenizer's behavior by specifying per-script rule files, which are compiled by the ICU RuleBasedBreakIterator. See the ICU RuleBasedBreakIterator syntax reference.
To add per-script rules, add a "rulefiles" argument containing a comma-separated list of code:rulefile pairs. Each pair consists of a four-letter ISO 15924 script code, a colon, and a resource path. For example, to specify rules for Latin (script code "Latn") and Cyrillic (script code "Cyrl"):
```
 <fieldType name="text_icu_custom" class="solr.TextField" positionIncrementGap="100">
   <analyzer>
     <tokenizer class="solr.ICUTokenizerFactory" cjkAsWords="true"
                rulefiles="Latn:my.Latin.rules.rbbi,Cyrl:my.Cyrillic.rules.rbbi"/>
   </analyzer>
 </fieldType>
```
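To make the "rulefiles" format concrete, the following sketch splits such a value into script-code and resource-path pairs. It is not the factory's own parsing code: the class `RuleFilesArg`, its `parse` method, and the strictness of the validation are assumptions chosen for illustration.
```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper that decomposes a "rulefiles" value such as
// "Latn:my.Latin.rules.rbbi,Cyrl:my.Cyrillic.rules.rbbi".
final class RuleFilesArg {
    static Map<String, String> parse(String rulefiles) {
        Map<String, String> pathByScript = new LinkedHashMap<>(); // keeps declaration order
        for (String pair : rulefiles.split(",")) {
            String entry = pair.trim();
            int colon = entry.indexOf(':');
            // A four-letter ISO 15924 code puts the first colon at index 4.
            if (colon != 4 || colon == entry.length() - 1) {
                throw new IllegalArgumentException(
                    "expected <4-letter script code>:<resource path>, got: \"" + entry + "\"");
            }
            String scriptCode = entry.substring(0, colon);
            String resourcePath = entry.substring(colon + 1).trim();
            pathByScript.put(scriptCode, resourcePath);
        }
        return pathByScript;
    }
}
```
For the value used in the field type above, `parse` yields two entries: `Latn` mapped to `my.Latin.rules.rbbi` and `Cyrl` mapped to `my.Cyrillic.rules.rbbi`. This sketch only checks the shape of each pair; it does not verify that the code is a registered ISO 15924 script or that the resource exists.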
Since:

3.1

SPI Name (case-insensitive: if the name is 'htmlStrip', 'htmlstrip' can be used when looking up the service).

"icu"
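
Case-insensitive name lookup of this kind is commonly implemented by normalizing names to lower case on both registration and lookup. The sketch below shows that idea in isolation; `SpiNameRegistry` and its methods are hypothetical and are not part of the Lucene API.
```java
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;

// Hypothetical registry illustrating case-insensitive SPI name lookup.
final class SpiNameRegistry {
    private final Map<String, String> classNameBySpiName = new HashMap<>();

    void register(String spiName, String className) {
        // Locale.ROOT keeps the normalization independent of the default locale.
        classNameBySpiName.put(spiName.toLowerCase(Locale.ROOT), className);
    }

    String lookup(String spiName) {
        String className = classNameBySpiName.get(spiName.toLowerCase(Locale.ROOT));
        if (className == null) {
            throw new IllegalArgumentException("no service registered under name: " + spiName);
        }
        return className;
    }
}
```
With this approach a service registered as "htmlStrip" is found under "htmlstrip", and one registered as "icu" is found under "ICU".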

- Field Summary
  
  Fields
  
  | Modifier and Type | Field | Description |
  | --- | --- | --- |
  | static String | NAME | SPI name |
  
  - Fields inherited from class org.apache.lucene.analysis.AbstractAnalysisFactory
    LUCENE_MATCH_VERSION_PARAM, luceneMatchVersion
- Constructor Summary
  
  Constructors
  
  | Constructor | Description |
  | --- | --- |
  | ICUTokenizerFactory() | Default constructor for compatibility with SPI |
  | ICUTokenizerFactory(Map<String,String> args) | Creates a new ICUTokenizerFactory |
  
- Method Summary
  
  | Modifier and Type | Method | Description |
  | --- | --- | --- |
  | ICUTokenizer | create(AttributeFactory factory) | |
  | void | inform(ResourceLoader loader) | |
  
  - Methods inherited from class org.apache.lucene.analysis.TokenizerFactory
    availableTokenizers, create, findSPIName, forName, lookupClass, reloadTokenizers
  - Methods inherited from class org.apache.lucene.analysis.AbstractAnalysisFactory
    defaultCtorException, get, get, get, get, get, getBoolean, getChar, getClassArg, getFloat, getInt, getLines, getLuceneMatchVersion, getOriginalArgs, getPattern, getSet, getSnowballWordSet, getWordSet, isExplicitLuceneMatchVersion, require, require, require, requireBoolean, requireChar, requireFloat, requireInt, setExplicitLuceneMatchVersion, splitAt, splitFileNames
  - Methods inherited from class java.lang.Object
    clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait

- Field Detail
  - NAME
```
public static final String NAME
```
    SPI name
    
    See Also:
    
    Constant Field Values
- Constructor Detail
  - ICUTokenizerFactory
```
public ICUTokenizerFactory(Map<String,String> args)
```
    Creates a new ICUTokenizerFactory
  - ICUTokenizerFactory
```
public ICUTokenizerFactory()
```
    Default constructor for compatibility with SPI
- Method Detail
  - inform
```
public void inform(ResourceLoader loader)
            throws IOException
```
    Specified by:
    
    inform in interface ResourceLoaderAware
    
    Throws:
    
    IOException
  - create
```
public ICUTokenizer create(AttributeFactory factory)
```
    Specified by:
    
    create in class TokenizerFactory