public class ICUTokenizerFactory extends TokenizerFactory implements ResourceLoaderAware
Factory for ICUTokenizer.
Words are broken across script boundaries, then segmented according to the BreakIterator and typing provided by the DefaultICUTokenizerConfig.
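As a rough illustration of what "broken across script boundaries" means, the sketch below splits a string into runs of a single Unicode script. It uses only the JDK's `Character.UnicodeScript` as a stand-in and is not how ICUTokenizer is implemented; the class name `ScriptRuns` and the choice to attach COMMON and INHERITED characters (spaces, punctuation, combining marks) to the preceding run are assumptions made for this example.

```java
import java.lang.Character.UnicodeScript;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, not part of Lucene or ICU.
public class ScriptRuns {

    /** Splits text wherever the Unicode script changes, e.g. Latin to Cyrillic. */
    public static List<String> split(String text) {
        List<String> runs = new ArrayList<>();
        UnicodeScript current = null; // script of the run being built
        int start = 0;                // start offset of the run being built

        for (int i = 0; i < text.length(); ) {
            int cp = text.codePointAt(i);
            UnicodeScript script = UnicodeScript.of(cp);
            // Assumption: script-neutral characters never start a new run.
            boolean neutral = script == UnicodeScript.COMMON
                    || script == UnicodeScript.INHERITED;
            if (!neutral) {
                if (current != null && script != current) {
                    runs.add(text.substring(start, i));
                    start = i;
                }
                current = script;
            }
            i += Character.charCount(cp);
        }
        if (start < text.length()) {
            runs.add(text.substring(start));
        }
        return runs;
    }

    public static void main(String[] args) {
        // Latin letters followed by Cyrillic letters (written as escapes).
        System.out.println(split("abc\u043f\u0440\u0438\u0432\u0435\u0442"));
    }
}
```

In the real tokenizer each such run is then segmented by the break iterator configured for its script; that second step is not shown here.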
To use the default set of per-script rules:
<fieldType name="text_icu" class="solr.TextField" positionIncrementGap="100">
  <analyzer>
    <tokenizer class="solr.ICUTokenizerFactory"/>
  </analyzer>
</fieldType>

You can customize this tokenizer's behavior by specifying per-script rule files, which are compiled by the ICU RuleBasedBreakIterator. See the ICU RuleBasedBreakIterator syntax reference.

To add per-script rules, add a "rulefiles" argument, which should contain a comma-separated list of code:rulefile pairs in the following format: four-letter ISO 15924 script code, followed by a colon, then a resource path. E.g. to specify rules for Latin (script code "Latn") and Cyrillic (script code "Cyrl"):

<fieldType name="text_icu_custom" class="solr.TextField" positionIncrementGap="100">
  <analyzer>
    <tokenizer class="solr.ICUTokenizerFactory"
               rulefiles="Latn:my.Latin.rules.rbbi,Cyrl:my.Cyrillic.rules.rbbi"/>
  </analyzer>
</fieldType>
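The "rulefiles" format described above can be made concrete with a small parser. This is a minimal sketch that assumes only the format stated in the text (four-letter script code, colon, resource path, pairs separated by commas); it is not the factory's own parsing code, and the class name `RuleFilesArg` and its `parse` method are hypothetical.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper, not part of Lucene.
public class RuleFilesArg {

    /** Parses "code:path,code:path" into an ordered map of script code to resource path. */
    public static Map<String, String> parse(String rulefiles) {
        Map<String, String> result = new LinkedHashMap<>();
        for (String pair : rulefiles.split(",")) {
            String trimmed = pair.trim();
            int colon = trimmed.indexOf(':');
            // A four-letter ISO 15924 code puts the first colon at index 4,
            // and the resource path after it must not be empty.
            if (colon != 4 || colon == trimmed.length() - 1) {
                throw new IllegalArgumentException(
                        "Expected <4-letter script code>:<resource path>, got: " + trimmed);
            }
            result.put(trimmed.substring(0, 4), trimmed.substring(colon + 1));
        }
        return result;
    }

    public static void main(String[] args) {
        Map<String, String> rules =
                parse("Latn:my.Latin.rules.rbbi,Cyrl:my.Cyrillic.rules.rbbi");
        for (Map.Entry<String, String> e : rules.entrySet()) {
            System.out.println(e.getKey() + " -> " + e.getValue());
        }
    }
}
```

The sketch only checks the shape of each pair; whether a code is a valid ISO 15924 script and whether the resource exists are left to the real factory and its ResourceLoader.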
Fields inherited from class AbstractAnalysisFactory: args, luceneMatchVersion
Constructor and Description |
---|
ICUTokenizerFactory() Sole constructor. |
Modifier and Type | Method and Description |
---|---|
Tokenizer | create(Reader input) |
void | inform(ResourceLoader loader) |
void | init(Map<String,String> args) |
Methods inherited from class TokenizerFactory: availableTokenizers, forName, lookupClass, reloadTokenizers
Methods inherited from class AbstractAnalysisFactory: assureMatchVersion, getArgs, getBoolean, getBoolean, getInt, getInt, getInt, getLines, getLuceneMatchVersion, getPattern, getSnowballWordSet, getWordSet, setLuceneMatchVersion, splitFileNames
public ICUTokenizerFactory()
Sole constructor. See AbstractAnalysisFactory for initialization lifecycle.
public void init(Map<String,String> args)
Overrides: init in class AbstractAnalysisFactory
public void inform(ResourceLoader loader) throws IOException
Specified by: inform in interface ResourceLoaderAware
Throws: IOException
public Tokenizer create(Reader input)
Specified by: create in class TokenizerFactory
Copyright © 2000-2013 Apache Software Foundation. All Rights Reserved.