public class ICUTokenizerFactory extends TokenizerFactory implements ResourceLoaderAware
Factory for ICUTokenizer.
Words are broken across script boundaries, then segmented according to
the BreakIterator and typing provided by the DefaultICUTokenizerConfig.
To use the default set of per-script rules:
<fieldType name="text_icu" class="solr.TextField" positionIncrementGap="100">
  <analyzer>
    <tokenizer class="solr.ICUTokenizerFactory"/>
  </analyzer>
</fieldType>
You can customize this tokenizer's behavior by specifying per-script rule files,
which are compiled by the ICU RuleBasedBreakIterator. See the
ICU RuleBasedBreakIterator syntax reference.
To add per-script rules, add a "rulefiles" argument containing a
comma-separated list of code:rulefile pairs. Each pair consists of a
four-letter ISO 15924 script code, followed by a colon, then a resource
path. For example, to specify rules for Latin (script code "Latn") and
Cyrillic (script code "Cyrl"):
<fieldType name="text_icu_custom" class="solr.TextField" positionIncrementGap="100">
  <analyzer>
    <tokenizer class="solr.ICUTokenizerFactory"
               rulefiles="Latn:my.Latin.rules.rbbi,Cyrl:my.Cyrillic.rules.rbbi"/>
  </analyzer>
</fieldType>

Fields inherited from class AbstractAnalysisFactory: args, luceneMatchVersion

| Constructor and Description |
|---|
| ICUTokenizerFactory() Sole constructor. |
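
To make the "rulefiles" format described earlier more concrete, the following is a minimal sketch of how a comma-separated list of code:rulefile pairs could be split into script codes and resource paths. It uses only the Java standard library. `RuleFilesArg` and `parse` are hypothetical names introduced for illustration and are not part of Lucene or Solr; the factory's own parsing may differ, for example in how it treats whitespace or validates script codes.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper, NOT part of Lucene or Solr: splits a "rulefiles"
// value into (script code -> resource path) pairs, keeping the given order.
final class RuleFilesArg {

  static Map<String, String> parse(String rulefiles) {
    Map<String, String> byScript = new LinkedHashMap<>();
    for (String pair : rulefiles.split(",")) {
      // Each pair is "code:rulefile"; split on the first colon only.
      int colon = pair.indexOf(':');
      if (colon < 0) {
        throw new IllegalArgumentException("Expected code:rulefile, got: " + pair);
      }
      String scriptCode = pair.substring(0, colon).trim();
      String resourcePath = pair.substring(colon + 1).trim();
      // ISO 15924 script codes are four letters, e.g. "Latn" or "Cyrl".
      // This check is an assumption of the sketch, not documented behavior.
      if (!scriptCode.matches("[A-Za-z]{4}")) {
        throw new IllegalArgumentException("Not a four-letter script code: " + scriptCode);
      }
      if (resourcePath.isEmpty()) {
        throw new IllegalArgumentException("Missing resource path for: " + scriptCode);
      }
      byScript.put(scriptCode, resourcePath);
    }
    return byScript;
  }

  public static void main(String[] args) {
    // Same value as in the text_icu_custom example.
    Map<String, String> rules =
        parse("Latn:my.Latin.rules.rbbi,Cyrl:my.Cyrillic.rules.rbbi");
    rules.forEach((script, path) -> System.out.println(script + " -> " + path));
  }
}
```

A check like this can be useful for validating a schema value before deployment, since a malformed pair is reported with the offending text rather than surfacing later during analyzer initialization.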
| Modifier and Type | Method and Description |
|---|---|
| Tokenizer | create(Reader input) |
| void | inform(ResourceLoader loader) |
| void | init(Map<String,String> args) |

Methods inherited from class TokenizerFactory: availableTokenizers, forName, lookupClass, reloadTokenizers

Methods inherited from class AbstractAnalysisFactory: assureMatchVersion, getArgs, getBoolean, getBoolean, getInt, getInt, getInt, getLines, getLuceneMatchVersion, getPattern, getSnowballWordSet, getWordSet, setLuceneMatchVersion, splitFileNames

public ICUTokenizerFactory()
Sole constructor. See AbstractAnalysisFactory for initialization lifecycle.

public void init(Map<String,String> args)
Overrides: init in class AbstractAnalysisFactory

public void inform(ResourceLoader loader) throws IOException
Specified by: inform in interface ResourceLoaderAware
Throws: IOException

public Tokenizer create(Reader input)
Specified by: create in class TokenizerFactory

Copyright © 2000-2013 Apache Software Foundation. All Rights Reserved.