public class SimplePatternSplitTokenizerFactory extends TokenizerFactory
Factory for SimplePatternSplitTokenizer, which produces tokens by splitting the input according to the provided regexp.
This tokenizer uses Lucene RegExp pattern matching to split the input stream into distinct tokens. Its pattern syntax is more limited than that of PatternTokenizer, but tokenization is quite a bit faster. It takes two arguments:
pattern (required): the regular expression, using the syntax described in RegExp.
The pattern matches the characters that should split tokens, as with String.split. Matching is greedy: at any given position, the longest matching token separator is consumed. Empty tokens are never created.
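The splitting rule can be illustrated with a minimal sketch in plain Java. It assumes java.util.regex as a stand-in for Lucene's automaton-based RegExp so that it runs without Lucene; the class and method names (SplitSketch, split, Token) are hypothetical. It shows two points from the paragraph above: separators are consumed, and empty tokens are dropped. String.split, by contrast, keeps a leading empty string when the input starts with a separator. One caveat: java.util.regex uses leftmost-first backtracking, so for an alternation such as a|ab it takes "a" rather than the longest separator "ab"; for a character-class pattern like [ \t\r\n]+ the two behaviours agree.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical sketch of split-on-separator tokenization; not Lucene code. */
public class SplitSketch {

    /** A token with its start (inclusive) and end (exclusive) character offsets. */
    static final class Token {
        final String text;
        final int start;
        final int end;

        Token(String text, int start, int end) {
            this.text = text;
            this.start = start;
            this.end = end;
        }

        @Override
        public String toString() {
            return text + "[" + start + "," + end + ")";
        }
    }

    /** Splits input on each match of separatorRegex; empty tokens are dropped. */
    static List<Token> split(String input, String separatorRegex) {
        Matcher m = Pattern.compile(separatorRegex).matcher(input);
        List<Token> tokens = new ArrayList<>();
        int tokenStart = 0;
        while (m.find()) {
            // Text between the previous separator and this one is a candidate token.
            addIfNonEmpty(tokens, input, tokenStart, m.start());
            tokenStart = m.end();
        }
        // Whatever follows the last separator is the final candidate token.
        addIfNonEmpty(tokens, input, tokenStart, input.length());
        return tokens;
    }

    private static void addIfNonEmpty(List<Token> out, String input, int start, int end) {
        if (end > start) {
            out.add(new Token(input.substring(start, end), start, end));
        }
    }

    public static void main(String[] args) {
        String text = "  foo\tbar\n\nbaz ";
        // prints [foo[2,5), bar[6,9), baz[11,14)]
        System.out.println(split(text, "[ \t\r\n]+"));
        // String.split keeps the leading empty string: prints [, foo, bar, baz]
        System.out.println(Arrays.toString(text.split("[ \t\r\n]+")));
    }
}
```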
For example, to match tokens delimited by simple whitespace characters:
<fieldType name="text_ptn" class="solr.TextField" positionIncrementGap="100">
<analyzer>
<tokenizer class="solr.SimplePatternSplitTokenizerFactory" pattern="[ \t\r\n]+"/>
</analyzer>
</fieldType>
See Also: SimplePatternSplitTokenizer

| Modifier and Type | Field and Description |
|---|---|
| static String | NAME: SPI name |
| static String | PATTERN |
Fields inherited from class AbstractAnalysisFactory: LUCENE_MATCH_VERSION_PARAM, luceneMatchVersion

| Constructor and Description |
|---|
| SimplePatternSplitTokenizerFactory(Map<String,String> args): Creates a new SimplePatternSplitTokenizerFactory |
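Map-configured analysis factories commonly consume the argument entries they recognize and reject whatever remains. The following minimal sketch shows that convention in plain Java. Everything in it is an assumption for illustration: the class name FactoryArgsSketch, the error messages, and the use of java.util.regex in place of Lucene RegExp. It is not Lucene's implementation. The sketch removes the required "pattern" entry, compiles it once so that a malformed expression fails at construction time (PatternSyntaxException is a subclass of IllegalArgumentException), and throws IllegalArgumentException if unrecognized keys are left over.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.regex.Pattern;

/** Hypothetical stand-in for a Map-configured factory; not the Lucene class. */
public class FactoryArgsSketch {
    static final String PATTERN = "pattern";

    final Pattern separator;

    FactoryArgsSketch(Map<String, String> args) {
        // Consume the required argument: remove() both reads it and marks it as handled.
        String regex = args.remove(PATTERN);
        if (regex == null) {
            throw new IllegalArgumentException("missing parameter '" + PATTERN + "'");
        }
        // Compile once here, so a malformed pattern fails fast at construction time.
        this.separator = Pattern.compile(regex);
        // Any entries still present were not recognized by this factory.
        if (!args.isEmpty()) {
            throw new IllegalArgumentException("Unknown parameters: " + args);
        }
    }

    public static void main(String[] args) {
        Map<String, String> ok = new HashMap<>();
        ok.put(PATTERN, "[ \t\r\n]+");
        System.out.println(new FactoryArgsSketch(ok).separator.pattern());

        Map<String, String> typo = new HashMap<>();
        typo.put("patern", "[ ]+"); // misspelled key, so "pattern" is missing
        try {
            new FactoryArgsSketch(typo);
        } catch (IllegalArgumentException e) {
            System.out.println(e.getMessage()); // missing parameter 'pattern'
        }
    }
}
```

Because entries are removed as they are consumed, this sketch needs a mutable map such as a HashMap. An immutable Map.of(...) would make remove throw UnsupportedOperationException.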
| Modifier and Type | Method and Description |
|---|---|
| SimplePatternSplitTokenizer | create(AttributeFactory factory): Creates a TokenStream of the specified input using the given AttributeFactory |
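Because create returns a new SimplePatternSplitTokenizer on each call, the design separates shared configuration from per-stream state: the pattern is compiled once, and each tokenizer is created many times. The sketch below illustrates that split in plain Java. All of its classes and methods (FactorySketch, TokenizerSketch, setInput, incrementToken, term, drain) are hypothetical, and java.util.regex stands in for Lucene's automaton. The compiled Pattern is immutable and shared, while each tokenizer owns its own Matcher and position, so tokenizers from the same factory do not interfere with each other. Input is supplied through setInput after construction, which loosely mirrors how a Lucene Tokenizer receives its Reader separately from create.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical compile-once / create-many sketch; not Lucene code. */
public class CreateSketch {

    /** Holds the shared, immutable compiled pattern. */
    static final class FactorySketch {
        private final Pattern separator;

        FactorySketch(String regex) {
            this.separator = Pattern.compile(regex);
        }

        /** Returns a fresh tokenizer on each call; per-stream state lives in the tokenizer. */
        TokenizerSketch create() {
            return new TokenizerSketch(separator);
        }
    }

    /** Pull-style tokenizer loosely modelled on incrementToken(); holds per-stream state. */
    static final class TokenizerSketch {
        private final Pattern separator;
        private String input;
        private Matcher matcher;
        private int pos;
        private boolean exhausted;
        private String term;

        TokenizerSketch(Pattern separator) {
            this.separator = separator;
            setInput("");
        }

        /** Resets this tokenizer to read a new input string. */
        void setInput(String input) {
            this.input = input;
            this.matcher = separator.matcher(input);
            this.pos = 0;
            this.exhausted = false;
            this.term = null;
        }

        /** Advances to the next non-empty token; returns false once the input is used up. */
        boolean incrementToken() {
            while (!exhausted) {
                int start = pos;
                int end;
                if (matcher.find()) {
                    end = matcher.start();
                    pos = matcher.end();
                } else {
                    end = input.length();
                    exhausted = true;
                }
                if (end > start) {
                    term = input.substring(start, end);
                    return true;
                }
            }
            return false;
        }

        String term() {
            return term;
        }
    }

    /** Collects all remaining tokens from a tokenizer. */
    static List<String> drain(TokenizerSketch t) {
        List<String> out = new ArrayList<>();
        while (t.incrementToken()) {
            out.add(t.term());
        }
        return out;
    }

    public static void main(String[] args) {
        FactorySketch factory = new FactorySketch("[ \t\r\n]+");
        TokenizerSketch a = factory.create();
        TokenizerSketch b = factory.create();
        a.setInput("one two");
        b.setInput("x\ty\nz");
        a.incrementToken(); // a is now positioned on "one"
        System.out.println(drain(b)); // [x, y, z]: b is unaffected by a
        System.out.println(drain(a)); // [two]
    }
}
```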
Methods inherited from class TokenizerFactory: availableTokenizers, create, findSPIName, forName, lookupClass, reloadTokenizers
Methods inherited from class AbstractAnalysisFactory: get, get, get, get, get, getBoolean, getChar, getClassArg, getFloat, getInt, getLines, getLuceneMatchVersion, getOriginalArgs, getPattern, getSet, getSnowballWordSet, getWordSet, isExplicitLuceneMatchVersion, require, require, require, requireBoolean, requireChar, requireFloat, requireInt, setExplicitLuceneMatchVersion, splitAt, splitFileNames

public static final String NAME
public static final String PATTERN
public SimplePatternSplitTokenizer create(AttributeFactory factory)
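Description copied from class: TokenizerFactory
Creates a TokenStream of the specified input using the given AttributeFactory
Specified by: create in class TokenizerFactory

Copyright © 2000-2021 Apache Software Foundation. All Rights Reserved.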