public class PatternTokenizerFactory extends TokenizerFactory
Factory for PatternTokenizer.
This tokenizer uses regex pattern matching to construct distinct tokens
for the input stream. It takes two arguments: "pattern", the regular expression,
and "group", which selects the group to extract into tokens.
group=-1 (the default) is equivalent to "split": the tokens are the same as the
output of String.split(java.lang.String), except that empty tokens are omitted.
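A minimal, stdlib-only sketch of the group=-1 behavior described above, assuming it can be modeled as plain String.split followed by dropping empty strings. The class and method names (SplitModeSketch, splitTokens) are hypothetical, and this is not the PatternTokenizer implementation, which produces a token stream from a Reader rather than a list of strings.

```java
import java.util.ArrayList;
import java.util.List;

public class SplitModeSketch {
    // Hypothetical model of group=-1: split the input on every match of the
    // pattern (String.split semantics) and discard zero-length pieces.
    static List<String> splitTokens(String regex, String input) {
        List<String> tokens = new ArrayList<>();
        for (String piece : input.split(regex)) {
            if (!piece.isEmpty()) {
                tokens.add(piece);
            }
        }
        return tokens;
    }

    public static void main(String[] args) {
        // The leading "," and the doubled ";;" each make split() return an
        // empty piece; both are filtered out.
        System.out.println(splitTokens("[,;]\\s*", ",apple, banana;;cherry")); // [apple, banana, cherry]
    }
}
```

Without the filter, split() would return the empty strings at the start and between the two semicolons, which is why the description qualifies the split equivalence with "without empty tokens".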
Using group >= 0 selects the matching group as the token. For example, given:
pattern = \'([^\']+)\'
group = 0
input = aaa 'bbb' 'ccc'
the output will be two tokens: 'bbb' and 'ccc' (including the ' marks). With the same input but group=1, the output would be bbb and ccc (without the ' marks).
NOTE: This Tokenizer does not output tokens that are of zero length.
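The group >= 0 mode and the zero-length rule can be sketched with java.util.regex alone. This is a hypothetical model (GroupModeSketch and groupTokens are illustrative names, not Lucene classes), assuming each Matcher.find() hit contributes the text of the selected group and empty or non-participating groups are skipped.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class GroupModeSketch {
    // Hypothetical model of group >= 0: every match contributes the text of the
    // selected capturing group (group 0 = whole match); zero-length tokens are skipped.
    static List<String> groupTokens(String regex, int group, String input) {
        List<String> tokens = new ArrayList<>();
        Matcher m = Pattern.compile(regex).matcher(input);
        while (m.find()) {
            String token = m.group(group); // null if the group did not take part in this match
            if (token != null && !token.isEmpty()) {
                tokens.add(token);
            }
        }
        return tokens;
    }

    public static void main(String[] args) {
        String input = "aaa 'bbb' 'ccc'";
        // In the documented pattern, \' is just an escaped quote, so '([^']+)' is equivalent.
        System.out.println(groupTokens("'([^']+)'", 0, input)); // ['bbb', 'ccc']
        System.out.println(groupTokens("'([^']+)'", 1, input)); // [bbb, ccc]
        // With * instead of +, the empty quotes '' give an empty group 1, which is dropped.
        System.out.println(groupTokens("'([^']*)'", 1, "aaa 'bbb' '' 'ccc'")); // [bbb, ccc]
    }
}
```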
<fieldType name="text_ptn" class="solr.TextField" positionIncrementGap="100">
  <analyzer>
    <tokenizer class="solr.PatternTokenizerFactory" pattern="\'([^\']+)\'" group="1"/>
  </analyzer>
</fieldType>
See Also: PatternTokenizer
Modifier and Type | Field and Description |
---|---|
protected int | group |
static String | GROUP |
protected Pattern | pattern |
static String | PATTERN |

Fields inherited from class AbstractAnalysisFactory: LUCENE_MATCH_VERSION_PARAM, luceneMatchVersion
Constructor | Description |
---|---|
PatternTokenizerFactory(Map<String,String> args) | Creates a new PatternTokenizerFactory. |
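The constructor receives its settings as a Map<String,String>, which in a Solr schema typically comes from the tokenizer element's attributes. The sketch below is a hypothetical, stdlib-only stand-in (PatternArgsSketch is not a Lucene class) showing how the "pattern" and "group" entries could be turned into a compiled Pattern and an int group that defaults to -1. Rejecting leftover keys is an assumption of the sketch, not documented behavior.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.regex.Pattern;

// Hypothetical helper, NOT Lucene's PatternTokenizerFactory: it only sketches how the
// documented "pattern" and "group" arguments map to a compiled Pattern and an int.
public class PatternArgsSketch {
    final Pattern pattern;
    final int group;

    PatternArgsSketch(Map<String, String> args) {
        Map<String, String> remaining = new HashMap<>(args); // copy: leave the caller's map untouched
        String regex = remaining.remove("pattern");
        if (regex == null) {
            throw new IllegalArgumentException("missing required parameter: pattern");
        }
        pattern = Pattern.compile(regex);
        String g = remaining.remove("group");
        group = (g == null) ? -1 : Integer.parseInt(g); // -1 = split mode, the documented default
        // Assumption for this sketch: any leftover key is reported as a configuration error.
        if (!remaining.isEmpty()) {
            throw new IllegalArgumentException("unknown parameters: " + remaining);
        }
    }

    public static void main(String[] args) {
        // Same attribute values as in the text_ptn fieldType example.
        Map<String, String> conf = new HashMap<>();
        conf.put("pattern", "\\'([^\\']+)\\'");
        conf.put("group", "1");
        PatternArgsSketch s = new PatternArgsSketch(conf);
        System.out.println(s.group); // 1
        System.out.println(s.pattern.matcher("'bbb'").matches()); // true
    }
}
```

Compiling the regex once, at construction time, means a malformed pattern surfaces as a configuration error rather than when the first document is analyzed.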
Modifier and Type | Method | Description |
---|---|---|
PatternTokenizer | create(AttributeFactory factory, Reader in) | Splits the input using the configured pattern. |

Methods inherited from class TokenizerFactory: availableTokenizers, create, forName, lookupClass, reloadTokenizers

Methods inherited from class AbstractAnalysisFactory: assureMatchVersion, get, get, get, get, get, getBoolean, getChar, getClassArg, getFloat, getInt, getLines, getLuceneMatchVersion, getOriginalArgs, getPattern, getSet, getSnowballWordSet, getWordSet, isExplicitLuceneMatchVersion, require, require, require, requireBoolean, requireChar, requireFloat, requireInt, setExplicitLuceneMatchVersion, splitFileNames
public static final String PATTERN
public static final String GROUP
protected final Pattern pattern
protected final int group
public PatternTokenizer create(AttributeFactory factory, Reader in)
Specified by: create in class TokenizerFactory
Copyright © 2000-2014 Apache Software Foundation. All Rights Reserved.