org.apache.lucene.analysis.pattern
Class PatternTokenizerFactory
java.lang.Object
org.apache.lucene.analysis.util.AbstractAnalysisFactory
org.apache.lucene.analysis.util.TokenizerFactory
org.apache.lucene.analysis.pattern.PatternTokenizerFactory
public class PatternTokenizerFactory
extends TokenizerFactory
Factory for PatternTokenizer.
This tokenizer uses regular-expression matching to produce distinct tokens from the input stream. It takes two arguments: "pattern" and "group".
- "pattern" is the regular expression.
- "group" specifies which capturing group to extract into tokens.
group=-1 (the default) is equivalent to "split": the pattern matches the delimiters, and the tokens are equivalent to the output of String.split(java.lang.String) with empty tokens removed.
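The split behaviour can be illustrated with plain java.lang.String. This is a minimal stdlib sketch of the documented semantics, not PatternTokenizer itself; the class SplitModeDemo and its method splitTokens are hypothetical names introduced for illustration.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical illustration of group=-1 ("split") semantics; not Lucene code.
public class SplitModeDemo {
    // The regex matches delimiters; empty pieces are dropped because the
    // tokenizer does not emit zero-length tokens.
    static List<String> splitTokens(String input, String regex) {
        List<String> tokens = new ArrayList<>();
        for (String piece : input.split(regex)) {
            if (!piece.isEmpty()) {
                tokens.add(piece);
            }
        }
        return tokens;
    }

    public static void main(String[] args) {
        String input = ",aaa,,bbb,ccc";
        // Plain String.split keeps the empty leading and interior pieces.
        System.out.println(Arrays.toString(input.split(",")));
        // Split mode drops them.
        System.out.println(splitTokens(input, ","));
    }
}
```

The only difference from a bare String.split call is the empty-string filter, which mirrors the zero-length-token rule stated on this page.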
Using group >= 0 selects that capturing group of each match as the token (group 0 is the entire match). For example, given:
pattern = \'([^\']+)\'
group = 0
input = aaa 'bbb' 'ccc'
the output is two tokens: 'bbb' and 'ccc' (including the ' marks). With the same input but group=1, the output is bbb and ccc (without the ' marks).
NOTE: This Tokenizer does not output tokens that are of zero length.
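The group-selection example above can be reproduced with java.util.regex.Matcher. This is a hedged stdlib sketch of the documented behaviour, not the PatternTokenizer source; GroupModeDemo and groupTokens are hypothetical names. The regex string is the same \'([^\']+)\' used above, where \' is simply an escaped literal quote.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical illustration of group >= 0 semantics; not Lucene code.
public class GroupModeDemo {
    // Each match contributes the text of the selected group as one token;
    // zero-length (or non-participating) groups are skipped.
    static List<String> groupTokens(String input, String regex, int group) {
        Matcher m = Pattern.compile(regex).matcher(input);
        List<String> tokens = new ArrayList<>();
        while (m.find()) {
            String token = m.group(group);
            // group() returns null when the group did not take part in the match
            if (token != null && !token.isEmpty()) {
                tokens.add(token);
            }
        }
        return tokens;
    }

    public static void main(String[] args) {
        String regex = "\\'([^\\']+)\\'";   // the regex \'([^\']+)\' from the example
        String input = "aaa 'bbb' 'ccc'";
        System.out.println(groupTokens(input, regex, 0));  // whole matches, quotes included
        System.out.println(groupTokens(input, regex, 1));  // inner group, quotes stripped
    }
}
```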
<fieldType name="text_ptn" class="solr.TextField" positionIncrementGap="100">
  <analyzer>
    <tokenizer class="solr.PatternTokenizerFactory" pattern="\'([^\']+)\'" group="1"/>
  </analyzer>
</fieldType>
Since:
solr1.2
See Also:
PatternTokenizer
Methods inherited from class org.apache.lucene.analysis.util.AbstractAnalysisFactory
assureMatchVersion, get, get, get, get, get, getBoolean, getChar, getClassArg, getFloat, getInt, getLines, getLuceneMatchVersion, getOriginalArgs, getPattern, getSet, getSnowballWordSet, getWordSet, isExplicitLuceneMatchVersion, require, require, require, requireBoolean, requireChar, requireFloat, requireInt, setExplicitLuceneMatchVersion, splitFileNames
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
PATTERN
public static final String PATTERN
See Also:
Constant Field Values
GROUP
public static final String GROUP
See Also:
Constant Field Values
pattern
protected final Pattern pattern
group
protected final int group
PatternTokenizerFactory
public PatternTokenizerFactory(Map<String,String> args)
Creates a new PatternTokenizerFactory from the given args map, which supplies the "pattern" and "group" arguments described above.
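The following is a hypothetical stand-in, not the Lucene constructor, showing how the two documented string arguments could be read from a Map<String,String>: "pattern" is compiled as a regex and "group" is parsed as an int that defaults to -1. Treating a missing "pattern" as an error is an assumption of this sketch, and PatternArgsSketch is an illustrative name. The map values mirror the attribute values in the Solr example above.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.regex.Pattern;

// Hypothetical stand-in, not the Lucene constructor: reads the two
// documented arguments from a string map.
public class PatternArgsSketch {
    final Pattern pattern;
    final int group;

    PatternArgsSketch(Map<String, String> args) {
        String regex = args.get("pattern");
        if (regex == null) {
            // Assumption for this sketch: a missing "pattern" is an error.
            throw new IllegalArgumentException("missing required argument: pattern");
        }
        this.pattern = Pattern.compile(regex);
        String g = args.get("group");
        // Documented default: group=-1, i.e. split mode.
        this.group = (g == null) ? -1 : Integer.parseInt(g);
    }

    public static void main(String[] args) {
        // Same values as the Solr example: pattern="\'([^\']+)\'" group="1"
        Map<String, String> conf = new HashMap<>();
        conf.put("pattern", "\\'([^\\']+)\\'");
        conf.put("group", "1");
        PatternArgsSketch parsed = new PatternArgsSketch(conf);
        System.out.println(parsed.pattern.pattern() + " group=" + parsed.group);
    }
}
```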
create
public PatternTokenizer create(AttributeSource.AttributeFactory factory,
Reader in)
Splits the input using the configured pattern.
Specified by:
create
in class TokenizerFactory
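create() receives its input as a java.io.Reader. As a rough sketch using only java.io and java.util.regex, the hypothetical helper ReaderTokenizeSketch.tokenize below drains a Reader and then applies both documented modes with a Matcher. It is not PatternTokenizer's implementation and returns plain strings rather than Lucene token attributes.

```java
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper illustrating the documented semantics; not Lucene code.
public class ReaderTokenizeSketch {
    static List<String> tokenize(Reader in, Pattern pattern, int group) {
        // Read the whole input into a String.
        StringBuilder sb = new StringBuilder();
        char[] buf = new char[1024];
        try {
            for (int n; (n = in.read(buf)) != -1; ) {
                sb.append(buf, 0, n);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        String text = sb.toString();
        Matcher m = pattern.matcher(text);
        List<String> tokens = new ArrayList<>();
        if (group < 0) {
            // Split mode: tokens are the spans between matches.
            int start = 0;
            while (m.find()) {
                addIfNonEmpty(tokens, text.substring(start, m.start()));
                start = m.end();
            }
            addIfNonEmpty(tokens, text.substring(start));
        } else {
            // Group mode: each match contributes the selected group.
            while (m.find()) {
                addIfNonEmpty(tokens, m.group(group));
            }
        }
        return tokens;
    }

    // Zero-length tokens are never emitted.
    static void addIfNonEmpty(List<String> tokens, String s) {
        if (s != null && !s.isEmpty()) {
            tokens.add(s);
        }
    }

    public static void main(String[] args) {
        Pattern ws = Pattern.compile("\\s+");
        System.out.println(tokenize(new StringReader("  the quick  fox "), ws, -1));
    }
}
```

Splitting with a Matcher loop instead of String.split keeps both modes on one code path. Because empty spans are filtered out, the split-mode result matches the String.split-based description for delimiter patterns such as whitespace.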
Copyright © 2000-2013 Apache Software Foundation. All Rights Reserved.