Class PatternTokenizerFactory
java.lang.Object
org.apache.lucene.analysis.AbstractAnalysisFactory
org.apache.lucene.analysis.TokenizerFactory
org.apache.lucene.analysis.pattern.PatternTokenizerFactory
Factory for
PatternTokenizer
. This tokenizer uses regex pattern matching to construct
distinct tokens for the input stream. It takes two arguments: "pattern" and "group". - "pattern" is the regular expression.
- "group" says which group to extract into tokens.
group=-1 (the default) is equivalent to "split". In this case, the tokens will be equivalent
to the output from (without empty tokens): String.split(java.lang.String)
Using group >= 0 selects the matching group as the token. For example, if you have:
pattern = \'([^\']+)\' group = 0 input = aaa 'bbb' 'ccc'the output will be two tokens: 'bbb' and 'ccc' (including the ' marks). With the same input but using group=1, the output would be: bbb and ccc (no ' marks)
NOTE: This Tokenizer does not output tokens that are of zero length.
<fieldType name="text_ptn" class="solr.TextField" positionIncrementGap="100"> <analyzer> <tokenizer class="solr.PatternTokenizerFactory" pattern="\'([^\']+)\'" group="1"/> </analyzer> </fieldType>
- Since:
- solr1.2
- See Also:
- SPI Name (case-insensitive: if the name is 'htmlStrip', 'htmlstrip' can be used when looking up the service).
- "pattern"
-
Field Summary
Modifier and TypeFieldDescriptionprotected final int
static final String
static final String
SPI nameprotected final Pattern
static final String
Fields inherited from class org.apache.lucene.analysis.AbstractAnalysisFactory
LUCENE_MATCH_VERSION_PARAM, luceneMatchVersion
-
Constructor Summary
ConstructorDescriptionDefault ctor for compatibility with SPIPatternTokenizerFactory
(Map<String, String> args) Creates a new PatternTokenizerFactory -
Method Summary
Modifier and TypeMethodDescriptioncreate
(AttributeFactory factory) Split the input using configured patternMethods inherited from class org.apache.lucene.analysis.TokenizerFactory
availableTokenizers, create, findSPIName, forName, lookupClass, reloadTokenizers
Methods inherited from class org.apache.lucene.analysis.AbstractAnalysisFactory
defaultCtorException, get, get, get, get, get, getBoolean, getChar, getClassArg, getFloat, getInt, getLines, getLuceneMatchVersion, getOriginalArgs, getPattern, getSet, getSnowballWordSet, getWordSet, isExplicitLuceneMatchVersion, require, require, require, requireBoolean, requireChar, requireFloat, requireInt, setExplicitLuceneMatchVersion, splitAt, splitFileNames
-
Field Details
-
NAME
SPI name- See Also:
-
PATTERN
- See Also:
-
GROUP
- See Also:
-
pattern
-
group
protected final int group
-
-
Constructor Details
-
PatternTokenizerFactory
Creates a new PatternTokenizerFactory -
PatternTokenizerFactory
public PatternTokenizerFactory()Default ctor for compatibility with SPI
-
-
Method Details
-
create
Split the input using configured pattern- Specified by:
create
in classTokenizerFactory
-