org.apache.lucene.analysis.pattern
Class PatternTokenizerFactory

java.lang.Object
  extended by org.apache.lucene.analysis.util.AbstractAnalysisFactory
      extended by org.apache.lucene.analysis.util.TokenizerFactory
          extended by org.apache.lucene.analysis.pattern.PatternTokenizerFactory

public class PatternTokenizerFactory
extends TokenizerFactory

Factory for PatternTokenizer. This tokenizer uses regex pattern matching to construct distinct tokens from the input stream. It takes two arguments: "pattern" and "group".

group=-1 (the default) is equivalent to "split": the pattern is treated as a delimiter. In this case, the tokens are equivalent to the output of String.split(java.lang.String) with empty tokens removed.
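
The split behavior described above can be approximated with java.util.regex alone. The following is a minimal sketch of the semantics, not the PatternTokenizer implementation; the class name SplitModeSketch and the method splitTokens are hypothetical and exist only for this illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Pattern;

// Hypothetical helper: mimics group=-1 ("split") using only the JDK.
class SplitModeSketch {

    // Splits the input on every match of the regex and drops empty pieces,
    // mirroring "String.split without empty tokens".
    static List<String> splitTokens(String regex, String input) {
        List<String> tokens = new ArrayList<>();
        for (String piece : Pattern.compile(regex).split(input)) {
            if (!piece.isEmpty()) {
                tokens.add(piece);
            }
        }
        return tokens;
    }

    public static void main(String[] args) {
        // The leading comma and the doubled comma would each yield an
        // empty string from split(); both are filtered out here.
        System.out.println(splitTokens("\\s*,\\s*", ",a , b,,c")); // prints [a, b, c]
    }
}
```

Filtering after the split is needed because String.split only discards trailing empty strings; empty strings at the start or in the middle of the result are otherwise kept.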

Using group >= 0 selects the matching group as the token. For example, if you have:

  pattern = \'([^\']+)\'
  group = 0
  input = aaa 'bbb' 'ccc'
 
the output will be two tokens: 'bbb' and 'ccc' (including the ' marks). With the same input but using group=1, the output would be: bbb and ccc (no ' marks).
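
The group selection in the example above can be reproduced with java.util.regex.Matcher. This is a sketch of the semantics only, assuming that each successive match contributes one token; the class name GroupModeSketch and the method groupTokens are hypothetical and are not part of Lucene.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper: mimics group >= 0 using only the JDK.
class GroupModeSketch {

    // Collects the requested capture group from each successive match,
    // skipping groups that did not participate or matched nothing.
    static List<String> groupTokens(String regex, int group, String input) {
        List<String> tokens = new ArrayList<>();
        Matcher m = Pattern.compile(regex).matcher(input);
        while (m.find()) {
            String token = m.group(group);
            if (token != null && !token.isEmpty()) {
                tokens.add(token);
            }
        }
        return tokens;
    }

    public static void main(String[] args) {
        String regex = "\\'([^\\']+)\\'"; // same pattern as above, escaped for a Java string
        String input = "aaa 'bbb' 'ccc'";
        System.out.println(groupTokens(regex, 0, input)); // prints ['bbb', 'ccc']
        System.out.println(groupTokens(regex, 1, input)); // prints [bbb, ccc]
    }
}
```

Group 0 is the whole match, so the quote marks are kept; group 1 is the first parenthesized subexpression, which excludes them.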

NOTE: This Tokenizer does not output tokens that are of zero length.

 <fieldType name="text_ptn" class="solr.TextField" positionIncrementGap="100">
   <analyzer>
     <tokenizer class="solr.PatternTokenizerFactory" pattern="\'([^\']+)\'" group="1"/>
   </analyzer>
 </fieldType>

Since:
solr1.2
See Also:
PatternTokenizer

Field Summary
protected  int group
           
static String GROUP
           
protected  Pattern pattern
           
static String PATTERN
           
 
Fields inherited from class org.apache.lucene.analysis.util.AbstractAnalysisFactory
LUCENE_MATCH_VERSION_PARAM, luceneMatchVersion
 
Constructor Summary
PatternTokenizerFactory(Map<String,String> args)
          Creates a new PatternTokenizerFactory
 
Method Summary
 PatternTokenizer create(AttributeSource.AttributeFactory factory, Reader in)
          Splits the input using the configured pattern
 
Methods inherited from class org.apache.lucene.analysis.util.TokenizerFactory
availableTokenizers, create, forName, lookupClass, reloadTokenizers
 
Methods inherited from class org.apache.lucene.analysis.util.AbstractAnalysisFactory
assureMatchVersion, get, get, get, get, get, getBoolean, getChar, getClassArg, getFloat, getInt, getLines, getLuceneMatchVersion, getOriginalArgs, getPattern, getSet, getSnowballWordSet, getWordSet, isExplicitLuceneMatchVersion, require, require, require, requireBoolean, requireChar, requireFloat, requireInt, setExplicitLuceneMatchVersion, splitFileNames
 
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
 

Field Detail

PATTERN

public static final String PATTERN
See Also:
Constant Field Values

GROUP

public static final String GROUP
See Also:
Constant Field Values

pattern

protected final Pattern pattern

group

protected final int group

Constructor Detail

PatternTokenizerFactory

public PatternTokenizerFactory(Map<String,String> args)
Creates a new PatternTokenizerFactory

Method Detail

create

public PatternTokenizer create(AttributeSource.AttributeFactory factory,
                               Reader in)
Splits the input using the configured pattern

Specified by:
create in class TokenizerFactory


Copyright © 2000-2013 Apache Software Foundation. All Rights Reserved.