PatternTokenizerFactory (Lucene 4.5.0 API)

org.apache.lucene.analysis.pattern
Class PatternTokenizerFactory

java.lang.Object
  org.apache.lucene.analysis.util.AbstractAnalysisFactory
      org.apache.lucene.analysis.util.TokenizerFactory
          org.apache.lucene.analysis.pattern.PatternTokenizerFactory

public class PatternTokenizerFactory
extends TokenizerFactory

Factory for PatternTokenizer. This tokenizer uses regex pattern matching to construct distinct tokens for the input stream. It takes two arguments: "pattern" and "group".

"pattern" is the regular expression.
"group" says which group to extract into tokens.

group=-1 (the default) is equivalent to "split". In this case, the tokens match the output of String.split(java.lang.String), with empty tokens omitted.
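The split behaviour can be approximated with the standard java.util.regex classes alone. The sketch below is not the Lucene implementation; the class name SplitModeSketch, the helper splitTokens, and the sample comma pattern are all hypothetical and exist only to illustrate what "split without empty tokens" means.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Pattern;

// Hypothetical illustration of group=-1 ("split") semantics using only the JDK.
public class SplitModeSketch {

    // Splits the input on the regex and drops zero-length pieces,
    // mirroring the "without empty tokens" rule described above.
    static List<String> splitTokens(String regex, String input) {
        List<String> tokens = new ArrayList<>();
        for (String piece : Pattern.compile(regex).split(input)) {
            if (!piece.isEmpty()) {
                tokens.add(piece);
            }
        }
        return tokens;
    }

    public static void main(String[] args) {
        // The two adjacent commas would yield an empty piece; it is skipped.
        System.out.println(splitTokens("\\s*,\\s*", "a, b,,c")); // prints [a, b, c]
    }
}
```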

Using group >= 0 selects the matching group as the token. For example, if you have:

  pattern = \'([^\']+)\'
  group = 0
  input = aaa 'bbb' 'ccc'

the output will be two tokens: 'bbb' and 'ccc' (including the ' marks). With the same input but using group=1, the output would be: bbb and ccc (no ' marks)
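The effect of the group argument in this example can be reproduced with plain java.util.regex. This is a sketch under assumptions, not how PatternTokenizer is written: the class name GroupModeSketch and the helper groupTokens are hypothetical, and the pattern is written without the backslashes because a single quote needs no escaping in a Java regex.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical illustration of group >= 0 semantics using only the JDK.
public class GroupModeSketch {

    // Collects the requested capture group from every match,
    // skipping groups that did not participate or matched nothing.
    static List<String> groupTokens(String regex, int group, String input) {
        List<String> tokens = new ArrayList<>();
        Matcher matcher = Pattern.compile(regex).matcher(input);
        while (matcher.find()) {
            String token = matcher.group(group);
            if (token != null && !token.isEmpty()) {
                tokens.add(token);
            }
        }
        return tokens;
    }

    public static void main(String[] args) {
        String regex = "'([^']+)'";
        String input = "aaa 'bbb' 'ccc'";
        System.out.println(groupTokens(regex, 0, input)); // prints ['bbb', 'ccc']
        System.out.println(groupTokens(regex, 1, input)); // prints [bbb, ccc]
    }
}
```

Group 0 is the whole match, so the quote marks are kept; group 1 is the first parenthesised sub-expression, so they are dropped.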

NOTE: This Tokenizer does not output tokens that are of zero length.

 <fieldType name="text_ptn" class="solr.TextField" positionIncrementGap="100">
   <analyzer>
     <tokenizer class="solr.PatternTokenizerFactory" pattern="\'([^\']+)\'" group="1"/>
   </analyzer>
 </fieldType>

Since: solr1.2
See Also: PatternTokenizer

Field Summary
`protected int`	`group`
`static String`	`GROUP`
`protected Pattern`	`pattern`
`static String`	`PATTERN`

Fields inherited from class org.apache.lucene.analysis.util.AbstractAnalysisFactory
`LUCENE_MATCH_VERSION_PARAM, luceneMatchVersion`

Constructor Summary
`PatternTokenizerFactory(Map<String,String> args)` Creates a new PatternTokenizerFactory

Method Summary
`PatternTokenizer`	`create(AttributeSource.AttributeFactory factory, Reader in)` Split the input using configured pattern

Methods inherited from class org.apache.lucene.analysis.util.TokenizerFactory
`availableTokenizers, create, forName, lookupClass, reloadTokenizers`

Methods inherited from class org.apache.lucene.analysis.util.AbstractAnalysisFactory
`assureMatchVersion, get, get, get, get, get, getBoolean, getChar, getClassArg, getFloat, getInt, getLines, getLuceneMatchVersion, getOriginalArgs, getPattern, getSet, getSnowballWordSet, getWordSet, isExplicitLuceneMatchVersion, require, require, require, requireBoolean, requireChar, requireFloat, requireInt, setExplicitLuceneMatchVersion, splitFileNames`

Methods inherited from class java.lang.Object
`clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait`

Field Detail

PATTERN

public static final String PATTERN

See Also: Constant Field Values

GROUP

public static final String GROUP

See Also: Constant Field Values

pattern

protected final Pattern pattern

group

protected final int group

Constructor Detail