PatternTokenizerFactory (Lucene 9.0.0 common API)

java.lang.Object
- org.apache.lucene.analysis.AbstractAnalysisFactory
  - org.apache.lucene.analysis.TokenizerFactory
    - org.apache.lucene.analysis.pattern.PatternTokenizerFactory

```
public class PatternTokenizerFactory
extends TokenizerFactory
```
Factory for PatternTokenizer. This tokenizer uses regular-expression matching to build distinct tokens from the input stream. It takes two arguments: "pattern" and "group".
- "pattern" is the regular expression.
- "group" selects which matching group to extract into tokens.
group=-1 (the default) is equivalent to "split". In this case, the tokens are the same as the output of String.split(java.lang.String), minus any empty tokens.
Using group >= 0 selects the matching group as the token. For example, if you have:
```
  pattern = \'([^\']+)\'
  group = 0
  input = aaa 'bbb' 'ccc'
```
the output will be two tokens: 'bbb' and 'ccc' (including the ' marks). With the same input but group=1, the output would be bbb and ccc (no ' marks).
NOTE: This tokenizer does not output zero-length tokens.
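The "group" semantics described above can be sketched with plain java.util.regex. This is an illustration only, not the Lucene implementation: the class name PatternGroupSketch and the tokenize helper are hypothetical, and the comma pattern used for split mode is an assumed example that does not appear in this documentation. The quote pattern is written without the backslash escapes, which matches the same text.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Illustration only: mimics the documented "pattern"/"group" behavior using
// java.util.regex. Hypothetical helper, not part of Lucene.
class PatternGroupSketch {

    static List<String> tokenize(String regex, int group, String input) {
        Pattern pattern = Pattern.compile(regex);
        List<String> tokens = new ArrayList<>();
        if (group < 0) {
            // Split mode: the text between matches becomes the tokens.
            for (String piece : pattern.split(input)) {
                if (!piece.isEmpty()) {
                    tokens.add(piece); // zero-length tokens are dropped
                }
            }
        } else {
            // Group mode: each match contributes the selected group.
            Matcher matcher = pattern.matcher(input);
            while (matcher.find()) {
                String token = matcher.group(group);
                if (token != null && !token.isEmpty()) {
                    tokens.add(token); // zero-length tokens are dropped
                }
            }
        }
        return tokens;
    }

    public static void main(String[] args) {
        String quoted = "'([^']+)'";
        String input = "aaa 'bbb' 'ccc'";
        System.out.println(tokenize(quoted, 0, input)); // prints ['bbb', 'ccc']
        System.out.println(tokenize(quoted, 1, input)); // prints [bbb, ccc]
        // Assumed split-mode example: split on commas, dropping the empty piece.
        System.out.println(tokenize("\\s*,\\s*", -1, "a, b,,c")); // prints [a, b, c]
    }
}
```

With group=0 the whole match is kept, so the quote marks survive; with group=1 only the first capturing group is kept. In split mode the empty piece between the two adjacent commas is discarded, in line with the note on zero-length tokens.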
```
 <fieldType name="text_ptn" class="solr.TextField" positionIncrementGap="100">
   <analyzer>
     <tokenizer class="solr.PatternTokenizerFactory" pattern="\'([^\']+)\'" group="1"/>
   </analyzer>
 </fieldType>
```
Since:

solr1.2

See Also:

PatternTokenizer

SPI Name (case-insensitive: if the name is 'htmlStrip', 'htmlstrip' can be used when looking up the service).

"pattern"

Field Summary

Fields
Modifier and Type	Field	Description
`protected int`	`group`	
`static String`	`GROUP`	
`static String`	`NAME`	SPI name
`protected Pattern`	`pattern`	
`static String`	`PATTERN`	
- Fields inherited from class org.apache.lucene.analysis.AbstractAnalysisFactory
  LUCENE_MATCH_VERSION_PARAM, luceneMatchVersion

Constructor Summary

Constructors
Constructor	Description
`PatternTokenizerFactory()`	Default constructor for compatibility with SPI
`PatternTokenizerFactory(Map<String,String> args)`	Creates a new PatternTokenizerFactory

Method Summary

Methods
Modifier and Type	Method	Description
`PatternTokenizer`	`create(AttributeFactory factory)`	Splits the input using the configured pattern
- Methods inherited from class org.apache.lucene.analysis.TokenizerFactory
  availableTokenizers, create, findSPIName, forName, lookupClass, reloadTokenizers
- Methods inherited from class org.apache.lucene.analysis.AbstractAnalysisFactory
  defaultCtorException, get, get, get, get, get, getBoolean, getChar, getClassArg, getFloat, getInt, getLines, getLuceneMatchVersion, getOriginalArgs, getPattern, getSet, getSnowballWordSet, getWordSet, isExplicitLuceneMatchVersion, require, require, require, requireBoolean, requireChar, requireFloat, requireInt, setExplicitLuceneMatchVersion, splitAt, splitFileNames
- Methods inherited from class java.lang.Object
  clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait

- Field Detail
  - NAME
```
public static final String NAME
```
    SPI name
    
    See Also:
    
    Constant Field Values
  - PATTERN
```
public static final String PATTERN
```
    See Also:
    
    Constant Field Values
  - GROUP
```
public static final String GROUP
```
    See Also:
    
    Constant Field Values
  - pattern
```
protected final Pattern pattern
```
  - group
```
protected final int group
```
- Constructor Detail
  - PatternTokenizerFactory
```
public PatternTokenizerFactory(Map<String,String> args)
```
    Creates a new PatternTokenizerFactory
  - PatternTokenizerFactory
```
public PatternTokenizerFactory()
```
    Default constructor for compatibility with SPI
- Method Detail
  - create
```
public PatternTokenizer create(AttributeFactory factory)
```
    Splits the input using the configured pattern
    
    Specified by:
    
    create in class TokenizerFactory