SimplePatternSplitTokenizerFactory (Lucene 9.9.2 common API)

java.lang.Object
- org.apache.lucene.analysis.AbstractAnalysisFactory
  - org.apache.lucene.analysis.TokenizerFactory
    - org.apache.lucene.analysis.pattern.SimplePatternSplitTokenizerFactory

```
public class SimplePatternSplitTokenizerFactory
extends TokenizerFactory
```
Factory for SimplePatternSplitTokenizer, which produces tokens by splitting the input wherever the provided regular expression matches.
This tokenizer uses Lucene's RegExp pattern matching, compiled to a deterministic automaton, to find token boundaries in the input stream. The regular-expression syntax is more limited than that of PatternTokenizer (which uses java.util.regex), but tokenization is considerably faster. The factory takes two arguments:
- "pattern" (required): the regular expression, using the syntax described in RegExp
- "determinizeWorkLimit" (optional, default Operations.DEFAULT_DETERMINIZE_WORK_LIMIT): the limit on the total effort spent determinizing the automaton built from the regexp
The pattern matches the characters that separate tokens, as with String.split. Matching is greedy: at each position, the longest separator that matches there is consumed. Empty tokens are never created.
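To make the splitting rule concrete, here is a plain-Java sketch of the semantics just described. It is an illustration only, not how SimplePatternSplitTokenizer is implemented: it swaps Lucene's automaton for java.util.regex, emulates longest-match by trying every candidate end position from longest to shortest (quadratic time), and works on `char`s rather than code points. The class and method names are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical illustration class; not part of Lucene.
class SplitSemanticsSketch {

  // Splits text on separators, consuming the longest separator match at each
  // position and never emitting empty tokens.
  static List<String> split(String text, String separatorRegex) {
    Matcher m = Pattern.compile(separatorRegex).matcher(text);
    List<String> tokens = new ArrayList<>();
    StringBuilder current = new StringBuilder();
    int i = 0;
    while (i < text.length()) {
      int sepEnd = longestMatchEnd(m, i, text.length());
      if (sepEnd > i) {
        // A separator starts here: close the pending token (if any) and skip it.
        if (current.length() > 0) {
          tokens.add(current.toString());
          current.setLength(0);
        }
        i = sepEnd;
      } else {
        // No non-empty separator starts here, so the char belongs to a token.
        current.append(text.charAt(i));
        i++;
      }
    }
    if (current.length() > 0) {
      tokens.add(current.toString());
    }
    return tokens;
  }

  // Returns the end of the longest non-empty match starting at start, or start if none.
  private static int longestMatchEnd(Matcher m, int start, int limit) {
    for (int end = limit; end > start; end--) {
      m.region(start, end);
      if (m.matches()) {
        return end;
      }
    }
    return start;
  }

  public static void main(String[] args) {
    System.out.println(split("  foo\tbar\r\n\nbaz ", "[ \t\r\n]+"));
    System.out.println(split("xaby", "a|ab"));
  }
}
```

The second call shows why "longest" matters. With the separator `a|ab` and input `xaby`, the longest-match rule treats `ab` as one separator and yields `[x, y]`. By contrast, `"xaby".split("a|ab")` yields `[x, by]` because java.util.regex takes the first alternative that matches.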
For example, to match tokens delimited by simple whitespace characters:
```
 <fieldType name="text_ptn" class="solr.TextField" positionIncrementGap="100">
   <analyzer>
     <tokenizer class="solr.SimplePatternSplitTokenizerFactory" pattern="[ \t\r\n]+"/>
   </analyzer>
 </fieldType>
```
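The same tokenizer can be configured programmatically. The sketch below assumes lucene-core and lucene-analysis-common 9.9.2 are on the classpath. It uses CustomAnalyzer to look the factory up by its SPI name, `simplePatternSplit`. The class name, helper methods, and the field name passed to `tokenStream` are illustrative. In Java source, the compiler resolves the `\t`, `\r`, and `\n` escapes, so Lucene's RegExp parser receives literal tab, carriage-return, and newline characters.

```java
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;
import org.apache.lucene.analysis.Analyzer;
import org.apache.lucene.analysis.TokenStream;
import org.apache.lucene.analysis.custom.CustomAnalyzer;
import org.apache.lucene.analysis.tokenattributes.CharTermAttribute;

// Hypothetical example class; assumes Lucene 9.9.2 analysis-common on the classpath.
class WhitespaceSplitAnalyzerSketch {

  // Builds an analyzer whose only component is the simplePatternSplit tokenizer.
  static Analyzer buildAnalyzer() throws IOException {
    return CustomAnalyzer.builder()
        .withTokenizer("simplePatternSplit", "pattern", "[ \t\r\n]+")
        .build();
  }

  // Runs text through the analyzer and collects the emitted terms.
  static List<String> terms(Analyzer analyzer, String text) throws IOException {
    List<String> out = new ArrayList<>();
    try (TokenStream ts = analyzer.tokenStream("text_ptn", text)) {
      CharTermAttribute term = ts.addAttribute(CharTermAttribute.class);
      ts.reset();
      while (ts.incrementToken()) {
        out.add(term.toString());
      }
      ts.end();
    }
    return out;
  }

  public static void main(String[] args) throws IOException {
    try (Analyzer analyzer = buildAnalyzer()) {
      System.out.println(terms(analyzer, "  quick\tbrown\r\nfox  "));
    }
  }
}
```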
Since:

6.5.0

See Also:

SimplePatternSplitTokenizer

WARNING: This API is experimental and might change in incompatible ways in the next release.

SPI name (case-insensitive when looking up the service; for example, a name registered as 'htmlStrip' can also be looked up as 'htmlstrip'):

"simplePatternSplit"

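The SPI name can be resolved through the static lookup methods that TokenizerFactory provides (listed under the inherited methods). The following sketch assumes Lucene 9.9.2 analysis-common is on the classpath. It resolves the name in two spellings to show case-insensitive lookup, then maps the class back to its name. The class and helper names are hypothetical.

```java
import org.apache.lucene.analysis.TokenizerFactory;
import org.apache.lucene.analysis.pattern.SimplePatternSplitTokenizerFactory;

// Hypothetical example class; assumes Lucene 9.9.2 analysis-common on the classpath.
class SpiLookupSketch {

  // Resolves a tokenizer factory class from its SPI name (lookup ignores case).
  static Class<? extends TokenizerFactory> resolve(String spiName) {
    return TokenizerFactory.lookupClass(spiName);
  }

  public static void main(String[] args) {
    System.out.println(resolve("simplePatternSplit"));
    System.out.println(resolve("SIMPLEPATTERNSPLIT"));
    // Reverse direction: class to SPI name, which is the value of the NAME constant.
    System.out.println(TokenizerFactory.findSPIName(SimplePatternSplitTokenizerFactory.class));
  }
}
```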
Field Summary

Fields
Modifier and Type	Field	Description
`static String`	`NAME`	SPI name
`static String`	`PATTERN`

- Fields inherited from class org.apache.lucene.analysis.AbstractAnalysisFactory
  LUCENE_MATCH_VERSION_PARAM, luceneMatchVersion

Constructor Summary

Constructors
Constructor	Description
`SimplePatternSplitTokenizerFactory()`	Default ctor for compatibility with SPI
`SimplePatternSplitTokenizerFactory(Map<String,String> args)`	Creates a new SimplePatternSplitTokenizerFactory
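The Map-based constructor reads its configuration from the args map. The sketch below assumes Lucene 9.9.2 analysis-common is on the classpath. It uses the PATTERN constant as the key and sets the optional `determinizeWorkLimit`; the value 10000 is illustrative, not a recommendation. The constructor consumes the entries it recognizes by removing them from the map, so the map must be mutable; an immutable `Map.of(...)` passed directly would fail. A missing "pattern" is rejected with IllegalArgumentException. The sketch assumes leftover unknown keys are rejected the same way. The class and helper names are hypothetical.

```java
import java.io.IOException;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import org.apache.lucene.analysis.Tokenizer;
import org.apache.lucene.analysis.pattern.SimplePatternSplitTokenizerFactory;
import org.apache.lucene.analysis.tokenattributes.CharTermAttribute;

// Hypothetical example class; assumes Lucene 9.9.2 analysis-common on the classpath.
class FactoryArgsSketch {

  // Builds a factory from a mutable args map (the constructor removes consumed keys).
  static SimplePatternSplitTokenizerFactory newFactory(String pattern) {
    Map<String, String> args = new HashMap<>();
    args.put(SimplePatternSplitTokenizerFactory.PATTERN, pattern);
    args.put("determinizeWorkLimit", "10000"); // optional; illustrative value
    return new SimplePatternSplitTokenizerFactory(args);
  }

  // Returns true if the constructor rejects the args with IllegalArgumentException.
  static boolean rejects(Map<String, String> args) {
    try {
      new SimplePatternSplitTokenizerFactory(new HashMap<>(args)); // copy: must be mutable
      return false;
    } catch (IllegalArgumentException e) {
      return true;
    }
  }

  // Tokenizes text using the inherited no-argument create() and collects the terms.
  static List<String> tokenize(SimplePatternSplitTokenizerFactory factory, String text)
      throws IOException {
    List<String> terms = new ArrayList<>();
    try (Tokenizer tokenizer = factory.create()) {
      tokenizer.setReader(new StringReader(text));
      CharTermAttribute term = tokenizer.addAttribute(CharTermAttribute.class);
      tokenizer.reset();
      while (tokenizer.incrementToken()) {
        terms.add(term.toString());
      }
      tokenizer.end();
    }
    return terms;
  }

  public static void main(String[] args) throws IOException {
    System.out.println(tokenize(newFactory("[,;]+"), "a,b;;c,"));
    System.out.println(rejects(Map.of()));                             // no "pattern"
    System.out.println(rejects(Map.of("pattern", ",", "bogus", "1"))); // unknown key
  }
}
```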

Method Summary

Modifier and Type	Method	Description
`SimplePatternSplitTokenizer`	`create(AttributeFactory factory)`

- Methods inherited from class org.apache.lucene.analysis.TokenizerFactory
  availableTokenizers, create, findSPIName, forName, lookupClass, reloadTokenizers
- Methods inherited from class org.apache.lucene.analysis.AbstractAnalysisFactory
  defaultCtorException, get, get, get, get, get, getBoolean, getChar, getClassArg, getFloat, getInt, getLines, getLuceneMatchVersion, getOriginalArgs, getPattern, getSet, getSnowballWordSet, getWordSet, isExplicitLuceneMatchVersion, require, require, require, requireBoolean, requireChar, requireFloat, requireInt, setExplicitLuceneMatchVersion, splitAt, splitFileNames
- Methods inherited from class java.lang.Object
  clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
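Because `create(AttributeFactory)` is declared to return SimplePatternSplitTokenizer, the result can be used without a cast. The sketch below assumes Lucene 9.9.2 analysis-common is on the classpath. It passes `AttributeFactory.DEFAULT_ATTRIBUTE_FACTORY` from `org.apache.lucene.util`, which is only one possible choice; the inherited no-argument `create()` picks a default for you. It then reads each token's term together with the offsets reported by OffsetAttribute. The class and helper names are hypothetical.

```java
import java.io.IOException;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import org.apache.lucene.analysis.pattern.SimplePatternSplitTokenizer;
import org.apache.lucene.analysis.pattern.SimplePatternSplitTokenizerFactory;
import org.apache.lucene.analysis.tokenattributes.CharTermAttribute;
import org.apache.lucene.analysis.tokenattributes.OffsetAttribute;
import org.apache.lucene.util.AttributeFactory;

// Hypothetical example class; assumes Lucene 9.9.2 analysis-common on the classpath.
class CreateWithAttributeFactorySketch {

  // Formats each token as "term[start,end)" using the offsets the tokenizer reports.
  static List<String> termsWithOffsets(String pattern, String text) throws IOException {
    Map<String, String> args = new HashMap<>();
    args.put(SimplePatternSplitTokenizerFactory.PATTERN, pattern);
    SimplePatternSplitTokenizerFactory factory = new SimplePatternSplitTokenizerFactory(args);

    List<String> out = new ArrayList<>();
    try (SimplePatternSplitTokenizer tokenizer =
        factory.create(AttributeFactory.DEFAULT_ATTRIBUTE_FACTORY)) {
      tokenizer.setReader(new StringReader(text));
      CharTermAttribute term = tokenizer.addAttribute(CharTermAttribute.class);
      OffsetAttribute offset = tokenizer.addAttribute(OffsetAttribute.class);
      tokenizer.reset();
      while (tokenizer.incrementToken()) {
        out.add(term.toString() + "[" + offset.startOffset() + "," + offset.endOffset() + ")");
      }
      tokenizer.end();
    }
    return out;
  }

  public static void main(String[] args) throws IOException {
    termsWithOffsets("[ ]+", "foo  bar").forEach(System.out::println);
  }
}
```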

- Field Detail
  - NAME
```
public static final String NAME
```
    SPI name
    
    See Also:
    
    Constant Field Values
  - PATTERN
```
public static final String PATTERN
```
    See Also:
    
    Constant Field Values
- Constructor Detail
  - SimplePatternSplitTokenizerFactory
```
public SimplePatternSplitTokenizerFactory(Map<String,String> args)
```
    Creates a new SimplePatternSplitTokenizerFactory
  - SimplePatternSplitTokenizerFactory
```
public SimplePatternSplitTokenizerFactory()
```
    Default ctor for compatibility with SPI
- Method Detail
  - create
```
public SimplePatternSplitTokenizer create(AttributeFactory factory)
```
    Specified by:
    
    create in class TokenizerFactory