PatternTokenizerFactory (Lucene 4.0.0 API)

java.lang.Object
- org.apache.lucene.analysis.util.AbstractAnalysisFactory
  - org.apache.lucene.analysis.util.TokenizerFactory
    - org.apache.lucene.analysis.pattern.PatternTokenizerFactory

```
public class PatternTokenizerFactory
extends TokenizerFactory
```
Factory for PatternTokenizer. This tokenizer uses regular-expression matching to build distinct tokens from the input stream. It takes two arguments, "pattern" and "group":
- "pattern" is the regular expression.
- "group" selects which capture group to extract into tokens.

group=-1 (the default) is equivalent to "split": the tokens are the same as the output of String.split(java.lang.String), with empty tokens removed.
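
The split behavior described above can be approximated with plain java.util.regex, without Lucene on the classpath. This is a minimal sketch, not the tokenizer's actual implementation: the class name `SplitModeSketch` and the method `splitTokens` are hypothetical, and the code only mirrors the documented rule (split on the pattern, then drop zero-length pieces).

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Pattern;

// Hypothetical helper that imitates the documented group=-1 ("split") mode.
final class SplitModeSketch {

    // Splits the input on the regex and discards zero-length pieces,
    // matching the "String.split output without empty tokens" description.
    static List<String> splitTokens(String regex, String input) {
        List<String> tokens = new ArrayList<>();
        for (String piece : Pattern.compile(regex).split(input)) {
            if (!piece.isEmpty()) {
                tokens.add(piece);
            }
        }
        return tokens;
    }

    public static void main(String[] args) {
        // The doubled comma would yield an empty string from split();
        // it is filtered out here.
        System.out.println(splitTokens(",", "a,,b,c")); // prints [a, b, c]
    }
}
```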

Using group >= 0 selects the matching group as the token. For example, if you have:
```
  pattern = \'([^\']+)\'
  group = 0
  input = aaa 'bbb' 'ccc'
```
the output will be two tokens: 'bbb' and 'ccc' (including the ' marks). With the same input but group=1, the output would be bbb and ccc (no ' marks).
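
The group example can be reproduced with java.util.regex alone. The sketch below is an illustration under stated assumptions rather than the tokenizer's code: `GroupModeSketch` and `groupTokens` are hypothetical names, and the pattern is written without the backslashes before the quotes, which Java's regex syntax treats the same way.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper that imitates the documented group >= 0 mode.
final class GroupModeSketch {

    // Collects the chosen capture group from every match, skipping
    // groups that did not participate or matched zero characters.
    static List<String> groupTokens(String regex, int group, String input) {
        List<String> tokens = new ArrayList<>();
        Matcher m = Pattern.compile(regex).matcher(input);
        while (m.find()) {
            String token = m.group(group);
            if (token != null && !token.isEmpty()) {
                tokens.add(token);
            }
        }
        return tokens;
    }

    public static void main(String[] args) {
        String regex = "'([^']+)'";
        String input = "aaa 'bbb' 'ccc'";
        System.out.println(groupTokens(regex, 0, input)); // prints ['bbb', 'ccc']
        System.out.println(groupTokens(regex, 1, input)); // prints [bbb, ccc]
    }
}
```

Group 0 is the whole match, so the quotes are kept; group 1 is the parenthesized part, so they are dropped.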

NOTE: This Tokenizer does not output tokens that are of zero length.
```
 <fieldType name="text_ptn" class="solr.TextField" positionIncrementGap="100">
   <analyzer>
     <tokenizer class="solr.PatternTokenizerFactory" pattern="\'([^\']+)\'" group="1"/>
   </analyzer>
 </fieldType>
```
Since:
solr1.2

See Also:
PatternTokenizer

- Field Summary
  
  | Modifier and Type | Field |
  | --- | --- |
  | protected int | group |
  | static String | GROUP |
  | protected Pattern | pattern |
  | static String | PATTERN |
  - Fields inherited from class org.apache.lucene.analysis.util.AbstractAnalysisFactory
    args, luceneMatchVersion
- Constructor Summary
  
  Constructors
  Constructor and Description
  
  PatternTokenizerFactory()
- Method Summary
  
  | Modifier and Type | Method | Description |
  | --- | --- | --- |
  | Tokenizer | create(Reader in) | Split the input using configured pattern |
  | void | init(Map<String,String> args) | Require a configured pattern |
  - Methods inherited from class org.apache.lucene.analysis.util.TokenizerFactory
    availableTokenizers, forName, lookupClass, reloadTokenizers
  - Methods inherited from class org.apache.lucene.analysis.util.AbstractAnalysisFactory
    assureMatchVersion, getArgs, getBoolean, getBoolean, getInt, getInt, getInt, getLines, getLuceneMatchVersion, getPattern, getSnowballWordSet, getWordSet, setLuceneMatchVersion, splitFileNames
  - Methods inherited from class java.lang.Object
    clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait

- Field Detail
  - PATTERN
```
public static final String PATTERN
```
    See Also:
    Constant Field Values
  - GROUP
```
public static final String GROUP
```
    See Also:
    Constant Field Values
  - pattern
```
protected Pattern pattern
```
  - group
```
protected int group
```
- Constructor Detail
  - PatternTokenizerFactory
```
public PatternTokenizerFactory()
```
- Method Detail
  - init
```
public void init(Map<String,String> args)
```
    Require a configured pattern
    
    Overrides:
    
    init in class AbstractAnalysisFactory
  - create
```
public Tokenizer create(Reader in)
```
    Split the input using configured pattern
    
    Specified by:
    
    create in class TokenizerFactory

Copyright © 2000-2012 Apache Software Foundation. All Rights Reserved.