PatternTokenizer (Lucene 4.0.0 API)

java.lang.Object
- org.apache.lucene.util.AttributeSource
  - org.apache.lucene.analysis.TokenStream
    - org.apache.lucene.analysis.Tokenizer
      - org.apache.lucene.analysis.pattern.PatternTokenizer

All Implemented Interfaces: Closeable
```
public final class PatternTokenizer
extends Tokenizer
```
This tokenizer uses regular-expression matching to break the input stream into distinct tokens. It takes two arguments, "pattern" and "group":
- "pattern" is the regular expression.
- "group" selects which capture group to extract into tokens.

With group=-1 (the default), the tokenizer behaves like "split": the pattern is treated as a delimiter, and the tokens are the same as the output of String.split(java.lang.String) with empty tokens removed.

Using group >= 0 selects the matching group as the token. For example, if you have:
```
  pattern = \'([^\']+)\'
  group = 0
  input = aaa 'bbb' 'ccc'
```
the output will be two tokens: 'bbb' and 'ccc' (including the ' marks). With the same input but group=1, the output would be bbb and ccc (without the ' marks).

NOTE: This Tokenizer does not output tokens that are of zero length.
See Also:
Pattern
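
The group semantics described above can be sketched with plain java.util.regex, without Lucene on the classpath. This is only an illustration of the documented behavior, not the tokenizer's actual implementation: the class `PatternGroupSketch` and its `tokens` helper are hypothetical names introduced here, and the helper works on a String rather than a Reader.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper class, not part of Lucene: it mimics the documented
// "pattern"/"group" behavior using only java.util.regex.
final class PatternGroupSketch {

    static List<String> tokens(String input, Pattern pattern, int group) {
        List<String> out = new ArrayList<>();
        if (group < 0) {
            // group = -1: "split" mode, the pattern acts as a delimiter.
            for (String piece : pattern.split(input)) {
                if (!piece.isEmpty()) {      // zero-length tokens are dropped
                    out.add(piece);
                }
            }
            return out;
        }
        // group >= 0: every match contributes the text of the chosen group.
        Matcher m = pattern.matcher(input);
        while (m.find()) {
            String token = m.group(group);
            if (token != null && !token.isEmpty()) {
                out.add(token);
            }
        }
        return out;
    }

    public static void main(String[] args) {
        // Same pattern as in the example above; backslashes are doubled
        // because this is a Java string literal.
        Pattern quoted = Pattern.compile("\\'([^\\']+)\\'");
        String input = "aaa 'bbb' 'ccc'";

        System.out.println(tokens(input, quoted, 0)); // ['bbb', 'ccc']
        System.out.println(tokens(input, quoted, 1)); // [bbb, ccc]

        // Split mode: the empty piece between the two commas is omitted.
        System.out.println(tokens("a, b,,c", Pattern.compile(",\\s*"), -1)); // [a, b, c]
    }
}
```

In split mode the unmatched text ("aaa" in the quoted example) becomes part of the output, whereas with group >= 0 only matched text is ever emitted.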

Nested Class Summary
- Nested classes/interfaces inherited from class org.apache.lucene.util.AttributeSource
  AttributeSource.AttributeFactory, AttributeSource.State

Field Summary
- Fields inherited from class org.apache.lucene.analysis.Tokenizer
  input

Constructor Summary

Constructors
Constructor and Description
`PatternTokenizer(Reader input, Pattern pattern, int group)` Creates a new PatternTokenizer that returns tokens from the given group (-1 for split functionality).

Method Summary

Methods

| Modifier and Type | Method and Description |
| --- | --- |
| `void` | `end()` |
| `boolean` | `incrementToken()` |
| `void` | `reset()` |

- Methods inherited from class org.apache.lucene.analysis.Tokenizer
  close, correctOffset, setReader
- Methods inherited from class org.apache.lucene.util.AttributeSource
  addAttribute, addAttributeImpl, captureState, clearAttributes, cloneAttributes, copyTo, equals, getAttribute, getAttributeClassesIterator, getAttributeFactory, getAttributeImplsIterator, hasAttribute, hasAttributes, hashCode, reflectAsString, reflectWith, restoreState
- Methods inherited from class java.lang.Object
  clone, finalize, getClass, notify, notifyAll, toString, wait, wait, wait

- Constructor Detail
  - PatternTokenizer
```
public PatternTokenizer(Reader input,
                Pattern pattern,
                int group)
                 throws IOException
```
    Creates a new PatternTokenizer that returns tokens from the given group (-1 for split functionality).
    
    Throws: IOException
- Method Detail
  - incrementToken
```
public boolean incrementToken()
```
    Specified by: incrementToken in class TokenStream
  - end
```
public void end()
```
    Overrides: end in class TokenStream
  - reset
```
public void reset()
           throws IOException
```
    Overrides: reset in class TokenStream
    
    Throws: IOException

Copyright © 2000-2012 Apache Software Foundation. All Rights Reserved.