public final class PatternTokenizer extends Tokenizer
group=-1 (the default) is equivalent to "split". In this case, the tokens are
the same as the output of the following call, with empty tokens omitted:
String.split(java.lang.String)
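As a rough illustration of the split behaviour described above, the following sketch uses only java.util.regex and does not touch the tokenizer itself. The class SplitModeSketch and the helper splitTokens are hypothetical names introduced for this example; the code only emulates the documented result (split, then drop empty tokens) and is not the tokenizer's implementation.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Pattern;

// Hypothetical stand-alone sketch; not part of the PatternTokenizer API.
class SplitModeSketch {

    // Emulates the documented group = -1 result: split the input on the
    // pattern and keep only the non-empty pieces.
    static List<String> splitTokens(String input, Pattern pattern) {
        List<String> tokens = new ArrayList<>();
        for (String piece : pattern.split(input)) {
            if (!piece.isEmpty()) {
                tokens.add(piece);
            }
        }
        return tokens;
    }

    public static void main(String[] args) {
        // Leading, doubled, and trailing separators would yield empty pieces.
        System.out.println(splitTokens(",aaa,,bbb,ccc,", Pattern.compile(",")));
        // prints [aaa, bbb, ccc]
    }
}
```

The empty strings that a plain split would produce around the leading and doubled commas are filtered out, which matches the "without empty tokens" wording above.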
Using group >= 0 selects the matching group as the token. For example, if you have:
    pattern = \'([^\']+)\'
    group = 0
    input = aaa 'bbb' 'ccc'
the output will be two tokens: 'bbb' and 'ccc' (including the ' marks). With the same input but using group=1, the output would be: bbb and ccc (no ' marks).
NOTE: This Tokenizer does not output tokens that are of zero length.
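The group selection described above can be sketched with plain java.util.regex. The class GroupModeSketch and the helper groupTokens are hypothetical names used only for illustration; the code emulates the documented output for group >= 0 (one token per match, zero-length tokens skipped) under that assumption and is not the tokenizer's own code. The pattern is written as '([^']+)', which matches the same text as the escaped form in the example.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical stand-alone sketch; not part of the PatternTokenizer API.
class GroupModeSketch {

    // Emulates the documented group >= 0 result: each match contributes the
    // text of the chosen group, and zero-length tokens are not emitted.
    static List<String> groupTokens(String input, Pattern pattern, int group) {
        List<String> tokens = new ArrayList<>();
        Matcher matcher = pattern.matcher(input);
        while (matcher.find()) {
            String token = matcher.group(group);
            if (token != null && !token.isEmpty()) {
                tokens.add(token);
            }
        }
        return tokens;
    }

    public static void main(String[] args) {
        Pattern quoted = Pattern.compile("'([^']+)'");
        String input = "aaa 'bbb' 'ccc'";
        System.out.println(groupTokens(input, quoted, 0)); // prints ['bbb', 'ccc']
        System.out.println(groupTokens(input, quoted, 1)); // prints [bbb, ccc]
    }
}
```

Group 0 is the whole match, so the quote marks are kept; group 1 is the first capturing group, so only the text between the quotes is returned.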
See Also: Pattern
Nested classes/interfaces inherited from class AttributeSource: AttributeSource.AttributeFactory, AttributeSource.State
Constructor and Description |
---|
PatternTokenizer(Reader input, Pattern pattern, int group): creates a new PatternTokenizer returning tokens from group (-1 for split functionality) |
Modifier and Type | Method and Description |
---|---|
void | end() |
boolean | incrementToken() |
void | reset(Reader input) |
Methods inherited from class Tokenizer: close, correctOffset
Methods inherited from class TokenStream: reset
Methods inherited from class AttributeSource: addAttribute, addAttributeImpl, captureState, clearAttributes, cloneAttributes, copyTo, equals, getAttribute, getAttributeClassesIterator, getAttributeFactory, getAttributeImplsIterator, hasAttribute, hasAttributes, hashCode, reflectAsString, reflectWith, restoreState, toString
public PatternTokenizer(Reader input, Pattern pattern, int group) throws IOException
Throws: IOException
public boolean incrementToken() throws IOException
Specified by: incrementToken in class TokenStream
Throws: IOException
public void end() throws IOException
Overrides: end in class TokenStream
Throws: IOException
public void reset(Reader input) throws IOException
Overrides: reset in class Tokenizer
Throws: IOException