public abstract class Tokenizer extends TokenStream
This is an abstract class; subclasses must override TokenStream.incrementToken().
NOTE: Subclasses overriding TokenStream.incrementToken() must call AttributeSource.clearAttributes() before setting attributes.
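
The NOTE above can be illustrated with a minimal sketch. It is a JDK-only model of the contract, not Lucene code: the class `WhitespaceSketch`, its `term`, `startOffset`, and `endOffset` fields, and the `tokenize` helper are hypothetical stand-ins for a Tokenizer subclass and its attributes. The point it shows is the ordering: state from the previous token is cleared first, and only then are the new values set.

```java
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.List;

/** JDK-only model of the incrementToken() contract; not a Lucene class. */
class WhitespaceSketch {
    private final Reader input;                              // plays the role of Tokenizer.input
    private final StringBuilder term = new StringBuilder();  // stand-in for a term attribute
    private int startOffset, endOffset;                      // stand-ins for offset attributes
    private int position = 0;                                // characters consumed so far

    WhitespaceSketch(Reader input) { this.input = input; }

    /** Stand-in for AttributeSource.clearAttributes(): wipe state left by the previous token. */
    void clearAttributes() {
        term.setLength(0);
        startOffset = 0;
        endOffset = 0;
    }

    /** Stand-in for TokenStream.incrementToken(): returns false once the input is exhausted. */
    boolean incrementToken() throws IOException {
        clearAttributes();                 // always first, before any attribute is set
        int c;
        while ((c = input.read()) != -1) { // skip leading whitespace
            position++;
            if (!Character.isWhitespace(c)) break;
        }
        if (c == -1) return false;
        startOffset = position - 1;
        term.append((char) c);
        while ((c = input.read()) != -1) { // collect characters up to the next whitespace
            position++;
            if (Character.isWhitespace(c)) break;
            term.append((char) c);
        }
        endOffset = startOffset + term.length();
        return true;
    }

    /** Hypothetical helper: renders each token as text@start-end. */
    static List<String> tokenize(String text) {
        WhitespaceSketch t = new WhitespaceSketch(new StringReader(text));
        List<String> out = new ArrayList<>();
        try {
            while (t.incrementToken()) {
                out.add(t.term + "@" + t.startOffset + "-" + t.endOffset);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return out;
    }

    public static void main(String[] args) {
        System.out.println(tokenize("hello  big world"));
    }
}
```

Running `main` prints `[hello@0-5, big@7-10, world@11-16]`. Without the `clearAttributes()` call, the second token in this model would come out as `hellobig`, which is the kind of stale state the NOTE warns about.
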
Nested classes inherited from class AttributeSource: AttributeSource.AttributeFactory, AttributeSource.State

| Modifier and Type | Field and Description |
|---|---|
| protected Reader | input: The text source for this Tokenizer. |

| Modifier | Constructor and Description |
|---|---|
| protected | Tokenizer(): Deprecated. Use Tokenizer(Reader) instead. |
| protected | Tokenizer(AttributeSource.AttributeFactory factory): Deprecated. Use Tokenizer(AttributeSource.AttributeFactory, Reader) instead. |
| protected | Tokenizer(AttributeSource.AttributeFactory factory, Reader input): Construct a token stream processing the given input using the given AttributeFactory. |
| protected | Tokenizer(AttributeSource source): Deprecated. Use Tokenizer(AttributeSource, Reader) instead. |
| protected | Tokenizer(AttributeSource source, Reader input): Construct a token stream processing the given input using the given AttributeSource. |
| protected | Tokenizer(Reader input): Construct a token stream processing the given input. |

| Modifier and Type | Method and Description |
|---|---|
| void | close(): By default, closes the input Reader. |
| protected int | correctOffset(int currentOff): Return the corrected offset. |
| void | reset(Reader input): Expert: Reset the tokenizer to a new reader. |

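
As a rough illustration of how close() and reset(Reader) fit together when one instance is reused, here is a JDK-only sketch. It assumes that reset(Reader) does little more than point the existing instance at a new reader; `ReusableSketch`, `readAll`, and `demo` are hypothetical names, and the order of calls shown is one plausible reuse pattern rather than a documented requirement.

```java
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.io.UncheckedIOException;

/** JDK-only model of reader reuse; not a Lucene class. */
class ReusableSketch {
    protected Reader input;                       // mirrors the protected input field

    ReusableSketch(Reader input) { this.input = input; }

    /** Models reset(Reader): swap in a new text source (assumed behaviour). */
    void reset(Reader input) throws IOException { this.input = input; }

    /** Models close(): by default, closes the input Reader. */
    void close() throws IOException { input.close(); }

    /** Hypothetical helper: drains the current reader into a String. */
    String readAll() throws IOException {
        StringBuilder sb = new StringBuilder();
        int c;
        while ((c = input.read()) != -1) sb.append((char) c);
        return sb.toString();
    }

    /** Processes two texts with a single instance. */
    static String demo() {
        try {
            ReusableSketch t = new ReusableSketch(new StringReader("first"));
            String a = t.readAll();
            t.close();                            // releases the first reader
            t.reset(new StringReader("second"));  // same instance, new input
            String b = t.readAll();
            t.close();
            return a + "," + b;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

Here `demo()` returns `first,second`: the object is constructed once and only its reader changes between documents.
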
Methods inherited from class TokenStream: end, incrementToken, reset

Methods inherited from class AttributeSource: addAttribute, addAttributeImpl, captureState, clearAttributes, cloneAttributes, copyTo, equals, getAttribute, getAttributeClassesIterator, getAttributeFactory, getAttributeImplsIterator, hasAttribute, hasAttributes, hashCode, reflectAsString, reflectWith, restoreState, toString

protected Reader input

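@Deprecated protected Tokenizer()
Deprecated. Use Tokenizer(Reader) instead.

protected Tokenizer(Reader input)
Construct a token stream processing the given input.

@Deprecated protected Tokenizer(AttributeSource.AttributeFactory factory)
Deprecated. Use Tokenizer(AttributeSource.AttributeFactory, Reader) instead.

protected Tokenizer(AttributeSource.AttributeFactory factory, Reader input)
Construct a token stream processing the given input using the given AttributeFactory.

@Deprecated protected Tokenizer(AttributeSource source)
Deprecated. Use Tokenizer(AttributeSource, Reader) instead.

protected Tokenizer(AttributeSource source, Reader input)
Construct a token stream processing the given input using the given AttributeSource.
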
public void close() throws IOException
By default, closes the input Reader.
Specified by: close in interface Closeable
Overrides: close in class TokenStream
Throws: IOException

protected final int correctOffset(int currentOff)
Return the corrected offset. If input is a CharStream subclass, this method calls CharStream.correctOffset(int); otherwise it returns currentOff.
Parameters: currentOff - offset as seen in the output
See Also: CharStream.correctOffset(int)

public void reset(Reader input) throws IOException
Expert: Reset the tokenizer to a new reader.
Throws: IOException
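
The branching described for correctOffset(int) can be sketched as follows. This is a JDK-only model under stated assumptions: the `OffsetCorrecting` interface stands in for CharStream, and `PrefixStrippedReader` is a hypothetical reader that has dropped a fixed number of leading characters, so offsets seen in its output must be shifted to match the original text.

```java
import java.io.FilterReader;
import java.io.Reader;
import java.io.StringReader;

/** JDK-only model of the correctOffset(int) decision; not a Lucene class. */
class CorrectOffsetSketch {
    /** Hypothetical stand-in for CharStream: a source that can map offsets back to the original text. */
    interface OffsetCorrecting {
        int correctOffset(int currentOff);
    }

    /** Hypothetical reader that reports it removed a fixed-length prefix from the original text. */
    static class PrefixStrippedReader extends FilterReader implements OffsetCorrecting {
        private final int stripped;

        PrefixStrippedReader(Reader in, int stripped) {
            super(in);
            this.stripped = stripped;
        }

        @Override
        public int correctOffset(int currentOff) {
            return currentOff + stripped;   // shift back by the removed prefix
        }
    }

    /** Delegates when the input can correct offsets, else returns currentOff unchanged. */
    static int correctOffset(Reader input, int currentOff) {
        return (input instanceof OffsetCorrecting)
                ? ((OffsetCorrecting) input).correctOffset(currentOff)
                : currentOff;
    }

    public static void main(String[] args) {
        System.out.println(correctOffset(new StringReader("abc"), 2));
        System.out.println(correctOffset(new PrefixStrippedReader(new StringReader("abc"), 6), 2));
    }
}
```

With a plain StringReader the offset 2 comes back unchanged; with the prefix-stripping reader it becomes 8, because six characters were removed ahead of the text the tokenizer sees.
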