org.apache.lucene.analysis
Class CharStream
java.lang.Object
java.io.Reader
org.apache.lucene.analysis.CharStream
- All Implemented Interfaces:
- Closeable, Readable
- Direct Known Subclasses:
- CharFilter, CharReader
public abstract class CharStream
- extends Reader
CharStream adds correctOffset(int) functionality over Reader. All Tokenizers accept a CharStream instead of a Reader as input, which enables arbitrary character-based filtering before tokenization. The correctOffset(int) method fixes offsets to account for the removal or insertion of characters, so that the offsets reported in the tokens match the character offsets of the original Reader.
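The offset correction described above can be illustrated with a minimal, self-contained sketch. It does not use Lucene: OffsetCorrectingReader is a hypothetical stand-in for the CharStream contract, and DroppingFilter and CharStreamSketch are illustrative names that are not part of the library. The filter removes one character from the text and records how many characters had been removed before each output position, which is the information correctOffset(int) needs.

```java
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for the CharStream contract (a Reader plus
// correctOffset), so the sketch compiles without Lucene on the classpath.
abstract class OffsetCorrectingReader extends Reader {
    public abstract int correctOffset(int currentOff);
}

// Hypothetical filter: drops every occurrence of one character and
// remembers how many characters were dropped before each output position.
class DroppingFilter extends OffsetCorrectingReader {
    private final Reader in;
    private final char dropped;
    // removedBefore.get(i) = number of dropped chars preceding output char i
    private final List<Integer> removedBefore = new ArrayList<>();
    private int removed = 0;

    DroppingFilter(Reader in, char dropped) {
        this.in = in;
        this.dropped = dropped;
    }

    @Override
    public int read(char[] buf, int off, int len) throws IOException {
        int written = 0;
        while (written < len) {
            int c = in.read();
            if (c == -1) {
                break;                      // end of the wrapped input
            }
            if (c == dropped) {
                removed++;                  // skip it, but remember the shift
                continue;
            }
            buf[off + written++] = (char) c;
            removedBefore.add(removed);
        }
        return (written == 0 && len > 0) ? -1 : written;
    }

    @Override
    public int correctOffset(int currentOff) {
        // Offsets at or past the end of the output use the total shift so far.
        int shift = currentOff < removedBefore.size()
                ? removedBefore.get(currentOff)
                : removed;
        return currentOff + shift;
    }

    @Override
    public void close() throws IOException {
        in.close();
    }
}

class CharStreamSketch {
    // Reads the whole stream into a String, wrapping IOException for brevity.
    static String readAll(Reader r) {
        try {
            StringBuilder sb = new StringBuilder();
            char[] buf = new char[16];
            int n;
            while ((n = r.read(buf, 0, buf.length)) != -1) {
                sb.append(buf, 0, n);
            }
            return sb.toString();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        DroppingFilter f = new DroppingFilter(new StringReader("wi-fi net"), '-');
        System.out.println(readAll(f));          // filtered text
        System.out.println(f.correctOffset(5));  // start of "net" in the original
        System.out.println(f.correctOffset(8));  // end of "net" in the original
    }
}
```

For the input "wi-fi net" the filtered text is "wifi net". A tokenizer working on the filtered text would see "net" at offsets 5 to 8, and the corrected offsets 6 to 9 point at "net" in the original input.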
Method Summary
abstract int correctOffset(int currentOff)
    Called by CharFilter(s) and Tokenizer to correct token offset.
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
CharStream
public CharStream()
correctOffset
public abstract int correctOffset(int currentOff)
- Called by CharFilter(s) and Tokenizer to correct token offset.
- Parameters:
currentOff - offset as seen in the output
- Returns:
- corrected offset based on the input
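Because correctOffset is called by filters as well as by the tokenizer, one plausible way to implement it in a chain of streams is for each stream to apply its own correction and then delegate to the stream it wraps. The sketch below illustrates that idea without Lucene: SketchStream is a hypothetical stand-in for CharStream, and PassThroughStream, SkipPrefixStream, and ChainSketch are illustrative names, not library classes. It also assumes the wrapped input is at least as long as the skipped prefix.

```java
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.io.UncheckedIOException;

// Hypothetical stand-in for the CharStream contract.
abstract class SketchStream extends Reader {
    public abstract int correctOffset(int currentOff);
}

// Wraps a plain Reader without changing the text, so offsets need no correction.
class PassThroughStream extends SketchStream {
    private final Reader in;

    PassThroughStream(Reader in) {
        this.in = in;
    }

    @Override
    public int read(char[] buf, int off, int len) throws IOException {
        return in.read(buf, off, len);
    }

    @Override
    public int correctOffset(int currentOff) {
        return currentOff;
    }

    @Override
    public void close() throws IOException {
        in.close();
    }
}

// Hypothetical filter that discards a fixed-length prefix of the wrapped stream.
class SkipPrefixStream extends SketchStream {
    private final SketchStream in;
    private final int prefixLen;
    private boolean skipped = false;

    SkipPrefixStream(SketchStream in, int prefixLen) {
        this.in = in;
        this.prefixLen = prefixLen;
    }

    @Override
    public int read(char[] buf, int off, int len) throws IOException {
        if (!skipped) {
            // Consume the prefix once, before the first real read.
            for (int i = 0; i < prefixLen && in.read() != -1; i++) {
                // nothing to do: the character is simply discarded
            }
            skipped = true;
        }
        return in.read(buf, off, len);
    }

    @Override
    public int correctOffset(int currentOff) {
        // Undo this filter's shift, then let the wrapped stream undo its own.
        return in.correctOffset(currentOff + prefixLen);
    }

    @Override
    public void close() throws IOException {
        in.close();
    }
}

class ChainSketch {
    // Reads the whole stream into a String, wrapping IOException for brevity.
    static String drain(Reader r) {
        try {
            StringBuilder sb = new StringBuilder();
            char[] buf = new char[8];
            int n;
            while ((n = r.read(buf, 0, buf.length)) != -1) {
                sb.append(buf, 0, n);
            }
            return sb.toString();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        SketchStream base = new PassThroughStream(new StringReader("##>hello"));
        SketchStream chain = new SkipPrefixStream(new SkipPrefixStream(base, 2), 1);
        System.out.println(drain(chain));            // text seen by a tokenizer
        System.out.println(chain.correctOffset(0));  // position of 'h' in the original
    }
}
```

Here the inner filter removes "##" and the outer filter removes ">", so the tokenizer sees "hello". Output offset 0 is corrected to 1 by the outer filter and then to 3 by the inner one, which is the position of "h" in the original "##>hello".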
Copyright © 2000-2011 Apache Software Foundation. All Rights Reserved.