org.apache.lucene.analysis
Class CharStream

java.lang.Object
  extended by java.io.Reader
      extended by org.apache.lucene.analysis.CharStream
All Implemented Interfaces:
Closeable, Readable
Direct Known Subclasses:
CharFilter, CharReader

public abstract class CharStream
extends Reader

CharStream adds correctOffset(int) functionality over Reader. All Tokenizers accept a CharStream instead of a Reader as input, which enables arbitrary character-based filtering before tokenization. The correctOffset(int) method fixes offsets to account for the removal or insertion of characters, so that the offsets reported in the tokens match the character offsets of the original Reader.
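As a rough illustration of the idea, the following self-contained sketch does not use Lucene at all. OffsetCorrectingReader and DashStrippingReader are hypothetical stand-in classes invented for this example; only the correctOffset(int) name and its meaning are taken from the description above. The reader strips every '-' from its input and remembers where each surviving character came from, so an offset measured in the filtered text can be mapped back to the original text.

```java
import java.io.IOException;
import java.io.Reader;

public class OffsetSketch {

    /** Hypothetical stand-in for CharStream: a Reader that can map offsets back. */
    abstract static class OffsetCorrectingReader extends Reader {
        abstract int correctOffset(int currentOff);
    }

    /** Hypothetical filter: removes every '-' and records where each kept char came from. */
    static class DashStrippingReader extends OffsetCorrectingReader {
        private final String filtered;
        // originalIndex[i] = position in the original input of filtered character i;
        // the extra last slot maps the end of the filtered text to the end of the input.
        private final int[] originalIndex;
        private int pos = 0;

        DashStrippingReader(String original) {
            StringBuilder sb = new StringBuilder();
            int[] map = new int[original.length() + 1];
            for (int i = 0; i < original.length(); i++) {
                char c = original.charAt(i);
                if (c != '-') {
                    map[sb.length()] = i;
                    sb.append(c);
                }
            }
            map[sb.length()] = original.length();
            this.filtered = sb.toString();
            this.originalIndex = map;
        }

        @Override
        int correctOffset(int currentOff) {
            // currentOff is measured in the filtered text; the result is in the original.
            return originalIndex[currentOff];
        }

        @Override
        public int read(char[] buf, int off, int len) {
            if (pos >= filtered.length()) {
                return -1;
            }
            int n = Math.min(len, filtered.length() - pos);
            filtered.getChars(pos, pos + n, buf, off);
            pos += n;
            return n;
        }

        @Override
        public void close() {
            // Nothing to release: the whole input is held in memory.
        }
    }

    public static void main(String[] args) throws IOException {
        DashStrippingReader in = new DashStrippingReader("wi-fi router");
        char[] buf = new char[32];
        int n = in.read(buf, 0, buf.length);
        String seen = new String(buf, 0, n);      // the filtered text: "wifi router"
        int start = seen.indexOf("router");       // offset in the filtered text
        int end = start + "router".length();
        // Map both offsets back to positions in "wi-fi router".
        System.out.println(in.correctOffset(start) + "-" + in.correctOffset(end));
    }
}
```

Here "router" sits at offsets 5 to 11 in the filtered text but at 6 to 12 in the original, and the corrected offsets reflect the one character removed before it. A real implementation would typically stream its input and store only the points where the cumulative shift changes, rather than one array entry per character.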


Field Summary
 
Fields inherited from class java.io.Reader
lock
 
Constructor Summary
CharStream()
           
 
Method Summary
abstract  int correctOffset(int currentOff)
          Called by CharFilter(s) and Tokenizer to correct a token offset.
 
Methods inherited from class java.io.Reader
close, mark, markSupported, read, read, read, read, ready, reset, skip
 
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
 

Constructor Detail

CharStream

public CharStream()

Method Detail

correctOffset

public abstract int correctOffset(int currentOff)
Called by CharFilter(s) and Tokenizer to correct a token offset.

Parameters:
currentOff - offset as seen in the output of this stream, that is, after any character filtering
Returns:
the corrected offset, expressed relative to the original input
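When several filtering layers are stacked, one plausible arrangement is for each layer to undo its own shift and then pass the result to the stream it wraps. The sketch below models only that delegation and is not Lucene code: OffsetCorrector, Identity and RemovedSpan are hypothetical names, and the rule that an offset exactly at a removal point maps to just after the removed span is an assumption made for this example.

```java
public class ChainedOffsetSketch {

    /** Hypothetical stand-in for the correctOffset(int) contract. */
    interface OffsetCorrector {
        int correctOffset(int currentOff);
    }

    /** Innermost layer: its output is the original input, so offsets are unchanged. */
    static class Identity implements OffsetCorrector {
        @Override
        public int correctOffset(int currentOff) {
            return currentOff;
        }
    }

    /** A layer that dropped `removed` characters starting at position `at` of its own input. */
    static class RemovedSpan implements OffsetCorrector {
        private final OffsetCorrector input;
        private final int at;
        private final int removed;

        RemovedSpan(OffsetCorrector input, int at, int removed) {
            this.input = input;
            this.at = at;
            this.removed = removed;
        }

        @Override
        public int correctOffset(int currentOff) {
            // Undo this layer's removal first (assumed rule: offsets at or past the
            // removal point shift right), then let the wrapped layer undo its own.
            int inThisInput = currentOff >= at ? currentOff + removed : currentOff;
            return input.correctOffset(inThisInput);
        }
    }

    public static void main(String[] args) {
        // Original input: "##a-bc"
        // Layer 1 drops "##" at position 0 of the original   -> "a-bc"
        // Layer 2 drops "-"  at position 1 of layer 1 output -> "abc"
        OffsetCorrector layer1 = new RemovedSpan(new Identity(), 0, 2);
        OffsetCorrector layer2 = new RemovedSpan(layer1, 1, 1);

        // The token "bc" spans offsets 1 to 3 in "abc".
        System.out.println(layer2.correctOffset(1) + "-" + layer2.correctOffset(3)); // prints "4-6"
    }
}
```

The caller only ever talks to the outermost layer, which matches the parameter description: currentOff is an offset in that layer's output, and the return value is an offset in the original input.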


Copyright © 2000-2010 Apache Software Foundation. All Rights Reserved.