org.apache.lucene.analysis
Class Tokenizer

java.lang.Object
  extended by org.apache.lucene.util.AttributeSource
      extended by org.apache.lucene.analysis.TokenStream
          extended by org.apache.lucene.analysis.Tokenizer
All Implemented Interfaces:
Closeable

public abstract class Tokenizer
extends TokenStream

A Tokenizer is a TokenStream whose input is a Reader.

This is an abstract class; subclasses must override TokenStream.incrementToken().

NOTE: Subclasses overriding TokenStream.incrementToken() must call AttributeSource.clearAttributes() before setting attributes.
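
The clear-then-set rule in the note above can be illustrated with a minimal sketch. It deliberately uses stand-in classes (SketchAttributes, SketchWhitespaceTokenizer) rather than the Lucene types, so that it runs with only the JDK; all names in it are hypothetical and are not part of this API. The "type" value is set only for some tokens, which is exactly the case where a missing clear would leak the previous token's value into the next one.

```java
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.List;

// Stand-in for a stream's attribute state; NOT a Lucene class.
final class SketchAttributes {
    final StringBuilder term = new StringBuilder();
    String type = "word"; // default value, restored by clear()

    void clear() {
        term.setLength(0);
        type = "word";
    }
}

// Hypothetical whitespace tokenizer that clears state before setting it.
final class SketchWhitespaceTokenizer {
    private final Reader input;
    final SketchAttributes atts = new SketchAttributes();

    SketchWhitespaceTokenizer(Reader input) {
        this.input = input;
    }

    boolean incrementToken() throws IOException {
        atts.clear(); // plays the role of clearAttributes()
        int c = input.read();
        while (c != -1 && Character.isWhitespace(c)) {
            c = input.read(); // skip leading whitespace
        }
        if (c == -1) {
            return false; // end of input, no more tokens
        }
        boolean allDigits = true;
        while (c != -1 && !Character.isWhitespace(c)) {
            atts.term.append((char) c);
            allDigits &= Character.isDigit(c);
            c = input.read();
        }
        if (allDigits) {
            atts.type = "number"; // set only for some tokens
        }
        return true;
    }

    // Collects "term/type" pairs for the given text.
    static List<String> analyze(String text) {
        SketchWhitespaceTokenizer t =
                new SketchWhitespaceTokenizer(new StringReader(text));
        List<String> out = new ArrayList<>();
        try {
            while (t.incrementToken()) {
                out.add(t.atts.term + "/" + t.atts.type);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return out;
    }
}
```

In this sketch, analyze("42 apples") yields 42/number followed by apples/word. If the atts.clear() call were removed, the second token would keep the stale "number" type (and the first token's text).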


Nested Class Summary
 
Nested classes/interfaces inherited from class org.apache.lucene.util.AttributeSource
AttributeSource.AttributeFactory, AttributeSource.State
 
Field Summary
protected  Reader input
          The text source for this Tokenizer.
 
Constructor Summary
protected Tokenizer(AttributeSource.AttributeFactory factory, Reader input)
          Construct a token stream processing the given input using the given AttributeFactory.
protected Tokenizer(AttributeSource source, Reader input)
          Construct a token stream processing the given input using the given AttributeSource.
protected Tokenizer(Reader input)
          Construct a token stream processing the given input.
 
Method Summary
 void close()
          Releases resources associated with this stream.
protected  int correctOffset(int currentOff)
          Return the corrected offset.
 void setReader(Reader input)
          Expert: Set a new reader on the Tokenizer.
 
Methods inherited from class org.apache.lucene.analysis.TokenStream
end, incrementToken, reset
 
Methods inherited from class org.apache.lucene.util.AttributeSource
addAttribute, addAttributeImpl, captureState, clearAttributes, cloneAttributes, copyTo, equals, getAttribute, getAttributeClassesIterator, getAttributeFactory, getAttributeImplsIterator, hasAttribute, hasAttributes, hashCode, reflectAsString, reflectWith, restoreState
 
Methods inherited from class java.lang.Object
clone, finalize, getClass, notify, notifyAll, toString, wait, wait, wait
 

Field Detail

input

protected Reader input
The text source for this Tokenizer.

Constructor Detail

Tokenizer

protected Tokenizer(Reader input)
Construct a token stream processing the given input.


Tokenizer

protected Tokenizer(AttributeSource.AttributeFactory factory,
                    Reader input)
Construct a token stream processing the given input using the given AttributeFactory.


Tokenizer

protected Tokenizer(AttributeSource source,
                    Reader input)
Construct a token stream processing the given input using the given AttributeSource.

Method Detail

close

public void close()
           throws IOException
Releases resources associated with this stream.

NOTE: The default implementation closes the input Reader, so be sure to call super.close() when overriding this method.

Specified by:
close in interface Closeable
Overrides:
close in class TokenStream
Throws:
IOException
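
As a rough illustration of the note on overriding close(), the sketch below models the pattern with JDK-only stand-in classes. SketchBaseTokenizer and SketchBufferingTokenizer are hypothetical names, not Lucene types; the only behavior taken from this page is that the base close() closes the input Reader.

```java
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.io.UncheckedIOException;

// Stand-in for the base class: its close() closes the input Reader.
class SketchBaseTokenizer {
    protected final Reader input;

    SketchBaseTokenizer(Reader input) {
        this.input = input;
    }

    public void close() throws IOException {
        input.close();
    }
}

// Hypothetical subclass that owns an extra buffer of its own.
class SketchBufferingTokenizer extends SketchBaseTokenizer {
    char[] buffer = new char[1024];

    SketchBufferingTokenizer(Reader input) {
        super(input);
    }

    @Override
    public void close() throws IOException {
        try {
            buffer = null; // release subclass-owned state
        } finally {
            super.close(); // always let the base class close the Reader
        }
    }
}

final class SketchCloseDemo {
    // Returns true if the Reader rejects reads after close(),
    // i.e. the subclass really delegated to the base class.
    static boolean readerClosedAfterClose() {
        Reader reader = new StringReader("some text");
        SketchBufferingTokenizer t = new SketchBufferingTokenizer(reader);
        try {
            t.close();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        try {
            reader.read();
            return false;
        } catch (IOException expected) {
            return true; // StringReader throws once it has been closed
        }
    }
}
```

Placing super.close() in a finally block is a design choice of this sketch: it keeps the Reader from leaking even if the subclass cleanup fails.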

correctOffset

protected final int correctOffset(int currentOff)
Return the corrected offset. If input is a CharFilter, this method delegates to CharFilter.correctOffset(int); otherwise it returns currentOff unchanged.

Parameters:
currentOff - offset as seen in the output
Returns:
corrected offset based on the input
See Also:
CharFilter.correctOffset(int)
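
The delegation described for correctOffset can be sketched as follows. This is a JDK-only model under stated assumptions: SketchSkipPrefixReader is a hypothetical stand-in for a character filter that drops a fixed-length prefix from the text, and SketchOffsets.correctOffset mimics the "delegate if the input can correct offsets, else return the value unchanged" rule. Neither is a Lucene class.

```java
import java.io.FilterReader;
import java.io.IOException;
import java.io.Reader;
import java.io.UncheckedIOException;

// Hypothetical stand-in for a character filter: it hides the first
// prefixLength characters of the wrapped Reader from its consumer.
final class SketchSkipPrefixReader extends FilterReader {
    private final int skipped;

    SketchSkipPrefixReader(Reader in, int prefixLength) {
        super(in);
        int n;
        try {
            n = (int) in.skip(prefixLength);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        this.skipped = n;
    }

    // Maps an offset in the filtered text back to the original text.
    int correctOffset(int currentOff) {
        return currentOff + skipped;
    }
}

final class SketchOffsets {
    // Mirrors the rule on this page: delegate when the input can
    // correct offsets, otherwise return currentOff unchanged.
    static int correctOffset(Reader input, int currentOff) {
        if (input instanceof SketchSkipPrefixReader) {
            return ((SketchSkipPrefixReader) input).correctOffset(currentOff);
        }
        return currentOff;
    }
}
```

For the original text "<b>hello" with a three-character prefix skipped, a token that the tokenizer sees at offsets 0 to 5 maps back to offsets 3 to 8 in the original text. With a plain Reader the offsets pass through unchanged.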

setReader

public final void setReader(Reader input)
                     throws IOException
Expert: Set a new reader on the Tokenizer. Typically, an analyzer (in its tokenStream method) will use this to reuse a previously created tokenizer instead of constructing a new one for each input.

Throws:
IOException
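
The reuse pattern mentioned for setReader might look roughly like the sketch below. It is a JDK-only model: SketchReusableTokenizer and SketchReuseDemo are hypothetical stand-ins, and the single cached instance is an assumption about how a caller could organize reuse, not a description of how any Lucene analyzer is implemented.

```java
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.List;

// Stand-in tokenizer whose input can be swapped between uses.
final class SketchReusableTokenizer {
    private Reader input;

    SketchReusableTokenizer(Reader input) {
        this.input = input;
    }

    void setReader(Reader input) {
        this.input = input;
    }

    // Returns the next whitespace-separated word, or null at end of input.
    String next() throws IOException {
        StringBuilder sb = new StringBuilder();
        int c;
        while ((c = input.read()) != -1) {
            if (Character.isWhitespace(c)) {
                if (sb.length() > 0) {
                    break; // word finished
                }
            } else {
                sb.append((char) c);
            }
        }
        return sb.length() > 0 ? sb.toString() : null;
    }
}

final class SketchReuseDemo {
    private static SketchReusableTokenizer cached; // hypothetical cache
    static int created = 0; // counts constructor calls, for illustration

    static List<String> tokenize(String text) {
        Reader reader = new StringReader(text);
        if (cached == null) {
            cached = new SketchReusableTokenizer(reader);
            created++;
        } else {
            cached.setReader(reader); // reuse instead of re-creating
        }
        List<String> out = new ArrayList<>();
        try {
            for (String w = cached.next(); w != null; w = cached.next()) {
                out.add(w);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return out;
    }
}
```

Calling tokenize twice with different texts constructs the stand-in tokenizer only once; the second call merely swaps the Reader.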


Copyright © 2000-2013 Apache Software Foundation. All Rights Reserved.