org.apache.lucene.analysis
Class Tokenizer

java.lang.Object
  extended by org.apache.lucene.util.AttributeSource
      extended by org.apache.lucene.analysis.TokenStream
          extended by org.apache.lucene.analysis.Tokenizer
All Implemented Interfaces:
Closeable

public abstract class Tokenizer
extends TokenStream

A Tokenizer is a TokenStream whose input is a Reader.

This is an abstract class; subclasses must override TokenStream.incrementToken().

NOTE: Subclasses overriding TokenStream.incrementToken() must call AttributeSource.clearAttributes() before setting attributes.
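The contract above can be illustrated with a dependency-free sketch. SketchTokenizer and WhitespaceSketchTokenizer below are hypothetical stand-ins, not Lucene classes: plain fields play the role of the term and offset attributes, and clearAttributes() simply resets them. What the sketch is meant to show is the order of operations inside incrementToken(): clear first, then set attributes, and return false once the Reader is exhausted.

```java
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.List;

// Dependency-free sketch. SketchTokenizer and WhitespaceSketchTokenizer are
// hypothetical stand-ins that mimic the Tokenizer contract; they are not Lucene classes.
class IncrementTokenSketch {

    /** Stand-in for Tokenizer: a Reader plus plain fields acting as attributes. */
    abstract static class SketchTokenizer {
        protected final Reader input;
        protected final StringBuilder term = new StringBuilder(); // "term attribute"
        protected int startOffset, endOffset;                      // "offset attribute"

        SketchTokenizer(Reader input) { this.input = input; }

        /** Mimics clearAttributes(): return every attribute to its default. */
        protected void clearAttributes() {
            term.setLength(0);
            startOffset = 0;
            endOffset = 0;
        }

        /** Returns true if a token was produced, false at end of input. */
        abstract boolean incrementToken() throws IOException;
    }

    /** Splits on whitespace, reading one char at a time from input. */
    static final class WhitespaceSketchTokenizer extends SketchTokenizer {
        private int offset = 0; // number of chars consumed so far

        WhitespaceSketchTokenizer(Reader input) { super(input); }

        @Override
        boolean incrementToken() throws IOException {
            clearAttributes(); // must happen before any attribute is set
            int c;
            // Skip leading whitespace.
            while ((c = input.read()) != -1 && Character.isWhitespace(c)) {
                offset++;
            }
            if (c == -1) {
                return false; // input exhausted
            }
            startOffset = offset;
            // Accumulate characters until whitespace or end of input.
            while (c != -1 && !Character.isWhitespace(c)) {
                term.append((char) c);
                offset++;
                c = input.read();
            }
            endOffset = offset;
            if (c != -1) {
                offset++; // account for the delimiter that ended the token
            }
            return true;
        }
    }

    /** Collects "term[start,end)" for every token in text. */
    static List<String> tokens(String text) {
        WhitespaceSketchTokenizer t = new WhitespaceSketchTokenizer(new StringReader(text));
        List<String> out = new ArrayList<>();
        try {
            while (t.incrementToken()) {
                out.add(t.term + "[" + t.startOffset + "," + t.endOffset + ")");
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return out;
    }

    public static void main(String[] args) {
        System.out.println(tokens("  hello  world")); // [hello[2,7), world[9,14)]
    }
}
```

If the clearAttributes() call were dropped, the second token's term would be appended to the first (helloworld), which is exactly the kind of stale attribute state the NOTE warns about.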


Nested Class Summary
 
Nested classes/interfaces inherited from class org.apache.lucene.util.AttributeSource
AttributeSource.AttributeFactory, AttributeSource.State
 
Field Summary
protected  Reader input
          The text source for this Tokenizer.
 
Constructor Summary
protected Tokenizer(AttributeSource.AttributeFactory factory, Reader input)
          Construct a token stream processing the given input using the given AttributeFactory.
protected Tokenizer(Reader input)
          Construct a token stream processing the given input.
 
Method Summary
 void close()
          Releases resources associated with this stream.
protected  int correctOffset(int currentOff)
          Return the corrected offset.
 void reset()
          This method is called by a consumer before it begins consumption using TokenStream.incrementToken().
 void setReader(Reader input)
          Expert: Set a new reader on the Tokenizer.
 
Methods inherited from class org.apache.lucene.analysis.TokenStream
end, incrementToken
 
Methods inherited from class org.apache.lucene.util.AttributeSource
addAttribute, addAttributeImpl, captureState, clearAttributes, cloneAttributes, copyTo, equals, getAttribute, getAttributeClassesIterator, getAttributeFactory, getAttributeImplsIterator, hasAttribute, hasAttributes, hashCode, reflectAsString, reflectWith, restoreState, toString
 
Methods inherited from class java.lang.Object
clone, finalize, getClass, notify, notifyAll, wait, wait, wait
 

Field Detail

input

protected Reader input
The text source for this Tokenizer.

Constructor Detail

Tokenizer

protected Tokenizer(Reader input)
Construct a token stream processing the given input.


Tokenizer

protected Tokenizer(AttributeSource.AttributeFactory factory,
                    Reader input)
Construct a token stream processing the given input using the given AttributeFactory.

Method Detail

close

public void close()
           throws IOException
Releases resources associated with this stream.

If you override this method, always call super.close(); otherwise, some internal state will not be correctly reset (e.g., Tokenizer will throw IllegalStateException on reuse).

NOTE: The default implementation closes the input Reader, so be sure to call super.close() when overriding this method.

Specified by:
close in interface Closeable
Overrides:
close in class TokenStream
Throws:
IOException
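A minimal sketch of an overriding close(), using hypothetical stand-in classes rather than Lucene's: the base class mimics the documented default of closing the input Reader, and the subclass releases its own state before delegating to super.close(). TrackingReader is an illustrative helper that records whether close() actually reached the Reader.

```java
import java.io.Closeable;
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.io.UncheckedIOException;

// Hypothetical stand-ins, not Lucene classes: they model only the rule that an
// overriding close() must call super.close() so the input Reader gets closed.
class CloseOverrideSketch {

    /** Illustrative helper: a StringReader that records whether close() was called. */
    static final class TrackingReader extends StringReader {
        boolean closed = false;

        TrackingReader(String s) { super(s); }

        @Override
        public void close() {
            closed = true;
            super.close();
        }
    }

    /** Base class whose default close() closes the input Reader. */
    abstract static class SketchTokenizer implements Closeable {
        protected final Reader input;

        SketchTokenizer(Reader input) { this.input = input; }

        @Override
        public void close() throws IOException {
            input.close();
        }
    }

    /** Subclass with extra per-stream state of its own. */
    static final class BufferingSketchTokenizer extends SketchTokenizer {
        private char[] buffer = new char[1024]; // hypothetical per-stream resource

        BufferingSketchTokenizer(Reader input) { super(input); }

        @Override
        public void close() throws IOException {
            buffer = null;  // release this subclass's own state
            super.close();  // then let the base class close the Reader
        }
    }

    /** Returns true if closing the tokenizer also closed its Reader. */
    static boolean closesReader() {
        TrackingReader reader = new TrackingReader("some text");
        try (BufferingSketchTokenizer t = new BufferingSketchTokenizer(reader)) {
            // ... consume tokens here ...
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return reader.closed;
    }

    public static void main(String[] args) {
        System.out.println(closesReader()); // true
    }
}
```

Releasing subclass state first and calling super.close() last keeps the Reader open for as long as the subclass might still need it.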

correctOffset

protected final int correctOffset(int currentOff)
Return the corrected offset. If input is a CharFilter subclass, this method calls CharFilter.correctOffset(int); otherwise it returns currentOff.

Parameters:
currentOff - offset as seen in the output
Returns:
corrected offset based on the input
See Also:
CharFilter.correctOffset(int)
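To make the offset correction concrete, here is a sketch with a hypothetical character filter (StripStarsFilter, not a Lucene class) that deletes every '*' before the tokenizer sees the text. It records, for each output character, the input offset it came from; the static correctOffset method mirrors the documented rule of delegating only when the input is a filter and otherwise returning currentOff unchanged. Mapping an offset equal to the output length to the input length is an assumption of the sketch.

```java
import java.io.Reader;
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-ins, not Lucene classes. SketchCharFilter plays the role of
// CharFilter: a Reader that can map output offsets back to input offsets.
class CorrectOffsetSketch {

    abstract static class SketchCharFilter extends Reader {
        abstract int correctOffset(int currentOff);
    }

    /** Deletes every '*' and records where each surviving char came from. */
    static final class StripStarsFilter extends SketchCharFilter {
        private final String output;
        private final int[] origin;     // origin[i] = input offset of output char i
        private final int inputLength;
        private int pos = 0;            // read position within output

        StripStarsFilter(String input) {
            StringBuilder sb = new StringBuilder();
            List<Integer> map = new ArrayList<>();
            for (int i = 0; i < input.length(); i++) {
                if (input.charAt(i) != '*') {
                    sb.append(input.charAt(i));
                    map.add(i);
                }
            }
            output = sb.toString();
            origin = map.stream().mapToInt(Integer::intValue).toArray();
            inputLength = input.length();
        }

        @Override
        public int read(char[] cbuf, int off, int len) {
            if (pos >= output.length()) {
                return -1;
            }
            int n = Math.min(len, output.length() - pos);
            output.getChars(pos, pos + n, cbuf, off);
            pos += n;
            return n;
        }

        @Override
        public void close() { }

        @Override
        int correctOffset(int currentOff) {
            // Assumption: an offset at (or past) the end of the output maps to the input length.
            return currentOff < origin.length ? origin[currentOff] : inputLength;
        }
    }

    /** Mirrors the documented rule: delegate to a filter, else return currentOff. */
    static int correctOffset(Reader input, int currentOff) {
        if (input instanceof SketchCharFilter) {
            return ((SketchCharFilter) input).correctOffset(currentOff);
        }
        return currentOff;
    }

    public static void main(String[] args) {
        StripStarsFilter filter = new StripStarsFilter("a*b c");
        char[] buf = new char[16];
        int n = filter.read(buf, 0, buf.length);
        System.out.println(new String(buf, 0, n));      // ab c
        System.out.println(correctOffset(filter, 2));   // 3
    }
}
```

For the input a*b c the tokenizer sees ab c; the token ab spans output offsets [0, 2), which the filter maps back to [0, 3), covering a*b in the original text.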

setReader

public final void setReader(Reader input)
                     throws IOException
Expert: Set a new reader on the Tokenizer. Typically, an analyzer (in its tokenStream method) uses this to reuse a previously created tokenizer.

Throws:
IOException
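The reuse pattern described above can be modeled without Lucene as follows. SketchTokenizer is a hypothetical stand-in, and its state checks (setReader() is rejected until the previous stream has been closed, and reset() hands the new Reader to the stream) are a simplified assumption about the lifecycle, not Lucene's implementation. tokenizeAll runs the consumer workflow (setReader, reset, incrementToken until false, end, close) once per input with a single tokenizer instance.

```java
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.List;

// Hypothetical, dependency-free model of reusing one tokenizer across inputs.
// The state checks are a simplified assumption, not Lucene's implementation.
class SetReaderReuseSketch {

    static final class SketchTokenizer {
        private Reader input;    // reader being consumed; null when closed
        private Reader pending;  // reader supplied by the constructor or setReader()
        final StringBuilder term = new StringBuilder();

        SketchTokenizer(Reader input) { this.pending = input; }

        void setReader(Reader newInput) {
            if (newInput == null) {
                throw new NullPointerException("input must not be null");
            }
            if (input != null) {
                throw new IllegalStateException("close() call missing");
            }
            pending = newInput;
        }

        void reset() {
            if (pending == null) {
                throw new IllegalStateException("setReader() call missing");
            }
            input = pending;
            pending = null;
        }

        /** Whitespace tokenization, as a placeholder for real logic. */
        boolean incrementToken() throws IOException {
            if (input == null) {
                throw new IllegalStateException("reset() call missing");
            }
            term.setLength(0); // clear attributes first
            int c;
            while ((c = input.read()) != -1 && Character.isWhitespace(c)) {
                // skip leading whitespace
            }
            if (c == -1) {
                return false;
            }
            do {
                term.append((char) c);
            } while ((c = input.read()) != -1 && !Character.isWhitespace(c));
            return true;
        }

        void end() {
            // A real tokenizer would record the final offset here; nothing to do in this model.
        }

        void close() throws IOException {
            if (input != null) {
                input.close();
            }
            input = null;
        }
    }

    /** Consumes each text in turn with a single tokenizer (texts must be non-empty). */
    static List<List<String>> tokenizeAll(String... texts) {
        SketchTokenizer t = new SketchTokenizer(new StringReader(texts[0]));
        List<List<String>> results = new ArrayList<>();
        try {
            for (int i = 0; i < texts.length; i++) {
                if (i > 0) {
                    t.setReader(new StringReader(texts[i])); // reuse the same instance
                }
                t.reset();
                List<String> tokens = new ArrayList<>();
                while (t.incrementToken()) {
                    tokens.add(t.term.toString());
                }
                t.end();
                t.close();
                results.add(tokens);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return results;
    }

    public static void main(String[] args) {
        System.out.println(tokenizeAll("a b", "c")); // [[a, b], [c]]
    }
}
```

Reusing one instance avoids re-allocating the tokenizer and its attributes for every document; the price is that the consumer must follow the full workflow, including close(), before handing over the next Reader.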

reset

public void reset()
           throws IOException
Description copied from class: TokenStream
This method is called by a consumer before it begins consumption using TokenStream.incrementToken().

Resets this stream to a clean state. Stateful implementations must implement this method so that they can be reused, just as if they had been created fresh.

If you override this method, always call super.reset(); otherwise, some internal state will not be correctly reset (e.g., Tokenizer will throw IllegalStateException on further usage).

Overrides:
reset in class TokenStream
Throws:
IOException
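A sketch of a stateful subclass that overrides reset(), again with hypothetical stand-in classes rather than Lucene's. In this simplified model the base reset() is what makes the newly supplied Reader available, so skipping super.reset() leaves the stream unusable (incrementToken() throws IllegalStateException), while the subclass's own reset logic restarts its per-stream counter so that a reused instance behaves as if freshly created.

```java
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-ins, not Lucene classes. In this simplified model the base
// reset() is what makes a newly supplied Reader usable.
class ResetOverrideSketch {

    abstract static class SketchTokenizer {
        private Reader pending;   // reader waiting for reset()
        protected Reader input;   // reader being consumed; null until reset()

        SketchTokenizer(Reader input) { this.pending = input; }

        void setReader(Reader newInput) {
            pending = newInput;
            input = null;
        }

        void reset() throws IOException {
            if (pending == null) {
                throw new IllegalStateException("no reader to reset to");
            }
            input = pending;
            pending = null;
        }

        abstract boolean incrementToken() throws IOException;
    }

    /** Emits one token per char, tagged with a running count that is per-stream state. */
    static final class CountingCharTokenizer extends SketchTokenizer {
        private int count;
        String term;

        CountingCharTokenizer(Reader input) { super(input); }

        @Override
        void reset() throws IOException {
            super.reset(); // without this, input stays null and incrementToken() throws
            count = 0;     // restore this subclass's own state
        }

        @Override
        boolean incrementToken() throws IOException {
            if (input == null) {
                throw new IllegalStateException("reset() call missing");
            }
            term = null; // clear attributes
            int c = input.read();
            if (c == -1) {
                return false;
            }
            term = (char) c + "#" + (++count);
            return true;
        }
    }

    /** Feeds text to an existing tokenizer and collects its tokens. */
    static List<String> run(CountingCharTokenizer t, String text) {
        try {
            t.setReader(new StringReader(text));
            t.reset();
            List<String> out = new ArrayList<>();
            while (t.incrementToken()) {
                out.add(t.term);
            }
            return out;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        CountingCharTokenizer t = new CountingCharTokenizer(new StringReader(""));
        System.out.println(run(t, "ab")); // [a#1, b#2]
        System.out.println(run(t, "c"));  // [c#1]
    }
}
```

Without count = 0 in the overriding reset(), the second run would yield [c#3] instead of [c#1], because the counter would carry over from the previous stream.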


Copyright © 2000-2014 Apache Software Foundation. All Rights Reserved.