public abstract class TokenStream extends AttributeSource implements Closeable
A TokenStream enumerates the sequence of tokens, either from Fields of a Document or from query text.
This is an abstract class; concrete subclasses are:
Tokenizer, a TokenStream whose input is a Reader; and
TokenFilter, a TokenStream whose input is another TokenStream.
A new TokenStream API has been introduced with Lucene 2.9. This API has moved from being Token-based to Attribute-based. While Token still exists in 2.9 as a convenience class, the preferred way to store the information of a Token is to use AttributeImpls.
TokenStream now extends AttributeSource, which provides access to all of the token Attributes for the TokenStream. Note that only one instance per AttributeImpl is created and reused for every token. This approach reduces object creation and allows local caching of references to the AttributeImpls. See incrementToken() for further details.
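The reuse idea can be illustrated without Lucene on the classpath. The following is a minimal sketch that assumes simplified stand-ins: `MiniAttributeSource` and `TermAttr` are hypothetical names invented for this example, and the two-argument `addAttribute` is not Lucene's signature. It shows only that one instance per attribute class is created, so a reference cached once stays valid for every token.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.Supplier;

// Hypothetical stand-ins for illustration only; these are NOT Lucene classes.
class AttributeReuseSketch {

    /** A mutable attribute holding the current token's text. */
    static final class TermAttr {
        String term = "";
    }

    /** Keeps exactly one instance per attribute class, created on first request. */
    static final class MiniAttributeSource {
        private final Map<Class<?>, Object> attributes = new HashMap<>();

        <T> T addAttribute(Class<T> type, Supplier<T> factory) {
            // The factory runs only if no instance exists for this class yet.
            return type.cast(attributes.computeIfAbsent(type, key -> factory.get()));
        }
    }

    /** Returns true if two lookups yield the very same object. */
    static boolean sameInstance() {
        MiniAttributeSource source = new MiniAttributeSource();
        TermAttr first = source.addAttribute(TermAttr.class, TermAttr::new);
        TermAttr second = source.addAttribute(TermAttr.class, TermAttr::new);
        return first == second;
    }

    /** A write made by the producer is visible through the consumer's cached reference. */
    static String visibleThroughCachedReference() {
        MiniAttributeSource source = new MiniAttributeSource();
        TermAttr cached = source.addAttribute(TermAttr.class, TermAttr::new); // consumer caches once
        source.addAttribute(TermAttr.class, TermAttr::new).term = "lucene";   // producer updates in place
        return cached.term;
    }

    public static void main(String[] args) {
        System.out.println(sameInstance());
        System.out.println(visibleThroughCachedReference());
    }
}
```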
The workflow of the new TokenStream API is as follows:
Instantiation of TokenStream/TokenFilters which add/get attributes to/from the AttributeSource.
The consumer calls incrementToken() until it returns false, consuming the attributes after each call.
The consumer calls end() so that any end-of-stream operations can be performed.
The consumer calls close() to release any resource when finished using the TokenStream.
You can find some example code for the new API in the analysis package level Javadoc.
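The consumer side of the workflow above can be sketched as a plain loop. This example assumes a simplified stand-in, `MiniTokenStream`, which is a hypothetical class written for this illustration and not Lucene's TokenStream; only the call order (incrementToken() until false, then end(), then close()) mirrors the description.

```java
import java.io.Closeable;
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for illustration only; this is NOT Lucene's TokenStream.
class WorkflowSketch {

    /** Splits text on single spaces and exposes the current term via a reused buffer. */
    static final class MiniTokenStream implements Closeable {
        final StringBuilder term = new StringBuilder(); // the one reused "attribute"
        private final String[] parts;
        private int next = 0;
        boolean ended = false;
        boolean closed = false;

        MiniTokenStream(String text) {
            this.parts = text.isEmpty() ? new String[0] : text.split(" ");
        }

        boolean incrementToken() {
            if (next >= parts.length) {
                return false;              // end of stream
            }
            term.setLength(0);             // overwrite the reused attribute in place
            term.append(parts[next++]);
            return true;
        }

        void end() {
            ended = true;                  // end-of-stream operations would go here
        }

        @Override
        public void close() {
            closed = true;                 // release resources
        }
    }

    /** Consumes the stream in the documented order and returns the terms seen. */
    static List<String> consume(MiniTokenStream stream) {
        List<String> terms = new ArrayList<>();
        StringBuilder term = stream.term;  // cache the reference once, before the loop
        try {
            while (stream.incrementToken()) {
                terms.add(term.toString()); // copy: the buffer is overwritten on the next call
            }
            stream.end();
        } finally {
            stream.close();
        }
        return terms;
    }
}
```

The `try`/`finally` is a design choice of this sketch: it ensures close() runs even if consumption fails partway through.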
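Sometimes it is desirable to capture a current state of a TokenStream, e.g., for buffering purposes (see CachingTokenFilter, TeeSinkTokenFilter). For this use case AttributeSource.captureState() and AttributeSource.restoreState(State) can be used.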
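As a rough illustration of buffering by capturing and restoring state, the sketch below assumes a toy `MiniSource` with two attribute values and an immutable `State` snapshot. Both classes are hypothetical stand-ins; only the method names captureState and restoreState are taken from the text.

```java
// Hypothetical stand-ins for illustration only; this is NOT Lucene's AttributeSource.
class StateCaptureSketch {

    /** Immutable snapshot of the attribute values at one point in time. */
    static final class State {
        final String term;
        final int startOffset;

        State(String term, int startOffset) {
            this.term = term;
            this.startOffset = startOffset;
        }
    }

    /** Holds the current, mutable attribute values of a stream. */
    static final class MiniSource {
        String term = "";
        int startOffset = 0;

        State captureState() {
            return new State(term, startOffset);   // copy, not a live view
        }

        void restoreState(State state) {
            term = state.term;
            startOffset = state.startOffset;
        }
    }

    /** Captures the first token, moves on to a second one, then restores the first. */
    static String bufferAndReplay() {
        MiniSource source = new MiniSource();
        source.term = "first";
        source.startOffset = 0;
        State saved = source.captureState();

        source.term = "second";                    // the stream advances and overwrites
        source.startOffset = 6;

        source.restoreState(saved);                // replay the buffered token
        return source.term + "@" + source.startOffset;
    }
}
```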
The TokenStream-API in Lucene is based on the decorator pattern. Therefore all non-abstract subclasses must be final or have at least a final implementation of incrementToken()! This is checked when Java assertions are enabled.
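One way such a rule could be verified is with reflection. The sketch below is an assumption about how a check of this kind might look, not a reproduction of Lucene's own check; `MiniStream` and its subclasses are hypothetical classes defined only for this example.

```java
import java.lang.reflect.Method;
import java.lang.reflect.Modifier;

// Hypothetical classes for illustration only; Lucene's actual check is not reproduced here.
class FinalCheckSketch {

    static abstract class MiniStream {
        abstract boolean incrementToken();
    }

    /** Satisfies the rule because the whole class is final. */
    static final class FinalClassStream extends MiniStream {
        @Override boolean incrementToken() { return false; }
    }

    /** Satisfies the rule because incrementToken() itself is final. */
    static class FinalMethodStream extends MiniStream {
        @Override final boolean incrementToken() { return false; }
    }

    /** Violates the rule: neither the class nor the method is final. */
    static class OpenStream extends MiniStream {
        @Override boolean incrementToken() { return false; }
    }

    /** Returns true if the class is abstract, final, or has a final incrementToken(). */
    static boolean satisfiesRule(Class<?> clazz) {
        int classModifiers = clazz.getModifiers();
        if (Modifier.isAbstract(classModifiers) || Modifier.isFinal(classModifiers)) {
            return true;
        }
        // Find the nearest declaration of incrementToken() in the class hierarchy.
        for (Class<?> c = clazz; c != null; c = c.getSuperclass()) {
            try {
                Method method = c.getDeclaredMethod("incrementToken");
                return Modifier.isFinal(method.getModifiers());
            } catch (NoSuchMethodException e) {
                // Not declared here; keep looking in the superclass.
            }
        }
        return false;
    }
}
```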
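|Modifier||Constructor and Description|
|protected||TokenStream(): A TokenStream using the default attribute factory.|
|protected||TokenStream(AttributeSource.AttributeFactory factory): A TokenStream using the supplied AttributeFactory for creating new Attribute instances.|
|protected||TokenStream(AttributeSource input): A TokenStream that uses the same attributes as the supplied one.|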
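|Modifier and Type||Method and Description|
|void||close(): Releases resources associated with this stream.|
|void||end(): This method is called by the consumer after the last token has been consumed, after incrementToken() returned false (using the new TokenStream API).|
|abstract boolean||incrementToken(): Consumers (i.e., IndexWriter) use this method to advance the stream to the next token.|
|void||reset(): Resets this stream to the beginning.|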
Methods inherited from class AttributeSource: addAttribute, addAttributeImpl, captureState, clearAttributes, cloneAttributes, copyTo, equals, getAttribute, getAttributeClassesIterator, getAttributeFactory, getAttributeImplsIterator, hasAttribute, hasAttributes, hashCode, reflectAsString, reflectWith, restoreState, toString
protected TokenStream(AttributeSource input)
public abstract boolean incrementToken() throws IOException
Consumers (i.e., IndexWriter) use this method to advance the stream to the next token. Implementing classes must implement this method and update the appropriate AttributeImpls with the attributes of the next token.
The producer must make no assumptions about the attributes after the method has returned: the caller may arbitrarily change them. If the producer needs to preserve the state for subsequent calls, it can use AttributeSource.captureState() to create a copy of the current attribute state.
This method is called for every token of a document, so an efficient implementation is crucial for good performance. To avoid calls to AttributeSource.addAttribute(Class) and AttributeSource.getAttribute(Class), references to all AttributeImpls that this stream uses should be retrieved during instantiation.
To ensure that filters and consumers know which attributes are available,
the attributes must be added during instantiation. Filters and consumers
are not required to check for availability of attributes in incrementToken().
public void end() throws IOException
This method is called by the consumer after the last token has been consumed, after incrementToken() returned false (using the new TokenStream API). Streams implementing the old API should upgrade to use this feature. This method can be used to perform any end-of-stream operations, such as setting the final offset of a stream. The final offset of a stream might differ from the offset of the last token, e.g. when one or more whitespace characters follow the last token but a WhitespaceTokenizer was used.
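The difference between the last token's end offset and the final offset can be shown with a small stand-alone model. This sketch assumes a toy `MiniWhitespaceTokenizer`, a hypothetical class that is not Lucene's WhitespaceTokenizer, whose end() sets the offsets to the length of the input.

```java
// Hypothetical stand-in for illustration only; this is NOT Lucene's WhitespaceTokenizer.
class FinalOffsetSketch {

    /** Splits on whitespace and tracks offsets in two reused fields. */
    static final class MiniWhitespaceTokenizer {
        private final String text;
        private int pos = 0;
        int startOffset;
        int endOffset;

        MiniWhitespaceTokenizer(String text) {
            this.text = text;
        }

        boolean incrementToken() {
            // Skip whitespace before the next token.
            while (pos < text.length() && Character.isWhitespace(text.charAt(pos))) {
                pos++;
            }
            if (pos >= text.length()) {
                return false;
            }
            startOffset = pos;
            while (pos < text.length() && !Character.isWhitespace(text.charAt(pos))) {
                pos++;
            }
            endOffset = pos;
            return true;
        }

        /** End-of-stream operation: report the final offset of the whole input. */
        void end() {
            startOffset = text.length();
            endOffset = text.length();
        }
    }

    /** Returns { end offset of the last token, final offset after end() }. */
    static int[] lastAndFinalOffsets(String text) {
        MiniWhitespaceTokenizer tokenizer = new MiniWhitespaceTokenizer(text);
        int lastTokenEnd = 0;
        while (tokenizer.incrementToken()) {
            lastTokenEnd = tokenizer.endOffset;
        }
        tokenizer.end();
        return new int[] { lastTokenEnd, tokenizer.endOffset };
    }
}
```

For the input `"hello world  "` (two trailing spaces), the last token ends at offset 11 while the final offset is 13.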
public void reset() throws IOException
Resets this stream to the beginning. reset() is not needed for the standard indexing process. However, if the tokens of a TokenStream are intended to be consumed more than once, it is necessary to implement reset(). Note that if your TokenStream caches tokens and feeds them back again after a reset, it is imperative that you clone the tokens when you store them away (on the first pass) as well as when you return them (on future passes after reset()).
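The cloning advice can be sketched with a toy caching stream. The example assumes hypothetical `Token` and `CachingStream` classes written for this illustration (not Lucene's Token or CachingTokenFilter), and it assumes the first pass is consumed completely before reset() is called.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-ins for illustration only; these are NOT Lucene classes.
class CachingResetSketch {

    /** A mutable token that consumers are free to modify. */
    static final class Token {
        String term;

        Token(String term) {
            this.term = term;
        }

        Token copy() {
            return new Token(term);
        }
    }

    /** Caches tokens on the first pass and replays them after reset(). */
    static final class CachingStream {
        private final List<Token> source;                 // upstream tokens, first pass only
        private final List<Token> cache = new ArrayList<>();
        private int pos = 0;
        private boolean replaying = false;

        CachingStream(List<Token> source) {
            this.source = source;
        }

        /** Returns the next token, or null at end of stream. */
        Token next() {
            if (!replaying) {
                if (pos >= source.size()) {
                    return null;
                }
                Token token = source.get(pos++);
                cache.add(token.copy());                  // clone when storing away
                return token;
            }
            if (pos >= cache.size()) {
                return null;
            }
            return cache.get(pos++).copy();               // clone again when returning
        }

        /** Assumes the first pass has been fully consumed. */
        void reset() {
            replaying = true;
            pos = 0;
        }
    }

    /** Mutates tokens on every pass and records what the replay passes observe. */
    static List<String> replayAfterMutation() {
        List<Token> upstream = new ArrayList<>();
        upstream.add(new Token("a"));
        upstream.add(new Token("b"));
        CachingStream stream = new CachingStream(upstream);

        for (Token t = stream.next(); t != null; t = stream.next()) {
            t.term = t.term.toUpperCase();                // first pass: consumer mutates
        }
        List<String> seen = new ArrayList<>();
        stream.reset();
        for (Token t = stream.next(); t != null; t = stream.next()) {
            seen.add(t.term);
            t.term = "x";                                 // second pass: consumer mutates again
        }
        stream.reset();
        for (Token t = stream.next(); t != null; t = stream.next()) {
            seen.add(t.term);                             // third pass still sees the originals
        }
        return seen;
    }
}
```

Without either copy, a consumer's in-place change would leak into the cache and corrupt later passes.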