Class TokenStreamFromTermVector

java.lang.Object
org.apache.lucene.util.AttributeSource
org.apache.lucene.analysis.TokenStream
org.apache.lucene.search.highlight.TokenStreamFromTermVector
All Implemented Interfaces:
Closeable, AutoCloseable

public final class TokenStreamFromTermVector extends TokenStream
TokenStream created from a term vector field. The term vector requires positions and/or offsets (either). If you want payloads add PayloadAttributeImpl (as you would normally) but don't assume the attribute is already added just because you know the term vector has payloads, since the first call to incrementToken() will observe if you asked for them and if not then won't get them. This TokenStream supports an efficient reset(), so there's no need to wrap with a caching impl.

The implementation creates an array of tokens indexed by token position. As long as there are no massive jumps in position, this is fine. It also assumes there are not large numbers of tokens at the same position, since tokens sharing a position are appended to a per-position linked list, which is O(N^2) in the worst case. When the term vector has no positions, the startOffset divided by 8 is used as a temporary substitute position. In that case, tokens with the same startOffset occupy the same final position; otherwise tokens become adjacent.

NOTE: This API is for internal purposes only and might change in incompatible ways in the next release.
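The substitute-position rule described above can be sketched with plain Java, independent of Lucene. This is a minimal illustration (the class name, method name, and sample token data below are all hypothetical, not part of the Lucene API): tokens whose start offsets fall within the same 8-character window collapse to one position, while tokens further apart become adjacent.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.TreeMap;

public class SubstitutePositions {
    // Sketch of the substitute-position rule: when a term vector lacks
    // positions, startOffset / 8 stands in for the token position.
    // Tokens with the same startOffset land in the same bucket.
    static TreeMap<Integer, List<String>> bucketByOffset(List<String> terms,
                                                         List<Integer> startOffsets) {
        TreeMap<Integer, List<String>> positions = new TreeMap<>();
        for (int i = 0; i < terms.size(); i++) {
            int substitutePos = startOffsets.get(i) / 8; // the divide-by-8 substitute
            positions.computeIfAbsent(substitutePos, k -> new ArrayList<>())
                     .add(terms.get(i));
        }
        return positions;
    }

    public static void main(String[] args) {
        // "quick" and "fast" share startOffset 4, so they share a position;
        // "brown" at offset 10 falls in the next 8-wide window.
        List<String> terms = List.of("quick", "fast", "brown");
        List<Integer> offsets = List.of(4, 4, 10);
        System.out.println(bucketByOffset(terms, offsets));
        // → {0=[quick, fast], 1=[brown]}
    }
}
```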
  • Constructor Details

    • TokenStreamFromTermVector

      public TokenStreamFromTermVector(Terms vector, int maxStartOffset) throws IOException
Constructor. The uninversion does not happen here; it is deferred until the first call to incrementToken().
      Parameters:
      vector - Terms that contains the data for creating the TokenStream. Must have positions and/or offsets.
maxStartOffset - if a token's start offset exceeds this, the token is not added; -1 disables the limit.
      Throws:
      IOException
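      A typical use is consuming a stored term vector as a TokenStream, e.g. for highlighting. The fragment below is a hedged sketch, not a complete program: `reader` (an open IndexReader), `docId`, and the field name "body" are assumptions for illustration. Passing -1 as maxStartOffset disables the offset limit, per the parameter description above.

      ```java
      // Sketch: uninvert one document's term vector into a TokenStream.
      // `reader`, `docId`, and the field name "body" are hypothetical.
      Terms vector = reader.getTermVector(docId, "body");
      if (vector != null) {
        TokenStream ts = new TokenStreamFromTermVector(vector, -1); // -1: no offset limit
        CharTermAttribute termAtt = ts.addAttribute(CharTermAttribute.class);
        OffsetAttribute offsetAtt = ts.addAttribute(OffsetAttribute.class);
        ts.reset(); // cheap: supports efficient reset(), no caching wrapper needed
        while (ts.incrementToken()) { // uninversion happens on the first call
          System.out.println(termAtt + " [" + offsetAtt.startOffset()
              + "," + offsetAtt.endOffset() + "]");
        }
        ts.end();
        ts.close();
      }
      ```

      Note that any PayloadAttribute must be added before the first incrementToken() call, as described in the class overview.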
  • Method Details