org.apache.lucene.analysis.tokenattributes
Interface TermToBytesRefAttribute

All Superinterfaces:
Attribute
All Known Implementing Classes:
CharTermAttributeImpl, NumericTokenStream.NumericTermAttributeImpl, Token

public interface TermToBytesRefAttribute
extends Attribute

TermsHashPerField requests this attribute to index the contents. Implementations can use it to customize the final byte[] encoding of terms.

Consumers of this attribute call getBytesRef() up-front, and then invoke fillBytesRef() for each term. Example:

   final TermToBytesRefAttribute termAtt = tokenStream.getAttribute(TermToBytesRefAttribute.class);
   final BytesRef bytes = termAtt.getBytesRef();

   while (tokenStream.incrementToken()) {

     // you must call termAtt.fillBytesRef() before doing something with the bytes.
     // this encodes the term value (internally it might be a char[], etc) into the bytes.
     int hashCode = termAtt.fillBytesRef();

     if (isInteresting(bytes)) {
     
       // because the bytes are reused by the attribute (like CharTermAttribute's char[] buffer),
       // you should make a copy if you need persistent access to the bytes, otherwise they will
       // be rewritten across calls to incrementToken()

       doSomethingWith(new BytesRef(bytes));
     }
   }
   ...
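
The copy in the example above matters because the attribute rewrites the same bytes for every term. The following is a minimal, self-contained sketch of that aliasing effect. It does not use Lucene: ReusingSource, buffer(), advance() and collect() are hypothetical stand-ins invented for illustration, and a plain byte[] plays the role of the BytesRef.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class ReusedBufferSketch {

    /** Hypothetical stand-in for an attribute that rewrites one shared buffer per term. */
    static final class ReusingSource {
        private final byte[] shared = new byte[1]; // reused for every term
        private final byte[] terms;
        private int next = 0;

        ReusingSource(byte... terms) {
            this.terms = terms;
        }

        /** Returns the shared buffer; callers fetch it once, up-front. */
        byte[] buffer() {
            return shared;
        }

        /** Advances to the next term and overwrites the shared buffer. */
        boolean advance() {
            if (next >= terms.length) {
                return false;
            }
            shared[0] = terms[next++];
            return true;
        }
    }

    /** Collects every term, either keeping the shared reference or copying it. */
    static List<byte[]> collect(ReusingSource src, boolean copy) {
        List<byte[]> out = new ArrayList<>();
        byte[] buf = src.buffer();
        while (src.advance()) {
            out.add(copy ? Arrays.copyOf(buf, buf.length) : buf);
        }
        return out;
    }

    public static void main(String[] args) {
        List<byte[]> aliased = collect(new ReusingSource((byte) 1, (byte) 2, (byte) 3), false);
        List<byte[]> copied = collect(new ReusingSource((byte) 1, (byte) 2, (byte) 3), true);
        System.out.println(aliased.get(0)[0]); // prints 3: every entry aliases the same buffer
        System.out.println(copied.get(0)[0]);  // prints 1: the copy preserved the first term
    }
}
```

Without the copy, all collected entries point at the one shared buffer and therefore show only the last term.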
 

WARNING: This API is experimental and might change in incompatible ways in the next release.
This is a very expert API. For UTF-8 terms, use CharTermAttributeImpl and its implementation of this attribute.

Method Summary
 int fillBytesRef()
          Updates the bytes returned by getBytesRef() to contain this term's final encoding, and returns its hash code.
 BytesRef getBytesRef()
          Retrieve this attribute's BytesRef.
 

Method Detail

fillBytesRef

int fillBytesRef()
Updates the bytes returned by getBytesRef() to contain this term's final encoding, and returns its hash code.

Returns:
the hashcode as defined by BytesRef.hashCode():
  int hash = 0;
  for (int i = termBytes.offset; i < termBytes.offset+termBytes.length; i++) {
    hash = 31*hash + termBytes.bytes[i];
  }
 
For performance, compute the hash inside this method if your code can calculate it on the fly while encoding the term. Otherwise, just return termBytes.hashCode().
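
As an illustration of calculating the hash on the fly, the sketch below encodes a value and accumulates the hash in the same loop, then checks the result against the reference loop shown above. It is self-contained and does not use Lucene: Slice is a hypothetical stand-in for BytesRef, and the 8-byte big-endian encoding of a long is an arbitrary choice made for this example, not an encoding Lucene prescribes.

```java
public class OnTheFlyHashSketch {

    /** Hypothetical stand-in for BytesRef: a reusable slice of a byte array. */
    static final class Slice {
        byte[] bytes = new byte[16];
        int offset = 0;
        int length = 0;
    }

    /** Reference hash: the loop from the documentation above, as a second pass. */
    static int referenceHash(Slice termBytes) {
        int hash = 0;
        for (int i = termBytes.offset; i < termBytes.offset + termBytes.length; i++) {
            hash = 31 * hash + termBytes.bytes[i];
        }
        return hash;
    }

    /**
     * Writes value as 8 big-endian bytes into target and accumulates the
     * hash in the same pass, so no second loop over the bytes is needed.
     */
    static int fillAndHash(long value, Slice target) {
        target.offset = 0;
        target.length = 8;
        int hash = 0;
        for (int i = 0; i < 8; i++) {
            byte b = (byte) (value >>> (56 - 8 * i));
            target.bytes[i] = b;
            hash = 31 * hash + b;
        }
        return hash;
    }

    public static void main(String[] args) {
        Slice slice = new Slice();
        int onTheFly = fillAndHash(123456789L, slice);
        System.out.println(onTheFly == referenceHash(slice)); // prints true
    }
}
```

Both methods apply the same recurrence to the same bytes in the same order, so the results always agree. The single-pass version simply avoids reading the bytes a second time.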

getBytesRef

BytesRef getBytesRef()
Retrieve this attribute's BytesRef. The bytes are updated from the current term when the consumer calls fillBytesRef().

Returns:
this Attribute's internal BytesRef.


Copyright © 2000-2013 Apache Software Foundation. All Rights Reserved.