Interface TermToBytesRefAttribute

  • All Superinterfaces:
    Attribute
    All Known Subinterfaces:
    BytesTermAttribute
    All Known Implementing Classes:
    BytesTermAttributeImpl, CharTermAttributeImpl, PackedTokenAttributeImpl

    public interface TermToBytesRefAttribute
    extends Attribute
    TermsHashPerField requests this attribute to index a field's terms. The attribute can be used to customize the final byte[] encoding of each term.

    Consumers of this attribute call getBytesRef() for each term. Example:

       final TermToBytesRefAttribute termAtt = tokenStream.getAttribute(TermToBytesRefAttribute.class);
    
       while (tokenStream.incrementToken()) {
         final BytesRef bytes = termAtt.getBytesRef();
    
         if (isInteresting(bytes)) {
    
           // because the bytes are reused by the attribute (like CharTermAttribute's char[] buffer),
           // you should make a copy if you need persistent access to the bytes, otherwise they will
           // be rewritten across calls to incrementToken()
    
           doSomethingWith(BytesRef.deepCopyOf(bytes));
         }
       }
       ...
     
    NOTE: This API is for internal purposes only and might change in incompatible ways in the next release.
    This is an expert-level, internal API. For UTF-8 terms, use CharTermAttribute and its implementation; to index binary terms, use BytesTermAttribute and its implementation.
    • Method Detail

      • getBytesRef

        BytesRef getBytesRef()
        Retrieve this attribute's BytesRef. The bytes are updated from the current term. The implementation may return a new instance or reuse the previous one.
        Returns:
        a BytesRef to be indexed (valid only until the token stream is next incremented)
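
        The reuse contract above can be illustrated without Lucene on the classpath. The following is a minimal sketch, not Lucene code: Ref and ReusingTermSource are hypothetical stand-ins for BytesRef and for an attribute implementation that keeps the previous instance, and the 16-byte buffer is an arbitrary assumption. It shows why a consumer that retains references past incrementToken() should copy them first.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class ReusedBytesSketch {

    /** Hypothetical, simplified stand-in for BytesRef: a byte[] plus a valid length. */
    static final class Ref {
        byte[] bytes;
        int length;

        Ref(byte[] bytes, int length) {
            this.bytes = bytes;
            this.length = length;
        }

        /** Plays the role of BytesRef.deepCopyOf: copies the valid bytes into a fresh array. */
        static Ref deepCopyOf(Ref other) {
            return new Ref(Arrays.copyOf(other.bytes, other.length), other.length);
        }

        String utf8() {
            return new String(bytes, 0, length, StandardCharsets.UTF_8);
        }
    }

    /** Hypothetical term source that hands out the same Ref instance for every term. */
    static final class ReusingTermSource {
        private final String[] terms;
        // Shared buffer; this sketch assumes every term encodes to at most 16 bytes.
        private final Ref shared = new Ref(new byte[16], 0);
        private int next = 0;

        ReusingTermSource(String... terms) {
            this.terms = terms;
        }

        /** Advances to the next term, overwriting the shared buffer in place. */
        boolean incrementToken() {
            if (next >= terms.length) {
                return false;
            }
            byte[] encoded = terms[next++].getBytes(StandardCharsets.UTF_8);
            System.arraycopy(encoded, 0, shared.bytes, 0, encoded.length);
            shared.length = encoded.length;
            return true;
        }

        Ref getBytesRef() {
            return shared;
        }
    }

    /** Retains one Ref per term, either copied or as handed out, then decodes them all. */
    static List<String> collect(boolean copy, String... terms) {
        ReusingTermSource source = new ReusingTermSource(terms);
        List<Ref> kept = new ArrayList<>();
        while (source.incrementToken()) {
            Ref ref = source.getBytesRef();
            kept.add(copy ? Ref.deepCopyOf(ref) : ref);
        }
        List<String> decoded = new ArrayList<>();
        for (Ref ref : kept) {
            decoded.add(ref.utf8());
        }
        return decoded;
    }

    public static void main(String[] args) {
        System.out.println(collect(false, "apple", "kiwi")); // prints [kiwi, kiwi]
        System.out.println(collect(true, "apple", "kiwi"));  // prints [apple, kiwi]
    }
}
```

        Without the copy, every retained reference points at the same object, so all of them show the last term. Since an implementation may also return a new instance on each call, consumers should not rely on either behavior and should copy whenever the bytes must outlive the current token.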