org.apache.lucene.codecs
Class BlockTreeTermsReader

java.lang.Object
  extended by org.apache.lucene.index.Fields
      extended by org.apache.lucene.codecs.FieldsProducer
          extended by org.apache.lucene.codecs.BlockTreeTermsReader
All Implemented Interfaces:
Closeable, Iterable<String>

public class BlockTreeTermsReader
extends FieldsProducer

A block-based terms index and dictionary that assigns terms to variable-length blocks according to how they share prefixes. The terms index is a prefix trie whose leaves are term blocks. The advantage of this approach is that seekExact can often determine that a term cannot exist without doing any IO, and intersection with automata is very fast. Note that this terms dictionary has its own fixed terms index (i.e., it does not support a pluggable terms index implementation).
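The following toy model illustrates how the seekExact claim can hold. It is hypothetical and is not Lucene's code. An in-memory map from block prefixes to blocks stands in for the terms index, and reading a block stands in for IO. In this toy, a term's home block is the registered block with the longest matching prefix. If that block holds only sub-blocks and no terms of its own, the lookup answers "absent" without reading anything. The names ToyBlockTree, addBlock and blockReads are invented for illustration. The real index is an FST and has additional floor-block logic that is not modeled here.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/**
 * Hypothetical toy model of a block-tree lookup (not Lucene's implementation).
 * The in-memory "index" maps each block's shared prefix to its block; reading
 * a block simulates disk IO and is counted in blockReads.
 */
final class ToyBlockTree {
    private final Map<String, List<String>> blocksByPrefix = new HashMap<>();
    int blockReads = 0; // number of simulated block reads (IO)

    /** Registers a block; an empty term list models a block holding only sub-blocks. */
    void addBlock(String prefix, List<String> terms) {
        List<String> sorted = new ArrayList<>(terms);
        for (String t : sorted) {
            if (!t.startsWith(prefix)) {
                throw new IllegalArgumentException(t + " does not start with " + prefix);
            }
        }
        Collections.sort(sorted); // binarySearch below needs sorted terms
        blocksByPrefix.put(prefix, sorted);
    }

    boolean seekExact(String term) {
        // Walk the in-memory index: the longest registered prefix of term is its home block.
        List<String> block = null;
        for (int len = term.length(); len >= 0 && block == null; len--) {
            block = blocksByPrefix.get(term.substring(0, len));
        }
        if (block == null || block.isEmpty()) {
            return false; // decided from the index alone: no block read
        }
        blockReads++; // only now do we "touch disk"
        return Collections.binarySearch(block, term) >= 0;
    }

    public static void main(String[] args) {
        ToyBlockTree dict = new ToyBlockTree();
        dict.addBlock("", List.of()); // root: only sub-blocks, no terms of its own
        dict.addBlock("app", List.of("apple", "apply"));
        dict.addBlock("ban", List.of("banana", "band"));
        System.out.println(dict.seekExact("cherry") + " reads=" + dict.blockReads); // prints "false reads=0"
        System.out.println(dict.seekExact("apple") + " reads=" + dict.blockReads);  // prints "true reads=1"
    }
}
```

A miss such as "appz" still costs one block read in this model. Once a block that has terms shares the target's prefix, the index alone cannot rule the target out.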

NOTE: this terms dictionary does not support an index divisor when opening an IndexReader. Instead, you can change minItemsPerBlock and maxItemsPerBlock during indexing.

The data structure used by this implementation is very similar to a burst trie (http://citeseer.ist.psu.edu/viewdoc/summary?doi=10.1.1.18.3499), but with added logic to break up too-large blocks of all terms sharing a given prefix into smaller ones.
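The sketch below shows the general idea of breaking up large prefix groups. It uses plain burst-trie-style recursion: any prefix whose terms exceed a hypothetical maxItemsPerBlock is "burst" into one child block per next character. This is not BlockTreeTermsWriter's algorithm. The real writer also enforces a minimum block size and splits oversized blocks into contiguous "floor" blocks. BurstSketch and buildBlocks are invented names.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

/**
 * Hypothetical burst-trie-style sketch (not BlockTreeTermsWriter's algorithm):
 * a prefix whose terms exceed maxItemsPerBlock is burst into one child per next
 * character, recursively, until every block is small enough.
 * Assumes the input terms are distinct and sorted.
 */
final class BurstSketch {
    static Map<String, List<String>> buildBlocks(List<String> sortedTerms, int maxItemsPerBlock) {
        if (maxItemsPerBlock < 1) throw new IllegalArgumentException("maxItemsPerBlock must be >= 1");
        Map<String, List<String>> blocks = new TreeMap<>(); // block prefix -> its terms
        burst("", sortedTerms, maxItemsPerBlock, blocks);
        return blocks;
    }

    private static void burst(String prefix, List<String> terms, int max, Map<String, List<String>> out) {
        if (terms.size() <= max) {
            out.put(prefix, terms); // small enough: one block for this prefix
            return;
        }
        List<String> exact = new ArrayList<>();                // terms equal to the prefix itself
        Map<Character, List<String>> byNext = new TreeMap<>(); // children keyed by next char
        for (String t : terms) {
            if (t.length() == prefix.length()) {
                exact.add(t);
            } else {
                byNext.computeIfAbsent(t.charAt(prefix.length()), c -> new ArrayList<>()).add(t);
            }
        }
        if (!exact.isEmpty()) out.put(prefix, exact);
        for (Map.Entry<Character, List<String>> child : byNext.entrySet()) {
            burst(prefix + child.getKey(), child.getValue(), max, out); // recurse one char deeper
        }
    }

    public static void main(String[] args) {
        List<String> terms = List.of("apple", "apply", "apt", "banana", "band", "bank", "cat");
        System.out.println(buildBlocks(terms, 2));
        // prints {app=[apple, apply], apt=[apt], bana=[banana], band=[band], bank=[bank], c=[cat]}
    }
}
```

The output contains several one-term blocks, such as the three under "ban". Naive bursting tends to produce tiny blocks like these, which is one reason a lower bound such as minItemsPerBlock is useful.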

Use CheckIndex with the -verbose option to see summary statistics on the blocks in the dictionary. See BlockTreeTermsWriter.

WARNING: This API is experimental and might change in incompatible ways in the next release.

Nested Class Summary
 class BlockTreeTermsReader.FieldReader
          BlockTree's implementation of Terms.
static class BlockTreeTermsReader.Stats
          BlockTree statistics for a single field returned by BlockTreeTermsReader.FieldReader.computeStats().
 
Field Summary
 
Fields inherited from class org.apache.lucene.index.Fields
EMPTY_ARRAY
 
Constructor Summary
BlockTreeTermsReader(Directory dir, FieldInfos fieldInfos, SegmentInfo info, PostingsReaderBase postingsReader, IOContext ioContext, String segmentSuffix, int indexDivisor)
          Sole constructor.
 
Method Summary
 void close()
           
 Iterator<String> iterator()
          Returns an iterator that will step through all field names.
 long ramBytesUsed()
          Returns the approximate RAM usage, in bytes.
protected  int readHeader(IndexInput input)
          Reads terms file header.
protected  int readIndexHeader(IndexInput input)
          Reads index file header.
protected  void seekDir(IndexInput input, long dirOffset)
          Seek input to the directory offset.
 int size()
          Returns the number of fields or -1 if the number of distinct field names is unknown.
 Terms terms(String field)
          Get the Terms for this field.
 
Methods inherited from class org.apache.lucene.index.Fields
getUniqueTermCount
 
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
 

Constructor Detail

BlockTreeTermsReader

public BlockTreeTermsReader(Directory dir,
                            FieldInfos fieldInfos,
                            SegmentInfo info,
                            PostingsReaderBase postingsReader,
                            IOContext ioContext,
                            String segmentSuffix,
                            int indexDivisor)
                     throws IOException
Sole constructor.

Throws:
IOException
Method Detail

readHeader

protected int readHeader(IndexInput input)
                  throws IOException
Reads terms file header.

Throws:
IOException

readIndexHeader

protected int readIndexHeader(IndexInput input)
                       throws IOException
Reads index file header.

Throws:
IOException
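This page does not say what the int returned by readHeader and readIndexHeader means. In the sketch below it is assumed to be the file's format version, validated against a supported range. The header layout (magic number, codec name, version) is a simplified, hypothetical one. Lucene's real header handling lives in CodecUtil and uses its own encoding. HeaderSketch and its methods are invented names.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInput;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;

/**
 * Hypothetical, simplified file header: magic number, codec name, version.
 * Not Lucene's on-disk format.
 */
final class HeaderSketch {
    static final int MAGIC = 0x12345678; // arbitrary value for this sketch

    static byte[] writeHeader(String codec, int version) throws IOException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (DataOutputStream out = new DataOutputStream(bytes)) {
            out.writeInt(MAGIC);
            out.writeUTF(codec);
            out.writeInt(version);
        }
        return bytes.toByteArray();
    }

    /** Validates the header and returns its version (assumed meaning of the int result). */
    static int readHeader(DataInput in, String expectedCodec, int minVersion, int maxVersion)
            throws IOException {
        int magic = in.readInt();
        if (magic != MAGIC) {
            throw new IOException("not a recognized file: magic=" + Integer.toHexString(magic));
        }
        String codec = in.readUTF();
        if (!codec.equals(expectedCodec)) {
            throw new IOException("codec mismatch: expected " + expectedCodec + ", got " + codec);
        }
        int version = in.readInt();
        if (version < minVersion || version > maxVersion) {
            throw new IOException("unsupported version " + version);
        }
        return version;
    }

    public static void main(String[] args) throws IOException {
        byte[] header = writeHeader("ToyTermsDict", 2);
        DataInput in = new DataInputStream(new ByteArrayInputStream(header));
        System.out.println(readHeader(in, "ToyTermsDict", 1, 2)); // prints 2
    }
}
```

Rejecting a wrong codec name or an out-of-range version early, with an IOException, matches the `throws IOException` clause that both header methods declare.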

seekDir

protected void seekDir(IndexInput input,
                       long dirOffset)
                throws IOException
Seek input to the directory offset.

Throws:
IOException
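A common way to make a directory offset findable in an append-only file is to write the directory last and store its start offset in a fixed-size trailer. The sketch below assumes such a layout: [payload][directory][8-byte offset of directory]. It is a hypothetical model over a ByteBuffer. It ignores the dirOffset parameter the real method takes, as well as any checksum or footer the real format may add. SeekDirSketch and buildFile are invented names.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;

/**
 * Hypothetical file layout: [payload][directory][8-byte offset of directory].
 * Because the offset is written last, the writer can produce the file in one
 * append-only pass and still let readers find the directory.
 */
final class SeekDirSketch {
    static ByteBuffer buildFile(byte[] payload, byte[] directory) {
        ByteBuffer buf = ByteBuffer.allocate(payload.length + directory.length + Long.BYTES);
        buf.put(payload);
        long dirOffset = buf.position(); // directory starts right after the payload
        buf.put(directory);
        buf.putLong(dirOffset);          // trailer records where the directory starts
        buf.flip();                      // switch from writing to reading
        return buf;
    }

    /** Positions input at the start of the directory, using the trailing offset. */
    static void seekDir(ByteBuffer input) {
        long dirOffset = input.getLong(input.limit() - Long.BYTES); // absolute read of the trailer
        input.position((int) dirOffset);
    }

    public static void main(String[] args) {
        ByteBuffer file = buildFile("blocks".getBytes(StandardCharsets.UTF_8),
                                    "dir".getBytes(StandardCharsets.UTF_8));
        seekDir(file);
        byte[] dir = new byte[3];
        file.get(dir);
        System.out.println(new String(dir, StandardCharsets.UTF_8)); // prints "dir"
    }
}
```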

close

public void close()
           throws IOException
Specified by:
close in interface Closeable
Specified by:
close in class FieldsProducer
Throws:
IOException

iterator

public Iterator<String> iterator()
Description copied from class: Fields
Returns an iterator that will step through all field names. This will not return null.

Specified by:
iterator in interface Iterable<String>
Specified by:
iterator in class Fields

terms

public Terms terms(String field)
            throws IOException
Description copied from class: Fields
Get the Terms for this field. This will return null if the field does not exist.

Specified by:
terms in class Fields
Throws:
IOException

size

public int size()
Description copied from class: Fields
Returns the number of fields, or -1 if the number of distinct field names is unknown. If the result is >= 0, Fields.iterator() returns exactly that many field names.

Specified by:
size in class Fields
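The contract described by iterator(), terms(String) and size() can be summarized as follows. The iterator is never null. terms returns null for a field that does not exist. A non-negative size matches the number of names the iterator yields. The class below is a hypothetical in-memory stand-in that satisfies this contract. It is not a Lucene class, and it models a field's terms as a sorted set of strings instead of a Terms instance.

```java
import java.util.Collections;
import java.util.Iterator;
import java.util.SortedSet;
import java.util.TreeMap;
import java.util.TreeSet;

/**
 * Hypothetical in-memory stand-in for the documented Fields contract (not a
 * Lucene class): iterator() never returns null, terms(field) returns null for
 * an unknown field, and size() equals the number of names iterator() yields.
 * A field's terms are modeled as a sorted set of strings instead of Terms.
 */
final class ToyFields implements Iterable<String> {
    private final TreeMap<String, TreeSet<String>> fields = new TreeMap<>();

    void add(String field, String term) {
        fields.computeIfAbsent(field, f -> new TreeSet<>()).add(term);
    }

    @Override
    public Iterator<String> iterator() {
        // Read-only view over the field names, in sorted order; never null.
        return Collections.unmodifiableSet(fields.keySet()).iterator();
    }

    SortedSet<String> terms(String field) {
        TreeSet<String> t = fields.get(field);
        return t == null ? null : Collections.unmodifiableSortedSet(t); // null: no such field
    }

    int size() {
        return fields.size(); // always known here, so never -1
    }

    public static void main(String[] args) {
        ToyFields f = new ToyFields();
        f.add("title", "lucene");
        f.add("body", "block");
        f.add("body", "tree");
        for (String field : f) {
            System.out.println(field + " -> " + f.terms(field)); // body -> [block, tree], then title -> [lucene]
        }
        System.out.println(f.terms("missing")); // prints null
    }
}
```

Callers of the real API follow the same pattern: iterate the field names and null-check the result of terms(field) before using it.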

ramBytesUsed

public long ramBytesUsed()
Description copied from class: FieldsProducer
Returns the approximate RAM usage, in bytes.

Specified by:
ramBytesUsed in class FieldsProducer


Copyright © 2000-2014 Apache Software Foundation. All Rights Reserved.