Class BlockTreeTermsReader
- java.lang.Object
  - org.apache.lucene.index.Fields
    - org.apache.lucene.codecs.FieldsProducer
      - org.apache.lucene.codecs.blocktree.BlockTreeTermsReader
- All Implemented Interfaces: Closeable, AutoCloseable, Iterable<String>, Accountable
public final class BlockTreeTermsReader extends FieldsProducer
A block-based terms index and dictionary that assigns terms to variable-length blocks according to how they share prefixes. The terms index is a prefix trie whose leaves are term blocks. The advantage of this approach is that seekExact can often determine that a term cannot exist without doing any I/O, and intersection with Automata is very fast. Note that this terms dictionary has its own fixed terms index (i.e., it does not support a pluggable terms index implementation).
NOTE: this terms dictionary supports min/maxItemsPerBlock during indexing to control how much memory the terms index uses.
The data structure used by this implementation is very similar to a burst trie (http://citeseer.ist.psu.edu/viewdoc/summary?doi=10.1.1.18.3499), but with added logic to break up too-large blocks of all terms sharing a given prefix into smaller ones.
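To make the prefix-trie idea concrete, here is a minimal, self-contained sketch in plain Java. It is a toy model, not Lucene's implementation or file format: the class PrefixBlockSketch, its fixed prefixLength, and the blockReads counter are all assumptions made for illustration, whereas the real dictionary sizes blocks by item count. It only shows how an in-memory prefix index can reject a term before any block is consulted.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.TreeSet;

/** Toy model only: a hypothetical stand-in, not Lucene's data structure or API. */
final class PrefixBlockSketch {

  /** Trie node; a non-null block holds every term that shares this node's prefix. */
  private static final class Node {
    final Map<Character, Node> children = new HashMap<>();
    TreeSet<String> block;
  }

  private final Node root = new Node();
  private final int prefixLength; // assumption: fixed-length prefixes, unlike the real format
  private int blockReads = 0;     // counts simulated block accesses ("I/O")

  PrefixBlockSketch(List<String> terms, int prefixLength) {
    this.prefixLength = prefixLength;
    for (String term : terms) {
      Node node = root;
      int depth = Math.min(prefixLength, term.length());
      for (int i = 0; i < depth; i++) {
        node = node.children.computeIfAbsent(term.charAt(i), c -> new Node());
      }
      if (node.block == null) {
        node.block = new TreeSet<>();
      }
      node.block.add(term);
    }
  }

  /** Returns true if the term exists; consults a block only when the trie has a matching prefix. */
  boolean seekExact(String term) {
    Node node = root;
    int depth = Math.min(prefixLength, term.length());
    for (int i = 0; i < depth; i++) {
      node = node.children.get(term.charAt(i));
      if (node == null) {
        return false; // the index alone proves the term cannot exist
      }
    }
    if (node.block == null) {
      return false; // prefix exists only as a path to longer prefixes
    }
    blockReads++; // simulated block read
    return node.block.contains(term);
  }

  int blockReads() {
    return blockReads;
  }

  public static void main(String[] args) {
    PrefixBlockSketch dict =
        new PrefixBlockSketch(Arrays.asList("apple", "apply", "apt", "bat", "bath"), 2);
    System.out.println(dict.seekExact("apple")); // prefix "ap" exists, block is read
    System.out.println(dict.seekExact("zebra")); // rejected by the trie, no block read
    System.out.println(dict.blockReads());
  }
}
```

In this sketch, looking up "zebra" fails in the trie alone, so the block-read counter is unchanged; only lookups whose prefix is present in the index pay for a block access.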
Use CheckIndex with the -verbose option to see summary statistics on the blocks in the dictionary. See BlockTreeTermsWriter.
WARNING: This API is experimental and might change in incompatible ways in the next release.
Nested Class Summary
| Modifier and Type | Class | Description |
| --- | --- | --- |
| static class | BlockTreeTermsReader.FSTLoadMode | An enum that controls whether term index FSTs are loaded into memory or read off-heap. |

Field Summary
| Modifier and Type | Field | Description |
| --- | --- | --- |
| static String | FST_MODE_KEY | Attribute key for fst mode. |
| static int | VERSION_AUTO_PREFIX_TERMS_REMOVED | Auto-prefix terms have been superseded by points. |
| static int | VERSION_COMPRESSED_SUFFIXES | Suffixes are compressed to save space. |
| static int | VERSION_CURRENT | Current terms format. |
| static int | VERSION_META_LONGS_REMOVED | The long[] + byte[] metadata has been replaced with a single byte[]. |
| static int | VERSION_START | Initial terms format. |
- Fields inherited from class org.apache.lucene.index.Fields: EMPTY_ARRAY

Constructor Summary
| Constructor | Description |
| --- | --- |
| BlockTreeTermsReader(PostingsReaderBase postingsReader, SegmentReadState state, BlockTreeTermsReader.FSTLoadMode defaultLoadMode) | Sole constructor. |

Method Summary
| Modifier and Type | Method | Description |
| --- | --- | --- |
| void | checkIntegrity() | Checks consistency of this reader. |
| void | close() | |
| Collection<Accountable> | getChildResources() | Returns nested resources of this class. |
| Iterator<String> | iterator() | Returns an iterator that will step through all field names. |
| long | ramBytesUsed() | Return the memory usage of this object in bytes. |
| int | size() | Returns the number of fields or -1 if the number of distinct field names is unknown. |
| Terms | terms(String field) | Get the Terms for this field. |
| String | toString() | |
- Methods inherited from class org.apache.lucene.codecs.FieldsProducer: getMergeInstance
- Methods inherited from class java.lang.Object: clone, equals, finalize, getClass, hashCode, notify, notifyAll, wait, wait, wait
- Methods inherited from interface java.lang.Iterable: forEach, spliterator
Field Detail

FST_MODE_KEY
public static final String FST_MODE_KEY
Attribute key for fst mode.
- See Also: Constant Field Values

VERSION_START
public static final int VERSION_START
Initial terms format.
- See Also: Constant Field Values

VERSION_AUTO_PREFIX_TERMS_REMOVED
public static final int VERSION_AUTO_PREFIX_TERMS_REMOVED
Auto-prefix terms have been superseded by points.
- See Also: Constant Field Values

VERSION_META_LONGS_REMOVED
public static final int VERSION_META_LONGS_REMOVED
The long[] + byte[] metadata has been replaced with a single byte[].
- See Also: Constant Field Values

VERSION_COMPRESSED_SUFFIXES
public static final int VERSION_COMPRESSED_SUFFIXES
Suffixes are compressed to save space.
- See Also: Constant Field Values

VERSION_CURRENT
public static final int VERSION_CURRENT
Current terms format.
- See Also: Constant Field Values

Constructor Detail

BlockTreeTermsReader
public BlockTreeTermsReader(PostingsReaderBase postingsReader, SegmentReadState state, BlockTreeTermsReader.FSTLoadMode defaultLoadMode) throws IOException
Sole constructor.
- Throws: IOException
Method Detail

close
public void close() throws IOException
- Specified by: close in interface AutoCloseable
- Specified by: close in interface Closeable
- Specified by: close in class FieldsProducer
- Throws: IOException

iterator
public Iterator<String> iterator()
Description copied from class: Fields
Returns an iterator that will step through all field names. This will not return null.

terms
public Terms terms(String field) throws IOException
Description copied from class: Fields
Get the Terms for this field. This will return null if the field does not exist.
- Specified by: terms in class Fields
- Throws: IOException

size
public int size()
Description copied from class: Fields
Returns the number of fields or -1 if the number of distinct field names is unknown. If >= 0, Fields.iterator() will return as many field names.
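The size() contract above implies that callers should treat -1 as "unknown" rather than as a count. The following is a minimal sketch of one way to handle that, in plain Java. FieldNames, FieldCountSketch, countFields, and of are hypothetical names introduced here for illustration; they are not Lucene types, and the stand-in interface only mirrors the documented size()/iterator() relationship.

```java
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;

/** Illustrative only: FieldNames is a hypothetical stand-in, not a Lucene type. */
final class FieldCountSketch {

  /** Mirrors the documented contract: size() is the field count, or -1 if unknown. */
  interface FieldNames extends Iterable<String> {
    int size();
  }

  /** Returns the number of fields, iterating only when size() reports -1. */
  static int countFields(FieldNames fields) {
    int size = fields.size();
    if (size >= 0) {
      return size; // iterator() is documented to yield exactly this many names
    }
    int count = 0;
    for (String ignored : fields) {
      count++; // fall back to counting the names one by one
    }
    return count;
  }

  /** Builds a stand-in whose size() returns the given reported value. */
  static FieldNames of(List<String> names, int reportedSize) {
    return new FieldNames() {
      @Override
      public int size() {
        return reportedSize;
      }

      @Override
      public Iterator<String> iterator() {
        return names.iterator();
      }
    };
  }

  public static void main(String[] args) {
    List<String> names = Arrays.asList("body", "title");
    System.out.println(countFields(of(names, names.size()))); // size is known
    System.out.println(countFields(of(names, -1)));           // size is unknown
  }
}
```

Both calls in main report the same count; the second simply pays for a full iteration because the stand-in claims not to know its size.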
ramBytesUsed
public long ramBytesUsed()
Description copied from interface: Accountable
Return the memory usage of this object in bytes. Negative values are illegal.

getChildResources
public Collection<Accountable> getChildResources()
Description copied from interface: Accountable
Returns nested resources of this class. The result should be a point-in-time snapshot (to avoid race conditions).
- See Also: Accountables

checkIntegrity
public void checkIntegrity() throws IOException
Description copied from class: FieldsProducer
Checks consistency of this reader. Note that this may be costly in terms of I/O, e.g. may involve computing a checksum value against large data files.
- Specified by: checkIntegrity in class FieldsProducer
- Throws: IOException