org.apache.lucene.codecs.bloom
Class BloomFilteringPostingsFormat
java.lang.Object
org.apache.lucene.codecs.PostingsFormat
org.apache.lucene.codecs.bloom.BloomFilteringPostingsFormat
- All Implemented Interfaces:
- NamedSPILoader.NamedSPI
public final class BloomFilteringPostingsFormat
- extends PostingsFormat
A PostingsFormat useful for low doc-frequency fields such as primary
keys. Bloom filters are maintained in a ".blm" file, which offers "fast-fail"
for reads in segments known to have no record of the key. A delegate
PostingsFormat of your choice records all other postings data.
A BloomFilterFactory can be passed to tailor Bloom filter settings on a
per-field basis. The default configuration is DefaultBloomFilterFactory,
which allocates a ~8 MB bitset and hashes values using MurmurHash2.
This should be suitable for most purposes.
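The "fast-fail" idea can be illustrated with a minimal, self-contained sketch. It does not use Lucene classes: `BloomGuardedSegment`, its `HashMap` stand-in for the delegate postings data, and the single FNV-1a hash are all assumptions made for illustration (the real format uses FuzzySet and MurmurHash2). The point is only that a clear bit proves a key is absent, so the delegate is never consulted for it.

```java
import java.nio.charset.StandardCharsets;
import java.util.BitSet;
import java.util.HashMap;
import java.util.Map;

/** Illustrative only: a toy "segment" guarded by a one-hash Bloom-style bitset. */
final class BloomGuardedSegment {
    private final int numBits;
    private final BitSet bits;
    // Hypothetical stand-in for the delegate postings data (key -> document id).
    private final Map<String, Integer> delegate = new HashMap<>();
    // Counts how often the (expensive) delegate was actually consulted.
    int delegateLookups = 0;

    BloomGuardedSegment(int numBits) {
        this.numBits = numBits;
        this.bits = new BitSet(numBits);
    }

    // FNV-1a over the UTF-8 bytes, reduced to a bit position.
    // This is a stand-in hash chosen for brevity, not MurmurHash2.
    private int slot(String key) {
        int h = 0x811C9DC5;
        for (byte b : key.getBytes(StandardCharsets.UTF_8)) {
            h ^= (b & 0xFF);
            h *= 0x01000193;
        }
        return Math.floorMod(h, numBits);
    }

    void put(String key, int docId) {
        bits.set(slot(key));
        delegate.put(key, docId);
    }

    /** Returns the document id, or null if the key is not in this segment. */
    Integer lookup(String key) {
        if (!bits.get(slot(key))) {
            return null; // fast-fail: the key was definitely never added
        }
        delegateLookups++; // bit is set: the key may be present, so ask the delegate
        return delegate.get(key);
    }

    public static void main(String[] args) {
        BloomGuardedSegment segment = new BloomGuardedSegment(1 << 16);
        segment.put("pk-1", 7);
        System.out.println("pk-1 -> " + segment.lookup("pk-1"));
        System.out.println("delegate lookups: " + segment.delegateLookups);
    }
}
```

A set bit only means "maybe present", so false positives still reach the delegate; false negatives cannot occur, which is why the approach suits primary-key style lookups that miss in most segments.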
The format of the .blm file is as follows:
- BloomFilter (.blm) --> Header, DelegatePostingsFormatName,
NumFilteredFields, Filter^NumFilteredFields
- Filter --> FieldNumber, FuzzySet
- FuzzySet --> See FuzzySet.serialize(DataOutput)
- Header --> CodecHeader
- DelegatePostingsFormatName --> String
The name of a ServiceProvider registered PostingsFormat
- NumFilteredFields --> Uint32
- FieldNumber --> Uint32
The number of the field in this segment
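The ordering of the records in the layout above can be sketched with plain `java.io` streams. This is a simplified illustration, not Lucene's encoding: `BlmLayoutSketch` is a hypothetical class, the header is reduced to a name plus a version, strings use `writeUTF`, and each filter body is an opaque length-prefixed byte array standing in for the output of FuzzySet.serialize(DataOutput). The length prefix in particular is an assumption of this sketch.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.util.LinkedHashMap;
import java.util.Map;

/** Illustrative only: mirrors the record order of the .blm grammar, not its byte encoding. */
final class BlmLayoutSketch {

    /** Parsed view of the sketch file. */
    static final class Parsed {
        String codecName;
        int version;
        String delegateName;
        final Map<Integer, byte[]> filters = new LinkedHashMap<>();
    }

    static byte[] write(String codecName, int version, String delegateName,
                        Map<Integer, byte[]> filters) {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (DataOutputStream out = new DataOutputStream(bytes)) {
            out.writeUTF(codecName);          // Header (simplified stand-in)
            out.writeInt(version);
            out.writeUTF(delegateName);       // DelegatePostingsFormatName
            out.writeInt(filters.size());     // NumFilteredFields
            for (Map.Entry<Integer, byte[]> filter : filters.entrySet()) {
                out.writeInt(filter.getKey());           // FieldNumber
                out.writeInt(filter.getValue().length);  // length prefix (sketch-only assumption)
                out.write(filter.getValue());            // opaque filter body
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return bytes.toByteArray();
    }

    static Parsed read(byte[] data) {
        Parsed parsed = new Parsed();
        try (DataInputStream in = new DataInputStream(new ByteArrayInputStream(data))) {
            parsed.codecName = in.readUTF();
            parsed.version = in.readInt();
            parsed.delegateName = in.readUTF();
            int numFilteredFields = in.readInt();
            // One Filter record per filtered field: Filter^NumFilteredFields.
            for (int i = 0; i < numFilteredFields; i++) {
                int fieldNumber = in.readInt();
                byte[] body = new byte[in.readInt()];
                in.readFully(body);
                parsed.filters.put(fieldNumber, body);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return parsed;
    }

    public static void main(String[] args) {
        Map<Integer, byte[]> filters = new LinkedHashMap<>();
        filters.put(3, new byte[] {1, 2, 3});
        Parsed parsed = read(write("DemoBloom", 1, "DemoDelegate", filters));
        System.out.println(parsed.delegateName + ", fields=" + parsed.filters.keySet());
    }
}
```

Because the delegate's name is stored ahead of the filters, a reader can resolve the delegate PostingsFormat by name before loading any per-field filter.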
- WARNING: This API is experimental and might change in incompatible ways in the next release.
BLOOM_CODEC_NAME
public static final String BLOOM_CODEC_NAME
- See Also:
- Constant Field Values
BLOOM_CODEC_VERSION
public static final int BLOOM_CODEC_VERSION
- See Also:
- Constant Field Values
BloomFilteringPostingsFormat
public BloomFilteringPostingsFormat(PostingsFormat delegatePostingsFormat,
BloomFilterFactory bloomFilterFactory)
- Creates Bloom filters for a selection of fields created in the index. The
filters are recorded as a set of bitsets held as a segment summary in an
additional ".blm" file. This PostingsFormat delegates to the supplied
PostingsFormat for encoding all other postings data.
- Parameters:
delegatePostingsFormat - The PostingsFormat that records all the non-Bloom-filter data, i.e. postings info.
bloomFilterFactory - The BloomFilterFactory responsible for sizing BloomFilters appropriately
BloomFilteringPostingsFormat
public BloomFilteringPostingsFormat(PostingsFormat delegatePostingsFormat)
- Creates Bloom filters for a selection of fields created in the index. The
filters are recorded as a set of bitsets held as a segment summary in an
additional ".blm" file. This PostingsFormat delegates to the supplied
PostingsFormat for encoding all other postings data. This constructor
defaults to the DefaultBloomFilterFactory for configuring per-field
BloomFilters.
- Parameters:
delegatePostingsFormat - The PostingsFormat that records all the non-Bloom-filter data, i.e. postings info.
BloomFilteringPostingsFormat
public BloomFilteringPostingsFormat()
fieldsConsumer
public FieldsConsumer fieldsConsumer(SegmentWriteState state)
throws IOException
- Specified by:
fieldsConsumer
in class PostingsFormat
- Throws:
IOException
fieldsProducer
public FieldsProducer fieldsProducer(SegmentReadState state)
throws IOException
- Specified by:
fieldsProducer
in class PostingsFormat
- Throws:
IOException
Copyright © 2000-2013 Apache Software Foundation. All Rights Reserved.