Package org.apache.lucene.codecs.bloom
Class BloomFilteringPostingsFormat
java.lang.Object
org.apache.lucene.codecs.PostingsFormat
org.apache.lucene.codecs.bloom.BloomFilteringPostingsFormat
- All Implemented Interfaces:
NamedSPILoader.NamedSPI
A PostingsFormat useful for low doc-frequency fields such as primary keys. Bloom filters are maintained in a ".blm" file, which offers "fast-fail" for reads in segments known to have no record of the key: such a lookup can return without consulting the delegate's postings. A choice of delegate PostingsFormat is used to record all other postings data.
A choice of BloomFilterFactory can be passed to tailor Bloom filter settings on a per-field basis. The default configuration is DefaultBloomFilterFactory, which allocates a ~8mb bitset and hashes values using MurmurHash2. This should be suitable for most purposes.
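The "fast-fail" behaviour comes from a basic property of Bloom filters. If any probed bit is clear, the key was never added, so the segment can be skipped. If all probed bits are set, the key is only possibly present and the postings must still be checked.

The sketch below illustrates that property. It is not Lucene's FuzzySet. The class name `ToyBloomFilter` is hypothetical, and FNV-1a with a Murmur3-style finalizer stands in for MurmurHash2 only because it fits in a few lines.

```java
import java.nio.charset.StandardCharsets;
import java.util.BitSet;

// Illustrative only: NOT Lucene's FuzzySet. FNV-1a plus a Murmur3-style
// finalizer is an assumed stand-in for MurmurHash2.
class ToyBloomFilter {
    private final BitSet bits;
    private final int numBits;
    private final int numHashes;

    ToyBloomFilter(int numBits, int numHashes) {
        this.bits = new BitSet(numBits);
        this.numBits = numBits;
        this.numHashes = numHashes;
    }

    // 32-bit FNV-1a over the key's bytes; the seed perturbs the offset basis.
    private static int fnv1a(byte[] data, int seed) {
        int h = 0x811c9dc5 ^ seed;
        for (byte b : data) {
            h ^= (b & 0xff);
            h *= 0x01000193;
        }
        return mix(h);
    }

    // Murmur3-style finalizer so that the low bits used for indexing are well mixed.
    private static int mix(int h) {
        h ^= h >>> 16;
        h *= 0x85ebca6b;
        h ^= h >>> 13;
        h *= 0xc2b2ae35;
        h ^= h >>> 16;
        return h;
    }

    // Double hashing: probe i lands on (h1 + i * h2) mod numBits.
    private int[] probes(String key) {
        byte[] data = key.getBytes(StandardCharsets.UTF_8);
        int h1 = fnv1a(data, 0);
        int h2 = fnv1a(data, 0x5bd1e995) | 1; // odd step
        int[] idx = new int[numHashes];
        for (int i = 0; i < numHashes; i++) {
            idx[i] = Math.floorMod(h1 + i * h2, numBits);
        }
        return idx;
    }

    void add(String key) {
        for (int i : probes(key)) bits.set(i);
    }

    /** false means "definitely absent" (fast-fail); true means "possibly present". */
    boolean mightContain(String key) {
        for (int i : probes(key)) {
            if (!bits.get(i)) return false;
        }
        return true;
    }

    public static void main(String[] args) {
        ToyBloomFilter segmentFilter = new ToyBloomFilter(1 << 16, 3);
        for (int id = 0; id < 1000; id++) segmentFilter.add("doc-" + id);
        System.out.println(segmentFilter.mightContain("doc-42"));    // true: added keys are never reported absent
        System.out.println(segmentFilter.mightContain("doc-99999")); // false unless this is a false positive
    }
}
```

The trade-off is asymmetric. A false positive only costs one unnecessary delegate lookup. A false negative can never occur, which is why the filter is safe to consult first.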
The format of the .blm file is as follows:
- BloomFilter (.blm) --> Header, DelegatePostingsFormatName, NumFilteredFields, Filter^NumFilteredFields, Footer (one Filter per filtered field)
- Filter --> FieldNumber, FuzzySet
- FuzzySet --> See FuzzySet.serialize(DataOutput)
- Header --> IndexHeader
- DelegatePostingsFormatName --> String. The name of a ServiceProvider-registered PostingsFormat
- NumFilteredFields --> Uint32
- FieldNumber --> Uint32. The number of the field in this segment
- Footer --> CodecFooter
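The grammar above can be modelled with plain `java.io` streams. This is a simplified, hypothetical model and not Lucene's on-disk encoding, so it makes several substitutions:

- A placeholder magic int stands in for IndexHeader.
- `writeUTF` stands in for Lucene's String encoding.
- A raw `long[]` bitset stands in for FuzzySet.serialize(DataOutput).
- A bare CRC32 checksum stands in for CodecFooter.

The class name `BlmLayoutSketch` and the constant `HYPOTHETICAL_MAGIC` are made up for this sketch.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.zip.CRC32;

// Hypothetical, simplified model of the .blm layout; not Lucene's real encoding.
class BlmLayoutSketch {
    static final int HYPOTHETICAL_MAGIC = 0x424C4D31; // placeholder for IndexHeader

    static final class Parsed {
        final String delegateName;
        final Map<Integer, long[]> filters; // FieldNumber -> bitset words
        Parsed(String delegateName, Map<Integer, long[]> filters) {
            this.delegateName = delegateName;
            this.filters = filters;
        }
    }

    static byte[] write(String delegateName, Map<Integer, long[]> filters) throws IOException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        DataOutputStream out = new DataOutputStream(bytes);
        out.writeInt(HYPOTHETICAL_MAGIC);           // Header (simplified)
        out.writeUTF(delegateName);                  // DelegatePostingsFormatName
        out.writeInt(filters.size());                // NumFilteredFields
        for (Map.Entry<Integer, long[]> e : filters.entrySet()) {
            out.writeInt(e.getKey());                // Filter: FieldNumber
            long[] words = e.getValue();             // Filter: stand-in for FuzzySet
            out.writeInt(words.length);
            for (long w : words) out.writeLong(w);
        }
        out.flush();
        CRC32 crc = new CRC32();
        crc.update(bytes.toByteArray());
        out.writeLong(crc.getValue());               // Footer (simplified: checksum only)
        out.flush();
        return bytes.toByteArray();
    }

    static Parsed read(byte[] file) throws IOException {
        if (file.length < Long.BYTES) throw new IOException("truncated file");
        int bodyLength = file.length - Long.BYTES;
        CRC32 crc = new CRC32();
        crc.update(file, 0, bodyLength);
        long stored = new DataInputStream(new ByteArrayInputStream(file, bodyLength, Long.BYTES)).readLong();
        if (stored != crc.getValue()) throw new IOException("checksum mismatch"); // verify before parsing

        DataInputStream in = new DataInputStream(new ByteArrayInputStream(file, 0, bodyLength));
        if (in.readInt() != HYPOTHETICAL_MAGIC) throw new IOException("bad header");
        String delegateName = in.readUTF();
        int numFilteredFields = in.readInt();
        Map<Integer, long[]> filters = new LinkedHashMap<>();
        for (int i = 0; i < numFilteredFields; i++) {
            int fieldNumber = in.readInt();
            long[] words = new long[in.readInt()];
            for (int j = 0; j < words.length; j++) words[j] = in.readLong();
            filters.put(fieldNumber, words);
        }
        return new Parsed(delegateName, filters);
    }
}
```

The checksum is verified before any length field is trusted. As a result, a corrupted count cannot trigger a huge or negative array allocation.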
- WARNING: This API is experimental and might change in incompatible ways in the next release.
Field Summary
Fields inherited from class org.apache.lucene.codecs.PostingsFormat
EMPTY
Constructor Summary
- BloomFilteringPostingsFormat(PostingsFormat delegatePostingsFormat): Creates Bloom filters for a selection of fields created in the index.
- BloomFilteringPostingsFormat(PostingsFormat delegatePostingsFormat, BloomFilterFactory bloomFilterFactory): Creates Bloom filters for a selection of fields created in the index.
Method Summary
- FieldsConsumer fieldsConsumer(SegmentWriteState state)
- FieldsProducer fieldsProducer(SegmentReadState state)
- String toString()
Methods inherited from class org.apache.lucene.codecs.PostingsFormat
availablePostingsFormats, forName, getName, reloadPostingsFormats
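Field Details
BLOOM_CODEC_NAME
public static final String BLOOM_CODEC_NAME
- See Also: Constant Field Values
VERSION_START
public static final int VERSION_START
- See Also: Constant Field Values
VERSION_CURRENT
public static final int VERSION_CURRENT
- See Also: Constant Field Values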
Constructor Details
BloomFilteringPostingsFormat
public BloomFilteringPostingsFormat(PostingsFormat delegatePostingsFormat, BloomFilterFactory bloomFilterFactory)
Creates Bloom filters for a selection of fields created in the index. This is recorded as a set of Bitsets held as a segment summary in an additional "blm" file. This PostingsFormat delegates to a choice of delegate PostingsFormat for encoding all other postings data.
- Parameters:
delegatePostingsFormat - The PostingsFormat that records all the non-Bloom-filter data, i.e., postings info.
bloomFilterFactory - The BloomFilterFactory responsible for sizing BloomFilters appropriately.
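"Sizing appropriately" usually comes down to two textbook formulas. For n keys at a target false-positive rate p, the filter needs m = -n ln p / (ln 2)^2 bits, and k = (m / n) ln 2 hash functions minimise the false-positive rate.

The sketch below computes both values for a hypothetical per-field policy. The class `BloomSizing`, the field names and the target rates are assumptions for illustration. This is not the BloomFilterFactory API, and it does not describe what DefaultBloomFilterFactory actually does.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical sizing helper; Lucene's BloomFilterFactory defines its own API.
class BloomSizing {
    /** Bits needed for n keys at false-positive rate p: m = -n ln p / (ln 2)^2. */
    static long optimalBits(long n, double p) {
        if (n <= 0 || !(p > 0 && p < 1)) throw new IllegalArgumentException("need n > 0 and 0 < p < 1");
        double ln2 = Math.log(2);
        return (long) Math.ceil(-n * Math.log(p) / (ln2 * ln2));
    }

    /** Hash count minimising the false-positive rate: k = (m / n) ln 2, at least 1. */
    static int optimalHashes(long m, long n) {
        return Math.max(1, (int) Math.round((double) m / n * Math.log(2)));
    }

    public static void main(String[] args) {
        // Hypothetical per-field targets: a tighter rate for the primary-key field.
        Map<String, Double> targetFpp = new LinkedHashMap<>();
        targetFpp.put("id", 0.001);
        targetFpp.put("sku", 0.01);
        long n = 1_000_000; // assumed number of unique terms per field in one segment
        for (Map.Entry<String, Double> e : targetFpp.entrySet()) {
            long m = optimalBits(n, e.getValue());
            System.out.printf("%s: %d bits (%.2f MiB), k=%d%n",
                    e.getKey(), m, m / 8.0 / (1 << 20), optimalHashes(m, n));
        }
    }
}
```

Because m grows linearly with n, a fixed-size default bitset fits some segments better than others. This is one reason a per-field factory is useful.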
BloomFilteringPostingsFormat
public BloomFilteringPostingsFormat(PostingsFormat delegatePostingsFormat)
Creates Bloom filters for a selection of fields created in the index. This is recorded as a set of Bitsets held as a segment summary in an additional "blm" file. This PostingsFormat delegates to a choice of delegate PostingsFormat for encoding all other postings data. This choice of constructor defaults to the DefaultBloomFilterFactory for configuring per-field BloomFilters.
- Parameters:
delegatePostingsFormat - The PostingsFormat that records all the non-Bloom-filter data, i.e., postings info.
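The division of labour described above (filters in the .blm summary, everything else in the delegate) is an instance of the decorator pattern. A wrapper consults the per-field filter first and forwards to the delegate only on a possible hit.

The interfaces and classes below (`TermLookup`, `DelegateLookup`, `BloomFilteredLookup`) are hypothetical simplifications. Lucene's actual read path works on its own term-dictionary classes, which are not shown here.

```java
import java.util.Set;
import java.util.function.Predicate;

// Hypothetical lookup interface standing in for a term dictionary.
interface TermLookup {
    boolean seekExact(String term);
}

/** Stands in for the delegate format's comparatively expensive term dictionary. */
class DelegateLookup implements TermLookup {
    private final Set<String> terms;
    int calls = 0; // how many lookups actually reached the delegate

    DelegateLookup(Set<String> terms) { this.terms = terms; }

    @Override
    public boolean seekExact(String term) {
        calls++;
        return terms.contains(term);
    }
}

/** Checks the per-field filter first and delegates only when the term may be present. */
class BloomFilteredLookup implements TermLookup {
    private final Predicate<String> mightContain; // e.g. a Bloom filter's membership test
    private final TermLookup delegate;

    BloomFilteredLookup(Predicate<String> mightContain, TermLookup delegate) {
        this.mightContain = mightContain;
        this.delegate = delegate;
    }

    @Override
    public boolean seekExact(String term) {
        if (!mightContain.test(term)) return false; // fast-fail: definitely absent
        return delegate.seekExact(term);            // possible hit: the delegate decides
    }
}
```

Correctness never depends on the filter. A false positive simply falls through to the delegate, which returns the authoritative answer.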
BloomFilteringPostingsFormat
public BloomFilteringPostingsFormat()
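Method Details
fieldsConsumer
public FieldsConsumer fieldsConsumer(SegmentWriteState state) throws IOException
- Specified by: fieldsConsumer in class PostingsFormat
- Throws: IOException
fieldsProducer
public FieldsProducer fieldsProducer(SegmentReadState state) throws IOException
- Specified by: fieldsProducer in class PostingsFormat
- Throws: IOException
toString
public String toString()
- Overrides: toString in class PostingsFormat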