Class BloomFilteringPostingsFormat

java.lang.Object
org.apache.lucene.codecs.PostingsFormat
org.apache.lucene.codecs.bloom.BloomFilteringPostingsFormat
All Implemented Interfaces:
NamedSPILoader.NamedSPI

public final class BloomFilteringPostingsFormat extends PostingsFormat
A PostingsFormat useful for low doc-frequency fields such as primary keys. Bloom filters are maintained in a ".blm" file, which lets reads "fast-fail" in segments known to have no record of the key. A delegate PostingsFormat of the caller's choice records all other postings data.

A BloomFilterFactory can be passed to tailor Bloom filter settings on a per-field basis. The default is DefaultBloomFilterFactory, which allocates a ~8mb bitset and hashes values using MurmurHash2. This should be suitable for most purposes.
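The "fast-fail" behaviour described above can be illustrated with a minimal, self-contained sketch. Everything in it is an assumption made for illustration: the class name TinyBloomFilter, the bitset size, and the two simple hash functions (FNV-1a and a polynomial hash, standing in for MurmurHash2) are hypothetical and are not Lucene's FuzzySet or DefaultBloomFilterFactory.

```java
import java.nio.charset.StandardCharsets;
import java.util.BitSet;

/** Hypothetical teaching sketch; this is not Lucene's FuzzySet. */
final class TinyBloomFilter {
    private final BitSet bits;
    private final int numBits;

    TinyBloomFilter(int numBits) {
        if (numBits <= 0) {
            throw new IllegalArgumentException("numBits must be positive");
        }
        this.numBits = numBits;
        this.bits = new BitSet(numBits);
    }

    /** FNV-1a, used here only as a simple stand-in for MurmurHash2. */
    private int firstIndex(byte[] key) {
        int h = 0x811C9DC5; // FNV offset basis
        for (byte b : key) {
            h ^= (b & 0xFF);
            h *= 0x01000193; // FNV prime
        }
        return Math.floorMod(h, numBits);
    }

    /** A second, different hash so that each key sets up to two bits. */
    private int secondIndex(byte[] key) {
        int h = 17;
        for (byte b : key) {
            h = 31 * h + (b & 0xFF);
        }
        return Math.floorMod(h, numBits);
    }

    void add(String key) {
        byte[] k = key.getBytes(StandardCharsets.UTF_8);
        bits.set(firstIndex(k));
        bits.set(secondIndex(k));
    }

    /** false means "definitely absent"; true means "possibly present". */
    boolean mightContain(String key) {
        byte[] k = key.getBytes(StandardCharsets.UTF_8);
        return bits.get(firstIndex(k)) && bits.get(secondIndex(k));
    }

    public static void main(String[] args) {
        TinyBloomFilter segmentFilter = new TinyBloomFilter(1 << 16);
        segmentFilter.add("user-42");
        // An added key is always reported as possibly present (no false negatives).
        System.out.println(segmentFilter.mightContain("user-42"));
        // An absent key is usually rejected, but a false positive remains possible.
        System.out.println(segmentFilter.mightContain("user-43"));
    }
}
```

When mightContain returns false, a reader can skip the segment's term dictionary entirely; when it returns true, the lookup still has to go to the delegate's postings data, because the answer may be a false positive.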

The format of the .blm file is as follows:

  • BloomFilter (.blm) --> Header, DelegatePostingsFormatName, NumFilteredFields, Filter^NumFilteredFields, Footer
  • Filter --> FieldNumber, FuzzySet
  • FuzzySet --> See FuzzySet.serialize(DataOutput)
  • Header --> IndexHeader
  • DelegatePostingsFormatName --> String: the name of a ServiceProvider-registered PostingsFormat
  • NumFilteredFields --> Uint32
  • FieldNumber --> Uint32: the number of the field in this segment
  • Footer --> CodecFooter
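
The ordering of the elements listed above can be sketched with plain java.io streams. This is an illustration of the layout only, under several assumptions: the class BlmLayoutSketch, the MAGIC constant standing in for IndexHeader, the CRC32 value standing in for CodecFooter, the length-prefixed byte array standing in for a serialized FuzzySet, and the delegate name used in main are all hypothetical. Lucene's actual byte encoding is different.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.zip.CRC32;

/** Hypothetical sketch of the element ordering; not Lucene's real encoding. */
final class BlmLayoutSketch {
    static final int MAGIC = 0x424C4D31; // stand-in for IndexHeader

    static final class Parsed {
        final String delegateName;
        final Map<Integer, byte[]> filters;

        Parsed(String delegateName, Map<Integer, byte[]> filters) {
            this.delegateName = delegateName;
            this.filters = filters;
        }
    }

    private static long checksum(byte[] data, int length) {
        CRC32 crc = new CRC32();
        crc.update(data, 0, length);
        return crc.getValue();
    }

    static byte[] write(String delegateName, Map<Integer, byte[]> filters) {
        try {
            ByteArrayOutputStream buffer = new ByteArrayOutputStream();
            DataOutputStream out = new DataOutputStream(buffer);
            out.writeInt(MAGIC);              // Header
            out.writeUTF(delegateName);       // DelegatePostingsFormatName
            out.writeInt(filters.size());     // NumFilteredFields
            for (Map.Entry<Integer, byte[]> filter : filters.entrySet()) {
                out.writeInt(filter.getKey());            // FieldNumber
                out.writeInt(filter.getValue().length);   // stand-in FuzzySet length
                out.write(filter.getValue());             // stand-in FuzzySet bytes
            }
            out.flush();
            // Footer: checksum over everything written so far.
            out.writeLong(checksum(buffer.toByteArray(), buffer.size()));
            out.flush();
            return buffer.toByteArray();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    static Parsed read(byte[] file) {
        try {
            int bodyLength = file.length - Long.BYTES;
            DataInputStream in = new DataInputStream(new ByteArrayInputStream(file));
            if (in.readInt() != MAGIC) {
                throw new IOException("unexpected header");
            }
            String delegateName = in.readUTF();
            int numFilteredFields = in.readInt();
            Map<Integer, byte[]> filters = new LinkedHashMap<>();
            for (int i = 0; i < numFilteredFields; i++) {
                int fieldNumber = in.readInt();
                byte[] fuzzySet = new byte[in.readInt()];
                in.readFully(fuzzySet);
                filters.put(fieldNumber, fuzzySet);
            }
            if (in.readLong() != checksum(file, bodyLength)) {
                throw new IOException("checksum mismatch");
            }
            return new Parsed(delegateName, filters);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        Map<Integer, byte[]> filters = new LinkedHashMap<>();
        filters.put(0, new byte[] {0x0F, 0x10});
        Parsed parsed = read(write("HypotheticalDelegate", filters));
        System.out.println(parsed.delegateName + " / " + parsed.filters.size() + " filtered field(s)");
    }
}
```

Storing the delegate's name in the file is what allows a reader to locate the matching PostingsFormat by name at read time without being told which delegate was used when the segment was written.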
WARNING: This API is experimental and might change in incompatible ways in the next release.
  • Constructor Details

    • BloomFilteringPostingsFormat

      public BloomFilteringPostingsFormat(PostingsFormat delegatePostingsFormat, BloomFilterFactory bloomFilterFactory)
      Creates Bloom filters for a selection of fields created in the index. The filters are recorded as a set of bitsets, held as a segment summary in an additional "blm" file. This PostingsFormat delegates to the supplied PostingsFormat for encoding all other postings data.
      Parameters:
      delegatePostingsFormat - The PostingsFormat that records all the non-Bloom-filter data, i.e. postings info.
      bloomFilterFactory - The BloomFilterFactory responsible for sizing Bloom filters appropriately
    • BloomFilteringPostingsFormat

      public BloomFilteringPostingsFormat(PostingsFormat delegatePostingsFormat)
      Creates Bloom filters for a selection of fields created in the index. The filters are recorded as a set of bitsets, held as a segment summary in an additional "blm" file. This PostingsFormat delegates to the supplied PostingsFormat for encoding all other postings data. This constructor uses the DefaultBloomFilterFactory to configure per-field Bloom filters.
      Parameters:
      delegatePostingsFormat - The PostingsFormat that records all the non-Bloom-filter data, i.e. postings info.
    • BloomFilteringPostingsFormat

      public BloomFilteringPostingsFormat()
  • Method Details