Class BloomFilteringPostingsFormat

  • All Implemented Interfaces:
    NamedSPILoader.NamedSPI

    public final class BloomFilteringPostingsFormat
    extends PostingsFormat
    A PostingsFormat useful for low doc-frequency fields such as primary keys. Bloom filters are maintained in a ".blm" file, which offers a "fast-fail" for reads in segments known to have no record of the key. A choice of delegate PostingsFormat is used to record all other postings data.
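    The "fast-fail" works because a Bloom filter can answer "definitely absent" but only "possibly present". The following is a minimal sketch of that property under stated assumptions: the class name SimpleBloomFilter is hypothetical, it is not Lucene's FuzzySet, and it swaps in a seeded 32-bit FNV-1a hash with double hashing in place of the MurmurHash2 that the default factory uses.

```java
import java.nio.charset.StandardCharsets;
import java.util.BitSet;

/**
 * Minimal Bloom filter sketch (hypothetical; not Lucene's FuzzySet).
 * mightContain() == false is a definite "absent", which is what lets a
 * reader skip a segment; true only means "possibly present".
 */
final class SimpleBloomFilter {
    private final BitSet bits;
    private final int numBits;
    private final int numHashes;

    SimpleBloomFilter(int numBits, int numHashes) {
        if (numBits <= 0 || numHashes <= 0) {
            throw new IllegalArgumentException("numBits and numHashes must be positive");
        }
        this.bits = new BitSet(numBits);
        this.numBits = numBits;
        this.numHashes = numHashes;
    }

    /** Seeded 32-bit FNV-1a over the key bytes (a stand-in for MurmurHash2). */
    private static int fnv1a(byte[] data, int seed) {
        int h = 0x811C9DC5 ^ seed;
        for (byte b : data) {
            h ^= b & 0xFF;
            h *= 0x01000193;
        }
        return h;
    }

    /** Double hashing: probe i targets bit (h1 + i * h2) mod numBits. */
    private int probe(int h1, int h2, int i) {
        return Math.floorMod(h1 + i * h2, numBits);
    }

    void add(String key) {
        byte[] data = key.getBytes(StandardCharsets.UTF_8);
        int h1 = fnv1a(data, 0);
        int h2 = fnv1a(data, 0x9E3779B9);
        for (int i = 0; i < numHashes; i++) {
            bits.set(probe(h1, h2, i));
        }
    }

    boolean mightContain(String key) {
        byte[] data = key.getBytes(StandardCharsets.UTF_8);
        int h1 = fnv1a(data, 0);
        int h2 = fnv1a(data, 0x9E3779B9);
        for (int i = 0; i < numHashes; i++) {
            if (!bits.get(probe(h1, h2, i))) {
                return false; // a clear bit proves the key was never added
            }
        }
        return true; // every probed bit is set: possibly present
    }
}
```

    A reader would consult mightContain() for a segment first and only open the delegate's term dictionary when it returns true; false positives cost an extra lookup, but false negatives cannot occur.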

    A choice of BloomFilterFactory can be passed to tailor Bloom Filter settings on a per-field basis. The default configuration is DefaultBloomFilterFactory which allocates a ~8mb bitset and hashes values using MurmurHash2. This should be suitable for most purposes.
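    To illustrate what tailoring "on a per-field basis" can mean, here is a hedged sketch of a per-field policy. PerFieldFilterFactory and ConfiguredFilterFactory are hypothetical names invented for this example and do not reproduce Lucene's BloomFilterFactory API; the idea shown is only that each field may receive its own filter size, or no filter at all.

```java
import java.util.BitSet;
import java.util.Map;
import java.util.Optional;

/** Hypothetical per-field policy (not Lucene's BloomFilterFactory API). */
interface PerFieldFilterFactory {
    /** A fresh, empty bitset for this field, or Optional.empty() to leave the field unfiltered. */
    Optional<BitSet> bitsetForField(String field);
}

/** Filters only the configured fields, each with its own size in bits. */
final class ConfiguredFilterFactory implements PerFieldFilterFactory {
    private final Map<String, Integer> bitsPerField;

    ConfiguredFilterFactory(Map<String, Integer> bitsPerField) {
        this.bitsPerField = Map.copyOf(bitsPerField); // defensive, immutable copy
    }

    @Override
    public Optional<BitSet> bitsetForField(String field) {
        Integer numBits = bitsPerField.get(field);
        return numBits == null ? Optional.empty() : Optional.of(new BitSet(numBits));
    }
}
```

    For example, new ConfiguredFilterFactory(Map.of("id", 1 << 20)) would give a primary-key field named "id" a 1,048,576-bit filter and leave every other field unfiltered.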

    The format of the blm file is as follows:

    • BloomFilter (.blm) --> Header, DelegatePostingsFormatName, NumFilteredFields, Filter^NumFilteredFields (one Filter per filtered field), Footer
    • Filter --> FieldNumber, FuzzySet
    • FuzzySet --> See FuzzySet.serialize(DataOutput)
    • Header --> IndexHeader
    • DelegatePostingsFormatName --> String. The name of a ServiceProvider-registered PostingsFormat
    • NumFilteredFields --> Uint32
    • FieldNumber --> Uint32. The number of the field in this segment
    • Footer --> CodecFooter
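    The grammar above can be walked through with a simplified writer. This is a sketch only, and deliberately not byte-compatible with real .blm files: the magic number and version are hypothetical, the header and footer are stand-ins for IndexHeader and CodecFooter (here a magic, a version and a trailing CRC32), strings use DataOutputStream.writeUTF rather than Lucene's own string encoding, and a plain BitSet stands in for FuzzySet.serialize(DataOutput).

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.ByteBuffer;
import java.util.BitSet;
import java.util.Map;
import java.util.TreeMap;
import java.util.zip.CRC32;

/** Simplified, NOT byte-compatible sketch of the .blm layout. */
final class BlmLayoutSketch {
    static final int MAGIC = 0x424C4D31; // hypothetical magic ("BLM1")
    static final int VERSION = 1;        // hypothetical version

    static byte[] write(String delegateName, Map<Integer, BitSet> filtersByFieldNumber) {
        try {
            ByteArrayOutputStream bytes = new ByteArrayOutputStream();
            DataOutputStream out = new DataOutputStream(bytes);
            out.writeInt(MAGIC);                        // Header (stand-in for IndexHeader)
            out.writeInt(VERSION);
            out.writeUTF(delegateName);                 // DelegatePostingsFormatName
            out.writeInt(filtersByFieldNumber.size());  // NumFilteredFields
            // Filter^NumFilteredFields, written in field-number order
            for (Map.Entry<Integer, BitSet> e : new TreeMap<>(filtersByFieldNumber).entrySet()) {
                out.writeInt(e.getKey());                  // FieldNumber
                long[] words = e.getValue().toLongArray(); // stand-in for FuzzySet.serialize
                out.writeInt(words.length);
                for (long w : words) {
                    out.writeLong(w);
                }
            }
            out.flush();
            CRC32 crc = new CRC32();                    // Footer (stand-in for CodecFooter)
            crc.update(bytes.toByteArray());
            out.writeLong(crc.getValue());
            out.flush();
            return bytes.toByteArray();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    /** Recomputes the CRC32 over everything before the 8-byte footer. */
    static boolean checksumMatches(byte[] file) {
        if (file.length < Long.BYTES) {
            return false;
        }
        int bodyLength = file.length - Long.BYTES;
        CRC32 crc = new CRC32();
        crc.update(file, 0, bodyLength);
        return ByteBuffer.wrap(file, bodyLength, Long.BYTES).getLong() == crc.getValue();
    }

    /** Reads back the header and returns DelegatePostingsFormatName. */
    static String readDelegateName(byte[] file) {
        try (DataInputStream in = new DataInputStream(new ByteArrayInputStream(file))) {
            if (in.readInt() != MAGIC) {
                throw new IllegalArgumentException("not a .blm sketch file");
            }
            in.readInt(); // version
            return in.readUTF();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

    Storing the delegate's name in the file is what allows a reader to locate the delegate PostingsFormat by name before reading the per-field filters.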
    WARNING: This API is experimental and might change in incompatible ways in the next release.
    • Constructor Detail

      • BloomFilteringPostingsFormat

        public BloomFilteringPostingsFormat(PostingsFormat delegatePostingsFormat,
                                            BloomFilterFactory bloomFilterFactory)
        Creates Bloom filters for a selection of fields created in the index. This is recorded as a set of bitsets held as a segment summary in an additional ".blm" file. This PostingsFormat delegates to a choice of delegate PostingsFormat for encoding all other postings data.
        Parameters:
        delegatePostingsFormat - The PostingsFormat that records all the non-Bloom-filter data, i.e., postings info.
        bloomFilterFactory - The BloomFilterFactory responsible for sizing BloomFilters appropriately.
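        "Sizing appropriately" trades bitset size against false-positive rate. The helper below applies the standard Bloom-filter approximations (false-positive rate p ≈ (1 − e^(−kn/m))^k, optimal k = (m/n)·ln 2, and m = −n·ln p / (ln 2)^2 bits for a target p). It is a hedged illustration of the arithmetic only; BloomSizing is a hypothetical name, and the sketch does not describe the policy DefaultBloomFilterFactory actually implements.

```java
/** Standard Bloom-filter sizing approximations (illustrative; hypothetical class name). */
final class BloomSizing {
    private static final double LN2 = Math.log(2);

    /** Approximate false-positive rate after inserting n keys into m bits with k hash functions. */
    static double falsePositiveRate(long m, long n, int k) {
        return Math.pow(1 - Math.exp(-(double) k * n / m), k);
    }

    /** Hash-function count that minimises the false-positive rate for m bits and n keys. */
    static int optimalHashCount(long m, long n) {
        return Math.max(1, (int) Math.round((double) m / n * LN2));
    }

    /** Bits needed to hold n keys at a target false-positive rate p, assuming the optimal k. */
    static long bitsFor(long n, double p) {
        return (long) Math.ceil(-n * Math.log(p) / (LN2 * LN2));
    }
}
```

        For one million unique keys and a 1% target, bitsFor returns roughly 9.6 million bits with 7 hash functions; halving the target rate adds about 1.44 bits per key.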
      • BloomFilteringPostingsFormat

        public BloomFilteringPostingsFormat(PostingsFormat delegatePostingsFormat)
        Creates Bloom filters for a selection of fields created in the index. This is recorded as a set of bitsets held as a segment summary in an additional ".blm" file. This PostingsFormat delegates to a choice of delegate PostingsFormat for encoding all other postings data. This constructor defaults to the DefaultBloomFilterFactory for configuring per-field BloomFilters.
        Parameters:
        delegatePostingsFormat - The PostingsFormat that records all the non-Bloom-filter data, i.e., postings info.
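        Both constructors wrap a delegate that holds the real postings. The read path this implies can be sketched as a guard that asks the filter first and only falls through to the delegate on "maybe". TermLookup and BloomGuardedLookup are hypothetical types for illustration, not Lucene classes, and the filter is modelled as a plain Predicate so any membership test can be plugged in.

```java
import java.util.Objects;
import java.util.function.Predicate;

/** Hypothetical stand-in for the delegate format's term lookup. */
interface TermLookup {
    /** Document frequency of the term, or null if the term does not occur. */
    Integer docFreq(String term);
}

/** Consults the per-field filter first and only calls the delegate on "possibly present". */
final class BloomGuardedLookup implements TermLookup {
    private final TermLookup delegate;
    private final Predicate<String> mightContain;
    private int delegateCalls;

    BloomGuardedLookup(TermLookup delegate, Predicate<String> mightContain) {
        this.delegate = Objects.requireNonNull(delegate);
        this.mightContain = Objects.requireNonNull(mightContain);
    }

    @Override
    public Integer docFreq(String term) {
        if (!mightContain.test(term)) {
            return null; // fast-fail: the filter proves the term is absent
        }
        delegateCalls++;
        return delegate.docFreq(term); // may still be null on a false positive
    }

    /** How many lookups actually reached the delegate. */
    int delegateCalls() {
        return delegateCalls;
    }
}
```

        The delegate's answer is always authoritative; the filter can only save work, never change a result.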
      • BloomFilteringPostingsFormat

        public BloomFilteringPostingsFormat()