Class HistogramCollectorManager

java.lang.Object
org.apache.lucene.sandbox.facet.plain.histograms.HistogramCollectorManager
All Implemented Interfaces:
CollectorManager<org.apache.lucene.sandbox.facet.plain.histograms.HistogramCollector,LongIntHashMap>

public final class HistogramCollectorManager extends Object implements CollectorManager<org.apache.lucene.sandbox.facet.plain.histograms.HistogramCollector,LongIntHashMap>
CollectorManager that computes a histogram of the distribution of the values of a field.

It takes a bucketWidth as a parameter and counts the number of documents whose values fall into the intervals [0, bucketWidth), [bucketWidth, 2*bucketWidth), etc. The keys of the returned LongIntHashMap identify these intervals as the quotient of the integer division of the value by bucketWidth. In other words, a key equal to k corresponds to values in the interval [k * bucketWidth, (k+1) * bucketWidth).
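
The key-to-interval mapping described above can be illustrated with a small standalone sketch. It does not use Lucene: the class BucketKeySketch and its methods are hypothetical names, a TreeMap stands in for LongIntHashMap, and the use of Math.floorDiv is an assumption (it agrees with plain integer division for non-negative values; how the real collector treats negative values is not stated here).

```java
import java.util.Map;
import java.util.TreeMap;

// Hypothetical illustration only; not part of the Lucene API.
public class BucketKeySketch {

  // Key k identifies the interval [k * bucketWidth, (k + 1) * bucketWidth).
  // Assumption: floor division, which equals integer division for values >= 0.
  public static long bucketKey(long value, long bucketWidth) {
    return Math.floorDiv(value, bucketWidth);
  }

  // Counts how many values fall into each bucket, keyed by bucketKey.
  public static Map<Long, Integer> histogram(long[] values, long bucketWidth) {
    Map<Long, Integer> counts = new TreeMap<>();
    for (long value : values) {
      counts.merge(bucketKey(value, bucketWidth), 1, Integer::sum);
    }
    return counts;
  }

  public static void main(String[] args) {
    long[] values = {0, 3, 9, 10, 25, 29};
    // With bucketWidth = 10: [0, 10) holds 3 values, [10, 20) holds 1, [20, 30) holds 2.
    System.out.println(histogram(values, 10)); // prints {0=3, 1=1, 2=2}
  }
}
```

To recover the interval from a key k, multiply back: the lower bound is k * bucketWidth and the exclusive upper bound is (k + 1) * bucketWidth.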

This implementation is optimized for the case where the field is part of the index sort and has a skip index.

Note: this collector is inspired by "YU, Muzhi, LIN, Zhaoxiang, SUN, Jinan, et al. TencentCLS: the cloud log service with high query performances. Proceedings of the VLDB Endowment, 2022, vol. 15, no 12, p. 3472-3482.", where the authors describe how they run "histogram queries" by sorting the index by timestamp and pre-computing ranges of doc IDs for every possible bucket.

  • Constructor Details

    • HistogramCollectorManager

      public HistogramCollectorManager(String field, long bucketWidth)
      Computes a histogram of the distribution of the values of the given field according to the given bucketWidth. The maximum number of buckets is set to the default of 1024.
    • HistogramCollectorManager

      public HistogramCollectorManager(String field, long bucketWidth, int maxBuckets)
      Expert constructor.
      Parameters:
      maxBuckets - Max allowed number of buckets. Note that this is checked at runtime and on a best-effort basis.
  • Method Details