Class CollectionStatistics


  • public class CollectionStatistics
    extends Object
    Contains statistics for a collection (field).

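    This class holds statistics across all documents for scoring purposes:

    • maxDoc(): number of documents.
    • docCount(): number of documents that contain this field.
    • sumDocFreq(): number of postings-list entries.
    • sumTotalTermFreq(): number of tokens.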

    The following conditions are always true:

    • All statistics are positive integers: never zero or negative.
    • docCount <= maxDoc
    • docCount <= sumDocFreq <= sumTotalTermFreq

    Values may include statistics on deleted documents that have not yet been merged away.

    Be careful when performing calculations on these values: they are represented as 64-bit integers, so you may need to cast to double for your use.
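    A minimal standalone sketch of why the cast matters: the average field length (tokens per document that contains the field) is sumTotalTermFreq divided by docCount, which is how BM25-style scorers typically use these statistics. The class and method names are hypothetical, and plain long values stand in for a CollectionStatistics instance.

```java
/**
 * Hypothetical helper (not Lucene API) showing long division versus
 * floating-point division on collection statistics.
 */
final class AvgFieldLength {

    /** Integer division of two longs silently truncates: 7 / 2 == 3. */
    static long truncatedAverage(long sumTotalTermFreq, long docCount) {
        return sumTotalTermFreq / docCount;
    }

    /** Casting one operand to double first keeps the fractional part. */
    static double average(long sumTotalTermFreq, long docCount) {
        return sumTotalTermFreq / (double) docCount;
    }

    public static void main(String[] args) {
        long sumTotalTermFreq = 7; // tokens in the field across all documents
        long docCount = 2;         // documents that contain the field
        System.out.println(truncatedAverage(sumTotalTermFreq, docCount)); // 3
        System.out.println(average(sumTotalTermFreq, docCount));          // 3.5
    }
}
```

    Note that casting the result, as in (double) (sumTotalTermFreq / docCount), is too late: the truncation has already happened in long arithmetic.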

    WARNING: This API is experimental and might change in incompatible ways in the next release.
    • Constructor Detail

      • CollectionStatistics

        public CollectionStatistics(String field,
                                    long maxDoc,
                                    long docCount,
                                    long sumTotalTermFreq,
                                    long sumDocFreq)
        Creates a statistics instance for a collection (field).
        Parameters:
        field - Field's name
        maxDoc - total number of documents.
        docCount - number of documents containing the field.
        sumTotalTermFreq - number of tokens in the field.
        sumDocFreq - number of postings list entries for the field.
        Throws:
        IllegalArgumentException - if maxDoc is negative or zero.
        IllegalArgumentException - if docCount is negative or zero.
        IllegalArgumentException - if docCount is more than maxDoc.
        IllegalArgumentException - if sumDocFreq is less than docCount.
        IllegalArgumentException - if sumTotalTermFreq is less than sumDocFreq.
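        The documented checks can be mirrored in a small standalone class, which makes the argument order and the failure modes easy to exercise without a Lucene dependency. CollectionStatsSketch is hypothetical and not part of Lucene; the null check on field is an assumption based on field() never returning null, and the exception messages are illustrative only.

```java
import java.util.Objects;

/**
 * Hypothetical stand-in (not Lucene API) that mirrors the documented
 * constructor checks of CollectionStatistics.
 */
final class CollectionStatsSketch {
    final String field;
    final long maxDoc, docCount, sumTotalTermFreq, sumDocFreq;

    // Same parameter order as documented: sumTotalTermFreq before sumDocFreq.
    CollectionStatsSketch(String field, long maxDoc, long docCount,
                          long sumTotalTermFreq, long sumDocFreq) {
        this.field = Objects.requireNonNull(field, "field"); // assumed check
        if (maxDoc <= 0) {
            throw new IllegalArgumentException("maxDoc must be positive: " + maxDoc);
        }
        if (docCount <= 0) {
            throw new IllegalArgumentException("docCount must be positive: " + docCount);
        }
        if (docCount > maxDoc) {
            throw new IllegalArgumentException(
                    "docCount " + docCount + " exceeds maxDoc " + maxDoc);
        }
        if (sumDocFreq < docCount) {
            throw new IllegalArgumentException(
                    "sumDocFreq " + sumDocFreq + " is less than docCount " + docCount);
        }
        if (sumTotalTermFreq < sumDocFreq) {
            throw new IllegalArgumentException(
                    "sumTotalTermFreq " + sumTotalTermFreq + " is less than sumDocFreq " + sumDocFreq);
        }
        this.maxDoc = maxDoc;
        this.docCount = docCount;
        this.sumTotalTermFreq = sumTotalTermFreq;
        this.sumDocFreq = sumDocFreq;
    }

    public static void main(String[] args) {
        CollectionStatsSketch ok = new CollectionStatsSketch("body", 10, 8, 120, 60);
        System.out.println(ok.docCount); // 8
        try {
            new CollectionStatsSketch("body", 10, 12, 120, 60); // docCount > maxDoc
        } catch (IllegalArgumentException e) {
            System.out.println(e.getMessage()); // docCount 12 exceeds maxDoc 10
        }
    }
}
```

        Because the two frequency sums share the long type, swapping them compiles fine and only fails at run time, which is why the sketch keeps the documented order and comments on it.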
    • Method Detail

      • field

        public final String field()
        The field's name.

        This value is never null.

        Returns:
        field's name, not null
      • maxDoc

        public final long maxDoc()
        The total number of documents, regardless of whether they all contain values for this field.

        This value is always a positive number.

        Returns:
        total number of documents, in the range [1 .. Long.MAX_VALUE]
        See Also:
        IndexReader.maxDoc()
      • docCount

        public final long docCount()
        The total number of documents that have at least one term for this field.

        This value is always a positive number, and never exceeds maxDoc().

        Returns:
        total number of documents containing this field, in the range [1 .. maxDoc()]
        See Also:
        Terms.getDocCount()
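        To make the difference between maxDoc() and docCount() concrete, here is a toy model in which each document is simply the token list of one field, and an empty list means the document has no terms for it. The class and methods are hypothetical illustrations, not Lucene API.

```java
import java.util.List;

/**
 * Toy model (hypothetical, not Lucene API): each document is the token list
 * of a single field; an empty list means the document lacks the field.
 */
final class DocCountSketch {

    /** Analogue of maxDoc(): every document counts, with or without the field. */
    static long maxDoc(List<List<String>> docs) {
        return docs.size();
    }

    /** Analogue of docCount(): only documents with at least one term count. */
    static long docCount(List<List<String>> docs) {
        return docs.stream().filter(tokens -> !tokens.isEmpty()).count();
    }

    public static void main(String[] args) {
        List<List<String>> docs = List.of(
                List.of("quick", "brown", "fox"),
                List.of(),                       // document without the field
                List.of("lazy", "dog"));
        System.out.println(maxDoc(docs));   // 3
        System.out.println(docCount(docs)); // 2
    }
}
```

        In a real index maxDoc() can also include deleted documents that have not yet been merged away; this toy model has no notion of deletions.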
      • sumTotalTermFreq

        public final long sumTotalTermFreq()
        The total number of tokens for this field. This is the "word count" for this field across all documents. It is the sum of TermStatistics.totalTermFreq() across all terms. It is also the sum of each document's field length across all documents.

        This value is always a positive number, and always at least sumDocFreq().

        Returns:
        total number of tokens in the field, in the range [sumDocFreq() .. Long.MAX_VALUE]
        See Also:
        Terms.getSumTotalTermFreq()
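        The two equivalent definitions above (sum of per-term totalTermFreq, and sum of per-document field lengths) can be checked against each other on a tiny in-memory corpus. This is a hypothetical sketch over plain Java collections, not how Lucene computes the value from its index.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/**
 * Toy model (hypothetical, not Lucene API): computes the analogue of
 * sumTotalTermFreq() two ways for a field whose documents are token lists.
 */
final class SumTotalTermFreqSketch {

    /** Sum of each document's field length (its token count). */
    static long fromFieldLengths(List<List<String>> docs) {
        long sum = 0;
        for (List<String> tokens : docs) {
            sum += tokens.size();
        }
        return sum;
    }

    /** Sum over terms of totalTermFreq: occurrences of each term in all documents. */
    static long fromTermTotals(List<List<String>> docs) {
        Map<String, Long> totalTermFreq = new HashMap<>();
        for (List<String> tokens : docs) {
            for (String term : tokens) {
                totalTermFreq.merge(term, 1L, Long::sum);
            }
        }
        long sum = 0;
        for (long ttf : totalTermFreq.values()) {
            sum += ttf;
        }
        return sum;
    }

    public static void main(String[] args) {
        List<List<String>> docs = List.of(
                List.of("to", "be", "or", "not", "to", "be"),
                List.of("be", "quick"));
        System.out.println(fromFieldLengths(docs)); // 8
        System.out.println(fromTermTotals(docs));   // 8
    }
}
```

        Both totals agree because every token occurrence is counted exactly once either way: once in its document's length, and once in its term's totalTermFreq.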
      • sumDocFreq

        public final long sumDocFreq()
        The total number of posting list entries for this field. This is the sum of term-document pairs: the sum of TermStatistics.docFreq() across all terms. It is also the sum of each document's unique term count for this field across all documents.

        This value is always a positive number, always at least docCount(), and never exceeds sumTotalTermFreq().

        Returns:
        number of posting list entries, in the range [docCount() .. sumTotalTermFreq()]
        See Also:
        Terms.getSumDocFreq()
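        Likewise, the two definitions of sumDocFreq() (sum of per-term docFreq, and sum of per-document unique term counts) can be compared on the same kind of toy corpus. As before, the class is a hypothetical sketch over plain collections, not Lucene's implementation.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;

/**
 * Toy model (hypothetical, not Lucene API): computes the analogue of
 * sumDocFreq() two ways for a field whose documents are token lists.
 */
final class SumDocFreqSketch {

    /** Sum of each document's count of distinct terms. */
    static long fromUniqueTermCounts(List<List<String>> docs) {
        long sum = 0;
        for (List<String> tokens : docs) {
            sum += new HashSet<>(tokens).size();
        }
        return sum;
    }

    /** Sum over terms of docFreq: the number of documents containing each term. */
    static long fromTermDocFreqs(List<List<String>> docs) {
        Map<String, Long> docFreq = new HashMap<>();
        for (List<String> tokens : docs) {
            // Deduplicate first so each term is counted once per document.
            for (String term : new HashSet<>(tokens)) {
                docFreq.merge(term, 1L, Long::sum);
            }
        }
        long sum = 0;
        for (long df : docFreq.values()) {
            sum += df;
        }
        return sum;
    }

    public static void main(String[] args) {
        List<List<String>> docs = List.of(
                List.of("to", "be", "or", "not", "to", "be"),
                List.of("be", "quick"));
        System.out.println(fromUniqueTermCounts(docs)); // 6
        System.out.println(fromTermDocFreqs(docs));     // 6
    }
}
```

        For this corpus docCount is 2, sumDocFreq is 6 and sumTotalTermFreq is 8, consistent with docCount <= sumDocFreq <= sumTotalTermFreq: every document with the field contributes at least one distinct term, and every distinct term in a document occurs at least once.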