Class CollectionStatistics

java.lang.Object
org.apache.lucene.search.CollectionStatistics

public class CollectionStatistics extends Object
Contains statistics for a collection (field).

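This class holds statistics across all documents for scoring purposes:

  • maxDoc(): the total number of documents.
  • docCount(): the number of documents that contain this field.
  • sumDocFreq(): the number of posting list entries for this field.
  • sumTotalTermFreq(): the number of tokens for this field.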

The following conditions are always true:

  • All statistics are positive integers: never zero or negative.
  • docCount <= maxDoc
  • docCount <= sumDocFreq <= sumTotalTermFreq

Values may include statistics on deleted documents that have not yet been merged away.

Be careful when performing calculations on these values: they are represented as 64-bit integers, so you may need to cast them to double before dividing or mixing them with floating-point math.
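The casting advice above can be illustrated with a minimal sketch. `AvgFieldLengthSketch` and `avgFieldLength` are hypothetical names that are not part of Lucene, and the two input values are made up; in real code they would presumably come from `sumTotalTermFreq()` and `docCount()`.

```java
// Hypothetical helper, not part of Lucene: shows why the long statistics
// should be widened to double before dividing.
final class AvgFieldLengthSketch {

    /** Average number of tokens per document that has the field. */
    static double avgFieldLength(long sumTotalTermFreq, long docCount) {
        // Cast first: long / long would truncate toward zero.
        return (double) sumTotalTermFreq / docCount;
    }

    public static void main(String[] args) {
        long sumTotalTermFreq = 10L; // made-up value, e.g. from sumTotalTermFreq()
        long docCount = 4L;          // made-up value, e.g. from docCount()

        long truncated = sumTotalTermFreq / docCount; // integer division
        double precise = avgFieldLength(sumTotalTermFreq, docCount);

        System.out.println(truncated); // prints 2
        System.out.println(precise);   // prints 2.5
    }
}
```

Dividing the two long values directly silently drops the fractional part, which is usually not what a scoring formula wants.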

WARNING: This API is experimental and might change in incompatible ways in the next release.
  • Constructor Summary

    Constructors
    Constructor
    Description
    CollectionStatistics(String field, long maxDoc, long docCount, long sumTotalTermFreq, long sumDocFreq)
    Creates statistics instance for a collection (field).
  • Method Summary

    Modifier and Type
    Method
    Description
    final long
    docCount()
    The total number of documents that have at least one term for this field.
    final String
    field()
    The field's name.
    final long
    maxDoc()
    The total number of documents, regardless of whether they all contain values for this field.
    final long
    sumDocFreq()
    The total number of posting list entries for this field.
    final long
    sumTotalTermFreq()
    The total number of tokens for this field.
     

    Methods inherited from class java.lang.Object

    clone, equals, finalize, getClass, hashCode, notify, notifyAll, wait, wait, wait
  • Constructor Details

    • CollectionStatistics

      public CollectionStatistics(String field, long maxDoc, long docCount, long sumTotalTermFreq, long sumDocFreq)
      Creates statistics instance for a collection (field).
      Parameters:
      field - the field's name.
      maxDoc - total number of documents.
      docCount - number of documents containing the field.
      sumTotalTermFreq - number of tokens in the field.
      sumDocFreq - number of postings list entries for the field.
      Throws:
      IllegalArgumentException - if maxDoc is negative or zero.
      IllegalArgumentException - if docCount is negative or zero.
      IllegalArgumentException - if docCount is more than maxDoc.
      IllegalArgumentException - if sumDocFreq is less than docCount.
      IllegalArgumentException - if sumTotalTermFreq is less than sumDocFreq.
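The documented argument checks can be restated as a small stand-alone sketch. This is not Lucene's source: `StatsInvariantSketch`, `check`, and `isValid` are hypothetical names, the exception messages are invented, and the code works on plain long values so it runs without Lucene on the classpath.

```java
// Hypothetical stand-in, not Lucene's implementation: re-states the
// documented constructor checks on plain long values.
final class StatsInvariantSketch {

    /** Throws IllegalArgumentException if any documented condition is violated. */
    static void check(long maxDoc, long docCount, long sumTotalTermFreq, long sumDocFreq) {
        if (maxDoc <= 0) {
            throw new IllegalArgumentException("maxDoc must be positive: " + maxDoc);
        }
        if (docCount <= 0) {
            throw new IllegalArgumentException("docCount must be positive: " + docCount);
        }
        if (docCount > maxDoc) {
            throw new IllegalArgumentException("docCount must not exceed maxDoc");
        }
        if (sumDocFreq < docCount) {
            throw new IllegalArgumentException("sumDocFreq must be at least docCount");
        }
        if (sumTotalTermFreq < sumDocFreq) {
            throw new IllegalArgumentException("sumTotalTermFreq must be at least sumDocFreq");
        }
    }

    /** Convenience wrapper that reports validity as a boolean. */
    static boolean isValid(long maxDoc, long docCount, long sumTotalTermFreq, long sumDocFreq) {
        try {
            check(maxDoc, docCount, sumTotalTermFreq, sumDocFreq);
            return true;
        } catch (IllegalArgumentException e) {
            return false;
        }
    }
}
```

For example, `isValid(100, 80, 1200, 600)` returns true, while `isValid(100, 101, 1200, 600)` returns false because docCount exceeds maxDoc. Note the parameter order, which follows the constructor: sumTotalTermFreq comes before sumDocFreq.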
  • Method Details

    • field

      public final String field()
      The field's name.

      This value is never null.

      Returns:
      field's name, not null
    • maxDoc

      public final long maxDoc()
      The total number of documents, regardless of whether they all contain values for this field.

      This value is always a positive number.

      Returns:
      total number of documents, in the range [1 .. Long.MAX_VALUE]
    • docCount

      public final long docCount()
      The total number of documents that have at least one term for this field.

      This value is always a positive number, and never exceeds maxDoc().

      Returns:
      total number of documents containing this field, in the range [1 .. maxDoc()]
    • sumTotalTermFreq

      public final long sumTotalTermFreq()
      The total number of tokens for this field. This is the "word count" for this field across all documents. It is the sum of TermStatistics.totalTermFreq() across all terms. It is also the sum of each document's field length across all documents.

      This value is always a positive number, and always at least sumDocFreq().

      Returns:
      total number of tokens in the field, in the range [sumDocFreq() .. Long.MAX_VALUE]
    • sumDocFreq

      public final long sumDocFreq()
      The total number of posting list entries for this field. This is the number of term-document pairs: the sum of TermStatistics.docFreq() across all terms. It is also the sum of each document's unique term count for this field across all documents.

      This value is always a positive number, always at least docCount(), and never exceeds sumTotalTermFreq().

      Returns:
      number of posting list entries, in the range [docCount() .. sumTotalTermFreq()]
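The relationship between the four statistics may be easier to see on a toy collection. The sketch below is an illustration only: `ToyCollectionStats` is a hypothetical class that does not use Lucene, and it assumes each document's field is given as an array of already-analyzed tokens, with an empty array meaning the document has no term for the field.

```java
import java.util.HashSet;
import java.util.Set;

// Hypothetical illustration, not Lucene code: derives the four statistics
// for one field from a tiny in-memory collection of tokenized documents.
final class ToyCollectionStats {
    final long maxDoc;           // all documents, with or without the field
    final long docCount;         // documents with at least one term
    final long sumTotalTermFreq; // total tokens across all documents
    final long sumDocFreq;       // total (term, document) pairs

    ToyCollectionStats(String[][] docs) {
        long withField = 0;
        long tokens = 0;
        long pairs = 0;
        for (String[] docTokens : docs) {
            if (docTokens.length == 0) {
                continue; // counts toward maxDoc only
            }
            withField++;
            tokens += docTokens.length; // this document's field length
            Set<String> unique = new HashSet<>();
            for (String token : docTokens) {
                unique.add(token);
            }
            pairs += unique.size(); // this document's unique term count
        }
        this.maxDoc = docs.length;
        this.docCount = withField;
        this.sumTotalTermFreq = tokens;
        this.sumDocFreq = pairs;
    }
}
```

With the documents `{"a", "b", "a"}`, `{"b", "c"}`, and an empty one, this yields maxDoc = 3, docCount = 2, sumDocFreq = 4, and sumTotalTermFreq = 5, which satisfies docCount <= sumDocFreq <= sumTotalTermFreq. Repeated tokens within a document raise sumTotalTermFreq but not sumDocFreq.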
    • toString

      public String toString()
      Overrides:
      toString in class Object