Package org.apache.lucene.search
Class CollectionStatistics
java.lang.Object
    org.apache.lucene.search.CollectionStatistics
public class CollectionStatistics extends Object
Contains statistics for a collection (field).
This class holds statistics across all documents for scoring purposes:
- maxDoc(): number of documents.
- docCount(): number of documents that contain this field.
- sumDocFreq(): number of postings-list entries.
- sumTotalTermFreq(): number of tokens.
The following conditions are always true:
- All statistics are positive integers: never zero or negative.
- docCount <= maxDoc
- docCount <= sumDocFreq <= sumTotalTermFreq
Values may include statistics on deleted documents that have not yet been merged away.
Be careful when performing calculations on these values: they are represented as 64-bit integers, so you may need to cast to double for your use.
WARNING: This API is experimental and might change in incompatible ways in the next release.
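The cast to double matters most when computing ratios. As a minimal sketch (the class and method names are hypothetical helpers, not part of Lucene), the following operates on plain long values such as those returned by docCount() and maxDoc(), and contrasts integer division with division after a cast:

```java
// Hypothetical helper, not part of Lucene: works on plain long values
// such as those returned by docCount() and maxDoc().
final class FieldCoverage {

    // Integer division: because docCount <= maxDoc, the result is 0
    // unless every document contains the field.
    static long truncatedCoverage(long docCount, long maxDoc) {
        return docCount / maxDoc;
    }

    // Casting one operand to double first keeps the fractional part.
    static double coverage(long docCount, long maxDoc) {
        return (double) docCount / maxDoc;
    }

    public static void main(String[] args) {
        long maxDoc = 4;   // made-up values for illustration
        long docCount = 3;
        System.out.println(truncatedCoverage(docCount, maxDoc)); // prints 0
        System.out.println(coverage(docCount, maxDoc));          // prints 0.75
    }
}
```

Both methods assume maxDoc is positive, which the conditions above guarantee for values taken from this class.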
-
-
Constructor Summary
- CollectionStatistics(String field, long maxDoc, long docCount, long sumTotalTermFreq, long sumDocFreq): Creates a statistics instance for a collection (field).
-
Method Summary
- long docCount(): The total number of documents that have at least one term for this field.
- String field(): The field's name.
- long maxDoc(): The total number of documents, regardless of whether they all contain values for this field.
- long sumDocFreq(): The total number of posting list entries for this field.
- long sumTotalTermFreq(): The total number of tokens for this field.
- String toString()
-
-
-
Constructor Detail
-
CollectionStatistics
public CollectionStatistics(String field, long maxDoc, long docCount, long sumTotalTermFreq, long sumDocFreq)
Creates a statistics instance for a collection (field).
Parameters:
field - field's name.
maxDoc - total number of documents.
docCount - number of documents containing the field.
sumTotalTermFreq - number of tokens in the field.
sumDocFreq - number of postings list entries for the field.
Throws:
IllegalArgumentException - if maxDoc is negative or zero.
IllegalArgumentException - if docCount is negative or zero.
IllegalArgumentException - if docCount is more than maxDoc.
IllegalArgumentException - if sumDocFreq is less than docCount.
IllegalArgumentException - if sumTotalTermFreq is less than sumDocFreq.
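The argument checks listed under Throws can be sketched as follows. This is a hypothetical stand-in class that mirrors the documented contract; it is not the Lucene source, and the class name, the isRejected helper, and the exception messages are assumptions made for illustration:

```java
// Hypothetical stand-in mirroring the documented constructor contract;
// it is not the Lucene class itself.
final class CollectionStatsSketch {
    final String field;
    final long maxDoc;
    final long docCount;
    final long sumTotalTermFreq;
    final long sumDocFreq;

    // Note the parameter order: sumTotalTermFreq comes before sumDocFreq.
    CollectionStatsSketch(String field, long maxDoc, long docCount,
                          long sumTotalTermFreq, long sumDocFreq) {
        if (maxDoc <= 0) {
            throw new IllegalArgumentException("maxDoc must be positive: " + maxDoc);
        }
        if (docCount <= 0) {
            throw new IllegalArgumentException("docCount must be positive: " + docCount);
        }
        if (docCount > maxDoc) {
            throw new IllegalArgumentException("docCount must not exceed maxDoc");
        }
        if (sumDocFreq < docCount) {
            throw new IllegalArgumentException("sumDocFreq must be at least docCount");
        }
        if (sumTotalTermFreq < sumDocFreq) {
            throw new IllegalArgumentException("sumTotalTermFreq must be at least sumDocFreq");
        }
        this.field = field;
        this.maxDoc = maxDoc;
        this.docCount = docCount;
        this.sumTotalTermFreq = sumTotalTermFreq;
        this.sumDocFreq = sumDocFreq;
    }

    // Returns true if the given values would be rejected by the checks above.
    static boolean isRejected(long maxDoc, long docCount,
                              long sumTotalTermFreq, long sumDocFreq) {
        try {
            new CollectionStatsSketch("body", maxDoc, docCount, sumTotalTermFreq, sumDocFreq);
            return false;
        } catch (IllegalArgumentException e) {
            return true;
        }
    }
}
```

For example, isRejected(10, 5, 40, 20) returns false, while isRejected(10, 5, 15, 20) returns true because sumTotalTermFreq is less than sumDocFreq. Swapping the last two arguments is an easy mistake, since the constructor takes them in the opposite order from the ordering docCount <= sumDocFreq <= sumTotalTermFreq.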
-
-
Method Detail
-
field
public final String field()
The field's name.
This value is never null.
Returns:
field's name, not null
-
maxDoc
public final long maxDoc()
The total number of documents, regardless of whether they all contain values for this field.
This value is always a positive number.
Returns:
total number of documents, in the range [1 .. Long.MAX_VALUE]
See Also:
IndexReader.maxDoc()
-
docCount
public final long docCount()
The total number of documents that have at least one term for this field.
This value is always a positive number, and never exceeds maxDoc().
Returns:
total number of documents containing this field, in the range [1 .. maxDoc()]
See Also:
Terms.getDocCount()
-
sumTotalTermFreq
public final long sumTotalTermFreq()
The total number of tokens for this field. This is the "word count" for this field across all documents. It is the sum of TermStatistics.totalTermFreq() across all terms. It is also the sum of each document's field length across all documents.
This value is always a positive number, and always at least sumDocFreq().
Returns:
total number of tokens in the field, in the range [sumDocFreq() .. Long.MAX_VALUE]
See Also:
Terms.getSumTotalTermFreq()
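Because this value is the sum of per-document field lengths, dividing it by a document count gives an average field length, which length-normalizing scoring models typically need. A minimal sketch, assuming the average is taken over the documents that contain the field (that is, over docCount()); the helper class is hypothetical and takes plain long values rather than a CollectionStatistics instance:

```java
// Hypothetical helper, not part of Lucene.
final class AverageFieldLength {

    // Average number of tokens per document that contains the field.
    // The cast to double avoids truncating integer division.
    static double of(long sumTotalTermFreq, long docCount) {
        return sumTotalTermFreq / (double) docCount;
    }

    public static void main(String[] args) {
        // Made-up values: 7 tokens spread over 2 documents.
        System.out.println(of(7, 2)); // prints 3.5
    }
}
```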
-
sumDocFreq
public final long sumDocFreq()
The total number of posting list entries for this field. This is the sum of term-document pairs: the sum of TermStatistics.docFreq() across all terms. It is also the sum of each document's unique term count for this field across all documents.
This value is always a positive number, always at least docCount(), and never exceeds sumTotalTermFreq().
Returns:
number of posting list entries, in the range [docCount() .. sumTotalTermFreq()]
See Also:
Terms.getSumDocFreq()
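To make the four statistics concrete, the sketch below derives them from a toy in-memory corpus using the per-document definitions given above. Everything here is hypothetical: the class name and the representation of a document as a list of tokens for a single field are illustrative assumptions, and a real index computes these values differently.

```java
import java.util.HashSet;
import java.util.List;

// Hypothetical toy model: each inner list holds one document's tokens for a
// single field; an empty list means the document has no terms for the field.
final class ToyFieldStats {
    final long maxDoc;
    final long docCount;
    final long sumDocFreq;
    final long sumTotalTermFreq;

    ToyFieldStats(List<List<String>> docs) {
        long withField = 0;
        long postings = 0;
        long tokens = 0;
        for (List<String> doc : docs) {
            if (!doc.isEmpty()) {
                withField++;                       // document has at least one term
            }
            postings += new HashSet<>(doc).size(); // unique terms in this document
            tokens += doc.size();                  // field length of this document
        }
        this.maxDoc = docs.size();
        this.docCount = withField;
        this.sumDocFreq = postings;
        this.sumTotalTermFreq = tokens;
    }

    public static void main(String[] args) {
        ToyFieldStats s = new ToyFieldStats(List.of(
                List.of("a", "b", "a"), // 2 unique terms, 3 tokens
                List.of("b", "c"),      // 2 unique terms, 2 tokens
                List.of()));            // no terms for the field
        // prints maxDoc=3 docCount=2 sumDocFreq=4 sumTotalTermFreq=5
        System.out.println("maxDoc=" + s.maxDoc + " docCount=" + s.docCount
                + " sumDocFreq=" + s.sumDocFreq
                + " sumTotalTermFreq=" + s.sumTotalTermFreq);
    }
}
```

The toy values satisfy docCount <= maxDoc and docCount <= sumDocFreq <= sumTotalTermFreq. Counting per term instead gives the same totals: "a" appears in 1 document (2 occurrences), "b" in 2 documents (2 occurrences), and "c" in 1 document (1 occurrence), for 4 postings and 5 tokens.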
-
-