Class ClassicSimilarity
- java.lang.Object
  - org.apache.lucene.search.similarities.Similarity
    - org.apache.lucene.search.similarities.TFIDFSimilarity
      - org.apache.lucene.search.similarities.ClassicSimilarity
public class ClassicSimilarity extends TFIDFSimilarity
Expert: Historical scoring implementation. Consider using BM25Similarity
instead, which is generally considered superior to TF-IDF.
-
-
Nested Class Summary
-
Nested classes/interfaces inherited from class org.apache.lucene.search.similarities.Similarity
Similarity.SimScorer
-
-
Field Summary
-
Fields inherited from class org.apache.lucene.search.similarities.TFIDFSimilarity
discountOverlaps
-
-
Constructor Summary
Constructor and Description:
ClassicSimilarity()
    Sole constructor: parameter-free
-
Method Summary
Modifier and Type, Method, and Description:
float idf(long docFreq, long docCount)
    Implemented as log((docCount+1)/(docFreq+1)) + 1.
Explanation idfExplain(CollectionStatistics collectionStats, TermStatistics termStats)
    Computes a score factor for a simple term and returns an explanation for that score factor.
float lengthNorm(int numTerms)
    Implemented as 1/sqrt(length).
float tf(float freq)
    Implemented as sqrt(freq).
String toString()
-
Methods inherited from class org.apache.lucene.search.similarities.TFIDFSimilarity
computeNorm, getDiscountOverlaps, idfExplain, scorer, setDiscountOverlaps
-
Method Detail
-
lengthNorm
public float lengthNorm(int numTerms)
Implemented as 1/sqrt(length).
- Specified by:
lengthNorm in class TFIDFSimilarity
- Parameters:
numTerms - the number of terms in the field, optionally discounting overlaps
- Returns:
- a length normalization value
- WARNING: This API is experimental and might change in incompatible ways in the next release.
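The formula can be illustrated with a minimal standalone sketch. It does not depend on Lucene and is not the library's source; the class name LengthNormSketch is hypothetical, and it assumes that "length" in the formula is the numTerms argument.

```java
// Standalone illustration of the documented formula 1/sqrt(length).
// Hypothetical class; assumes "length" means the numTerms parameter.
final class LengthNormSketch {
    private LengthNormSketch() {}

    /** Returns 1/sqrt(numTerms), so shorter fields get a larger value. */
    static float lengthNorm(int numTerms) {
        // Compute in double precision, then narrow to the declared float return type.
        return (float) (1.0 / Math.sqrt(numTerms));
    }

    public static void main(String[] args) {
        // Print the normalization value for a few sample field lengths.
        for (int numTerms : new int[] {1, 4, 100}) {
            System.out.println(numTerms + " terms -> " + lengthNorm(numTerms));
        }
    }
}
```

Under this sketch, a field with four times as many terms receives half the normalization value.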
-
tf
public float tf(float freq)
Implemented as sqrt(freq).
- Specified by:
tf in class TFIDFSimilarity
- Parameters:
freq - the frequency of a term within a document
- Returns:
- a score factor based on a term's within-document frequency
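As a rough illustration of sqrt(freq), the following standalone sketch mirrors the documented formula without using Lucene. The class name TfSketch is hypothetical and the code is not taken from the library.

```java
// Standalone illustration of the documented formula sqrt(freq).
// Hypothetical class; not the Lucene implementation.
final class TfSketch {
    private TfSketch() {}

    /** Returns sqrt(freq): the factor grows with frequency, but sublinearly. */
    static float tf(float freq) {
        return (float) Math.sqrt(freq);
    }

    public static void main(String[] args) {
        // Print the score factor for a few sample within-document frequencies.
        for (float freq : new float[] {1f, 4f, 9f}) {
            System.out.println("freq " + freq + " -> " + tf(freq));
        }
    }
}
```

The square root dampens repetition: quadrupling a term's frequency only doubles its factor.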
-
idfExplain
public Explanation idfExplain(CollectionStatistics collectionStats, TermStatistics termStats)
Description copied from class: TFIDFSimilarity
Computes a score factor for a simple term and returns an explanation for that score factor.
The default implementation uses:
idf(docFreq, docCount);
Note that CollectionStatistics.docCount() is used instead of IndexReader#numDocs() because TermStatistics.docFreq() is also used, and when the latter is inaccurate, so is CollectionStatistics.docCount(), and in the same direction. In addition, CollectionStatistics.docCount() does not skew when fields are sparse.
- Overrides:
idfExplain in class TFIDFSimilarity
- Parameters:
collectionStats - collection-level statistics
termStats - term-level statistics for the term
- Returns:
- an Explanation object that includes both an idf score factor and an explanation for the term.
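The remark about sparse fields can be made concrete with a small standalone sketch. It does not call Lucene; the class name SparseFieldIdfSketch and all the numbers are hypothetical, and the local idf method simply restates the documented formula, assuming a natural logarithm.

```java
// Standalone illustration of why a per-field document count avoids skew.
// Hypothetical class and sample values; not the Lucene implementation.
final class SparseFieldIdfSketch {
    private SparseFieldIdfSketch() {}

    /** Restates the documented formula log((docCount+1)/(docFreq+1)) + 1. */
    static float idf(long docFreq, long docCount) {
        // Cast to double before dividing so the ratio is not truncated.
        return (float) (Math.log((docCount + 1) / (double) (docFreq + 1)) + 1.0);
    }

    public static void main(String[] args) {
        long numDocs = 1_000_000L; // assumed: all documents in the index
        long docCount = 1_000L;    // assumed: documents that have the field
        long docFreq = 1_000L;     // assumed: the term is in every such document

        // With the per-field count, a term present in every document that has
        // the field gets the minimum factor.
        System.out.println("idf with docCount: " + idf(docFreq, docCount));
        // With the index-wide count, the same term looks rare.
        System.out.println("idf with numDocs:  " + idf(docFreq, numDocs));
    }
}
```

In this made-up scenario the term carries no discriminating power within the field, yet an index-wide count would give it a much larger factor.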
-
idf
public float idf(long docFreq, long docCount)
Implemented as log((docCount+1)/(docFreq+1)) + 1.
- Specified by:
idf in class TFIDFSimilarity
- Parameters:
docFreq - the number of documents which contain the term
docCount - the total number of documents in the collection
- Returns:
- a score factor based on the term's document frequency
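The idf formula can be tried out with a minimal standalone sketch. It is not the library's source and does not use Lucene; the class name IdfSketch is hypothetical, and it assumes that "log" denotes the natural logarithm (Math.log in Java).

```java
// Standalone illustration of log((docCount+1)/(docFreq+1)) + 1.
// Hypothetical class; assumes "log" is the natural logarithm.
final class IdfSketch {
    private IdfSketch() {}

    /** Rarer terms (small docFreq relative to docCount) get a larger factor. */
    static float idf(long docFreq, long docCount) {
        // Cast to double before dividing; long division would truncate the ratio.
        return (float) (Math.log((docCount + 1) / (double) (docFreq + 1)) + 1.0);
    }

    public static void main(String[] args) {
        long docCount = 1_000L; // assumed collection size for the demo
        // Print the factor for a rare, a moderately common, and a ubiquitous term.
        for (long docFreq : new long[] {1L, 100L, 1_000L}) {
            System.out.println("docFreq " + docFreq + " -> " + idf(docFreq, docCount));
        }
    }
}
```

The +1 inside the logarithm avoids division by zero when docFreq is 0, and the trailing +1 keeps the factor at 1 or above whenever docFreq does not exceed docCount.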
-
-