Class ClassicSimilarity
java.lang.Object
org.apache.lucene.search.similarities.Similarity
org.apache.lucene.search.similarities.TFIDFSimilarity
org.apache.lucene.search.similarities.ClassicSimilarity
Expert: Historical scoring implementation. You might want to consider using BM25Similarity instead, which is generally considered superior to TF-IDF.
Nested Class Summary
Nested classes/interfaces inherited from class org.apache.lucene.search.similarities.Similarity
Similarity.SimScorer
Field Summary
Fields inherited from class org.apache.lucene.search.similarities.TFIDFSimilarity
discountOverlaps
Constructor Summary
Constructor | Description
ClassicSimilarity() | Sole constructor: parameter-free
Method Summary
Modifier and Type | Method | Description
float | idf(long docFreq, long docCount) | Implemented as log((docCount+1)/(docFreq+1)) + 1.
Explanation | idfExplain(CollectionStatistics collectionStats, TermStatistics termStats) | Computes a score factor for a simple term and returns an explanation for that score factor.
float | lengthNorm(int numTerms) | Implemented as 1/sqrt(length).
float | tf(float freq) | Implemented as sqrt(freq).
String | toString() |
Methods inherited from class org.apache.lucene.search.similarities.TFIDFSimilarity
computeNorm, getDiscountOverlaps, idfExplain, scorer, setDiscountOverlaps
Constructor Details
ClassicSimilarity
public ClassicSimilarity()
Sole constructor: parameter-free
Method Details
lengthNorm
public float lengthNorm(int numTerms)
Implemented as 1/sqrt(length).
- Specified by:
lengthNorm in class TFIDFSimilarity
- Parameters:
numTerms - the number of terms in the field, optionally discounting overlaps
- Returns:
a length normalization value
- WARNING: This API is experimental and might change in incompatible ways in the next release.
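As a rough illustration of the length normalization formula, the following standalone sketch reproduces 1/sqrt(length) in plain Java, assuming that length is simply the numTerms argument. The class name LengthNormSketch is hypothetical and the code does not call Lucene; it only mirrors the documented arithmetic.

```java
public class LengthNormSketch {
    // Mirrors the documented formula 1/sqrt(length).
    // Assumption: "length" is the numTerms argument.
    public static float lengthNorm(int numTerms) {
        return (float) (1.0 / Math.sqrt(numTerms));
    }

    public static void main(String[] args) {
        // Shorter fields get a larger normalization value than longer ones.
        System.out.println(lengthNorm(4));   // prints 0.5
        System.out.println(lengthNorm(16));  // prints 0.25
    }
}
```

Under this formula, a field with four times as many terms receives half the normalization value, so matches in short fields weigh more.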
tf
public float tf(float freq)
Implemented as sqrt(freq).
- Specified by:
tf in class TFIDFSimilarity
- Parameters:
freq - the frequency of a term within a document
- Returns:
a score factor based on a term's within-document frequency
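To make the effect of the square root concrete, here is a minimal standalone sketch of the documented sqrt(freq) formula. TfSketch is a hypothetical name used only for this illustration, and the code is independent of Lucene.

```java
public class TfSketch {
    // Mirrors the documented formula sqrt(freq).
    public static float tf(float freq) {
        return (float) Math.sqrt(freq);
    }

    public static void main(String[] args) {
        // The factor grows sublinearly: nine occurrences count three times
        // as much as one occurrence, not nine times.
        System.out.println(tf(1.0f));  // prints 1.0
        System.out.println(tf(4.0f));  // prints 2.0
        System.out.println(tf(9.0f));  // prints 3.0
    }
}
```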
idfExplain
public Explanation idfExplain(CollectionStatistics collectionStats, TermStatistics termStats)
Description copied from class: TFIDFSimilarity
Computes a score factor for a simple term and returns an explanation for that score factor.
The default implementation uses:
idf(docFreq, docCount);
Note that CollectionStatistics.docCount() is used instead of IndexReader#numDocs() because TermStatistics.docFreq() is also used, and when the latter is inaccurate, so is CollectionStatistics.docCount(), and in the same direction. In addition, CollectionStatistics.docCount() does not skew when fields are sparse.
- Overrides:
idfExplain in class TFIDFSimilarity
- Parameters:
collectionStats - collection-level statistics
termStats - term-level statistics for the term
- Returns:
an Explanation object that includes both an idf score factor and an explanation for the term.
idf
public float idf(long docFreq, long docCount)
Implemented as log((docCount+1)/(docFreq+1)) + 1.
- Specified by:
idf in class TFIDFSimilarity
- Parameters:
docFreq - the number of documents which contain the term
docCount - the total number of documents in the collection
- Returns:
a score factor based on the term's document frequency
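The idf formula can be sketched in plain Java as follows. This is an illustration under two assumptions that the text here does not spell out: that log is the natural logarithm (Math.log) and that the division is done in floating point. IdfSketch is a hypothetical class name, and the code does not depend on Lucene.

```java
public class IdfSketch {
    // Mirrors the documented formula log((docCount+1)/(docFreq+1)) + 1.
    // Assumptions: natural logarithm, floating-point division.
    public static float idf(long docFreq, long docCount) {
        return (float) (Math.log((docCount + 1) / (double) (docFreq + 1)) + 1.0);
    }

    public static void main(String[] args) {
        // A term present in every document: log(1) + 1.
        System.out.println(idf(9, 9));  // prints 1.0

        // Rarer terms receive a larger factor than common ones.
        float rare = idf(1, 1000);
        float common = idf(500, 1000);
        System.out.println(rare > common);  // prints true
    }
}
```

The +1 inside the logarithm keeps the ratio defined when docFreq is 0, and the trailing +1 keeps the factor at or above 1 whenever docFreq does not exceed docCount.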
toString