Class ClassicSimilarity
- java.lang.Object
-
- org.apache.lucene.search.similarities.Similarity
-
- org.apache.lucene.search.similarities.TFIDFSimilarity
-
- org.apache.lucene.search.similarities.ClassicSimilarity
-
public class ClassicSimilarity extends TFIDFSimilarity
Expert: Historical scoring implementation. You might want to consider using BM25Similarity instead, which is generally considered superior to TF-IDF.
-
-
Nested Class Summary
-
Nested classes/interfaces inherited from class org.apache.lucene.search.similarities.Similarity
Similarity.SimScorer, Similarity.SimWeight
-
-
Field Summary
-
Fields inherited from class org.apache.lucene.search.similarities.TFIDFSimilarity
discountOverlaps
-
-
Constructor Summary
Constructors
Constructor | Description
ClassicSimilarity() | Sole constructor: parameter-free
-
Method Summary
All Methods | Instance Methods | Concrete Methods
Modifier and Type | Method | Description
float | idf(long docFreq, long docCount) | Implemented as log((docCount+1)/(docFreq+1)) + 1.
Explanation | idfExplain(CollectionStatistics collectionStats, TermStatistics termStats) | Computes a score factor for a simple term and returns an explanation for that score factor.
float | lengthNorm(int numTerms) | Implemented as 1/sqrt(length).
float | scorePayload(int doc, int start, int end, BytesRef payload) | The default implementation returns 1.
float | sloppyFreq(int distance) | Implemented as 1 / (distance + 1).
float | tf(float freq) | Implemented as sqrt(freq).
String | toString() |
-
Methods inherited from class org.apache.lucene.search.similarities.TFIDFSimilarity
computeNorm, computeWeight, getDiscountOverlaps, idfExplain, setDiscountOverlaps, simScorer
-
-
-
-
Method Detail
-
lengthNorm
public float lengthNorm(int numTerms)
Implemented as 1/sqrt(length), where length is the numTerms argument.
- Specified by:
lengthNorm in class TFIDFSimilarity
- Parameters:
numTerms - the number of terms in the field, optionally discounting overlaps
- Returns:
- a length normalization value
- WARNING: This API is experimental and might change in incompatible ways in the next release.
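To make the effect of the formula concrete, here is a minimal plain-Java sketch that reproduces 1/sqrt(numTerms) without depending on Lucene. The class name LengthNormSketch is hypothetical and exists only for illustration; it is not part of the library.

```java
// Illustrative sketch only: reproduces the documented formula, does not call Lucene.
final class LengthNormSketch {

    // Mirrors the documented behaviour: 1/sqrt(numTerms).
    static float lengthNorm(int numTerms) {
        return (float) (1.0 / Math.sqrt(numTerms));
    }

    public static void main(String[] args) {
        // Longer fields receive a smaller normalization factor.
        for (int numTerms : new int[] {1, 4, 16, 100}) {
            System.out.println(numTerms + " terms -> " + lengthNorm(numTerms));
        }
    }
}
```

Under this formula a field four times as long gets half the normalization value, so a match in a short field counts for more than the same match in a long one.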
-
tf
public float tf(float freq)
Implemented as sqrt(freq).
- Specified by:
tf in class TFIDFSimilarity
- Parameters:
freq - the frequency of a term within a document
- Returns:
- a score factor based on a term's within-document frequency
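The square root dampens repeated occurrences of a term. The following plain-Java sketch shows this; it reimplements the documented formula rather than calling Lucene, and the class name TfSketch is hypothetical.

```java
// Illustrative sketch only: reproduces the documented formula, does not call Lucene.
final class TfSketch {

    // Mirrors the documented behaviour: sqrt(freq).
    static float tf(float freq) {
        return (float) Math.sqrt(freq);
    }

    public static void main(String[] args) {
        // Each additional occurrence contributes less than the previous one.
        for (float freq : new float[] {1f, 4f, 9f, 100f}) {
            System.out.println("freq " + freq + " -> tf " + tf(freq));
        }
    }
}
```

A term that occurs 100 times yields a factor of 10 rather than 100, which limits how much a single document can gain from repeating a term.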
-
sloppyFreq
public float sloppyFreq(int distance)
Implemented as 1 / (distance + 1).
- Specified by:
sloppyFreq in class TFIDFSimilarity
- Parameters:
distance - the edit distance of this sloppy phrase match
- Returns:
- the frequency increment for this match
- See Also:
PhraseQuery.getSlop()
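As a rough illustration of how the increment shrinks as a sloppy match drifts from the exact phrase, the sketch below reimplements 1 / (distance + 1) in plain Java. It does not call Lucene, and the class name SloppyFreqSketch is hypothetical.

```java
// Illustrative sketch only: reproduces the documented formula, does not call Lucene.
final class SloppyFreqSketch {

    // Mirrors the documented behaviour: 1 / (distance + 1).
    static float sloppyFreq(int distance) {
        return 1.0f / (distance + 1);
    }

    public static void main(String[] args) {
        // distance 0 is an exact phrase match and contributes a full occurrence.
        for (int distance : new int[] {0, 1, 3, 9}) {
            System.out.println("distance " + distance + " -> " + sloppyFreq(distance));
        }
    }
}
```

An exact match (distance 0) counts as one full occurrence, while a match at distance 3 counts as a quarter of one.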
-
scorePayload
public float scorePayload(int doc, int start, int end, BytesRef payload)
The default implementation returns 1.
- Specified by:
scorePayload in class TFIDFSimilarity
- Parameters:
doc - The docId currently being scored.
start - The start position of the payload
end - The end position of the payload
payload - The payload byte array to be scored
- Returns:
- An implementation dependent float to be used as a scoring factor
-
idfExplain
public Explanation idfExplain(CollectionStatistics collectionStats, TermStatistics termStats)
Description copied from class: TFIDFSimilarity
Computes a score factor for a simple term and returns an explanation for that score factor.
The default implementation uses:
idf(docFreq, docCount);
Note that CollectionStatistics.docCount() is used instead of IndexReader#numDocs() because TermStatistics.docFreq() is also used, and when the latter is inaccurate, so is CollectionStatistics.docCount(), and in the same direction. In addition, CollectionStatistics.docCount() does not skew when fields are sparse.
- Overrides:
idfExplain in class TFIDFSimilarity
- Parameters:
collectionStats - collection-level statistics
termStats - term-level statistics for the term
- Returns:
- an Explanation object that includes both an idf score factor and an explanation for the term.
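The remark about sparse fields can be illustrated with a small arithmetic sketch. It uses the idf formula documented for this class, log((docCount+1)/(docFreq+1)) + 1, reimplemented in plain Java (assuming the natural logarithm) rather than calling Lucene. The class name SparseFieldIdfSketch and all of the counts are made up for illustration.

```java
// Illustrative sketch only: hypothetical numbers, does not call Lucene.
final class SparseFieldIdfSketch {

    // Documented formula: log((docCount+1)/(docFreq+1)) + 1 (natural log assumed).
    static float idf(long docFreq, long docCount) {
        return (float) (Math.log((docCount + 1) / (double) (docFreq + 1)) + 1.0);
    }

    public static void main(String[] args) {
        long numDocs = 10_000; // hypothetical: all documents in the index
        long docCount = 100;   // hypothetical: documents that have this field at all
        long docFreq = 90;     // hypothetical: documents containing the term in this field

        // Relative to documents that actually have the field, the term is very common.
        System.out.println("idf using docCount: " + idf(docFreq, docCount));
        // Relative to the whole index, the same term looks rare and is over-weighted.
        System.out.println("idf using numDocs:  " + idf(docFreq, numDocs));
    }
}
```

With these numbers the term appears in 90 of the 100 documents that carry the field, so it is barely discriminative there; dividing by the index-wide count instead would make it look rare and inflate its weight.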
-
idf
public float idf(long docFreq, long docCount)
Implemented as log((docCount+1)/(docFreq+1)) + 1.
- Specified by:
idf in class TFIDFSimilarity
- Parameters:
docFreq - the number of documents which contain the term
docCount - the total number of documents in the collection
- Returns:
- a score factor based on the term's document frequency
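A minimal plain-Java sketch of the formula follows, assuming that log denotes the natural logarithm. It does not call Lucene, and the class name IdfSketch is hypothetical.

```java
// Illustrative sketch only: reproduces the documented formula, does not call Lucene.
final class IdfSketch {

    // Documented formula: log((docCount+1)/(docFreq+1)) + 1 (natural log assumed).
    static float idf(long docFreq, long docCount) {
        // Cast to double before dividing; long division would truncate the ratio.
        return (float) (Math.log((docCount + 1) / (double) (docFreq + 1)) + 1.0);
    }

    public static void main(String[] args) {
        long docCount = 1_000; // hypothetical collection size
        for (long docFreq : new long[] {0, 1, 500, 1_000}) {
            System.out.println("docFreq " + docFreq + " -> idf " + idf(docFreq, docCount));
        }
    }
}
```

The +1 terms inside the logarithm keep the ratio finite when docFreq is 0, and the trailing +1 means a term that occurs in every document still gets a factor of 1 rather than 0.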
-
-