Class ClassicSimilarity


  • public class ClassicSimilarity
    extends TFIDFSimilarity
    Expert: Historical scoring implementation. You might want to consider using BM25Similarity instead, which is generally considered superior to TF-IDF.
    • Constructor Detail

      • ClassicSimilarity

        public ClassicSimilarity()
        Sole constructor; takes no parameters.
    • Method Detail

      • lengthNorm

        public float lengthNorm(int numTerms)
        Implemented as 1/sqrt(numTerms).
        Specified by:
        lengthNorm in class TFIDFSimilarity
        Parameters:
        numTerms - the number of terms in the field, optionally discounting overlaps
        Returns:
        a length normalization value
        WARNING: This API is experimental and might change in incompatible ways in the next release.
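        As an illustration only, the length normalization formula above can be reproduced in plain Java without Lucene on the classpath. The class name LengthNormSketch is hypothetical, and the method mirrors the documented formula rather than the library source:

```java
public class LengthNormSketch {
    // Hypothetical helper mirroring the documented formula 1/sqrt(numTerms).
    // This is a sketch, not the Lucene implementation.
    public static float lengthNorm(int numTerms) {
        return (float) (1.0 / Math.sqrt(numTerms));
    }

    public static void main(String[] args) {
        // Longer fields get a smaller normalization value.
        System.out.println(lengthNorm(1));   // 1.0
        System.out.println(lengthNorm(4));   // 0.5
        System.out.println(lengthNorm(100)); // 0.1
    }
}
```

        The value shrinks as the field grows, so a match in a short field contributes more than the same match in a long one.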
      • tf

        public float tf(float freq)
        Implemented as sqrt(freq).
        Specified by:
        tf in class TFIDFSimilarity
        Parameters:
        freq - the frequency of a term within a document
        Returns:
        a score factor based on a term's within-document frequency
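        A minimal standalone sketch of the term-frequency factor, assuming nothing beyond the formula stated above. TfSketch is a hypothetical name used for illustration, not a Lucene class:

```java
public class TfSketch {
    // Hypothetical helper mirroring the documented formula sqrt(freq).
    public static float tf(float freq) {
        return (float) Math.sqrt(freq);
    }

    public static void main(String[] args) {
        // The square root dampens repeated occurrences: nine occurrences
        // score three times a single occurrence, not nine times.
        System.out.println(tf(1f)); // 1.0
        System.out.println(tf(4f)); // 2.0
        System.out.println(tf(9f)); // 3.0
    }
}
```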
      • idf

        public float idf(long docFreq,
                         long docCount)
        Implemented as log((docCount+1)/(docFreq+1)) + 1.
        Specified by:
        idf in class TFIDFSimilarity
        Parameters:
        docFreq - the number of documents which contain the term
        docCount - the total number of documents in the collection
        Returns:
        a score factor based on the term's document frequency
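        The inverse document frequency formula can be sketched the same way. This example assumes that "log" in the formula above denotes the natural logarithm (Java's Math.log); IdfSketch is a hypothetical name and the code is not taken from the library:

```java
public class IdfSketch {
    // Hypothetical helper mirroring log((docCount+1)/(docFreq+1)) + 1.
    // Assumption: "log" is the natural logarithm, i.e. Math.log.
    public static float idf(long docFreq, long docCount) {
        // Cast to double before dividing to avoid integer division.
        return (float) (Math.log((docCount + 1) / (double) (docFreq + 1)) + 1.0);
    }

    public static void main(String[] args) {
        // A term present in every document: log(1) + 1.
        System.out.println(idf(9, 9)); // 1.0
        // Rarer terms receive a larger factor than common ones.
        System.out.println(idf(1, 1000) > idf(100, 1000)); // true
    }
}
```

        The +1 inside the logarithm keeps the ratio defined when docFreq is 0, and the trailing +1 keeps the factor positive even for terms that occur in every document.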