Class ClassicSimilarity

  • public class ClassicSimilarity
    extends TFIDFSimilarity
    Expert: Historical scoring implementation. You might want to consider using BM25Similarity instead, which is generally considered superior to TF-IDF.
    • Constructor Detail

      • ClassicSimilarity

        public ClassicSimilarity()
        Sole constructor; takes no parameters.
    • Method Detail

      • lengthNorm

        public float lengthNorm(int numTerms)
        Implemented as 1/sqrt(numTerms).
        Specified by:
        lengthNorm in class TFIDFSimilarity
        Parameters:
        numTerms - the number of terms in the field, optionally discounting overlaps
        Returns:
        a length normalization value
        WARNING: This API is experimental and might change in incompatible ways in the next release.
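
        As a rough illustration of the 1/sqrt formula above, the following standalone sketch reproduces the computation without depending on Lucene. The class name LengthNormSketch is hypothetical, and the code is an approximation of the documented behavior, not the library's source.

```java
// Hypothetical standalone sketch; not part of Lucene.
final class LengthNormSketch {

    // Mirrors the documented formula: 1 / sqrt(number of terms in the field).
    static float lengthNorm(int numTerms) {
        return (float) (1.0 / Math.sqrt(numTerms));
    }

    public static void main(String[] args) {
        // Longer fields receive a smaller normalization value.
        for (int numTerms : new int[] {1, 4, 16, 100}) {
            System.out.println(numTerms + " terms -> " + lengthNorm(numTerms));
        }
    }
}
```

        Under this formula a match in a short field is weighted more heavily than the same match in a long field.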
      • tf

        public float tf(float freq)
        Implemented as sqrt(freq).
        Specified by:
        tf in class TFIDFSimilarity
        Parameters:
        freq - the frequency of a term within a document
        Returns:
        a score factor based on a term's within-document frequency
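
        The square-root damping described above can be sketched in plain Java as follows. The class name TfSketch is hypothetical; the code only mirrors the documented formula and makes no claim about the library's internals.

```java
// Hypothetical standalone sketch; not part of Lucene.
final class TfSketch {

    // Mirrors the documented formula: sqrt(freq).
    static float tf(float freq) {
        return (float) Math.sqrt(freq);
    }

    public static void main(String[] args) {
        // The factor grows sublinearly: quadrupling the frequency only doubles it.
        for (float freq : new float[] {1f, 4f, 16f, 64f}) {
            System.out.println("freq " + freq + " -> tf " + tf(freq));
        }
    }
}
```

        Because the growth is sublinear, repeating a term many times in one document yields diminishing returns.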
      • sloppyFreq

        public float sloppyFreq(int distance)
        Implemented as 1 / (distance + 1).
        Specified by:
        sloppyFreq in class TFIDFSimilarity
        Parameters:
        distance - the edit distance of this sloppy phrase match
        Returns:
        the frequency increment for this match
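
        To make the effect of distance concrete, here is a minimal standalone sketch of the 1 / (distance + 1) formula. The class name SloppyFreqSketch is hypothetical, and the code is illustrative only.

```java
// Hypothetical standalone sketch; not part of Lucene.
final class SloppyFreqSketch {

    // Mirrors the documented formula: 1 / (distance + 1).
    // Floating-point division is used so the result is not truncated to 0.
    static float sloppyFreq(int distance) {
        return 1.0f / (distance + 1);
    }

    public static void main(String[] args) {
        // An exact match (distance 0) contributes 1; looser matches contribute less.
        for (int distance = 0; distance <= 3; distance++) {
            System.out.println("distance " + distance + " -> " + sloppyFreq(distance));
        }
    }
}
```

        An exact phrase match therefore contributes a full increment, while each additional unit of distance shrinks the contribution.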
      • scorePayload

        public float scorePayload(int doc,
                                  int start,
                                  int end,
                                  BytesRef payload)
        The default implementation returns 1.
        Specified by:
        scorePayload in class TFIDFSimilarity
        Parameters:
        doc - the docId currently being scored
        start - the start position of the payload
        end - the end position of the payload
        payload - the payload byte array to be scored
        Returns:
        an implementation-dependent float to be used as a scoring factor
      • idf

        public float idf(long docFreq,
                         long docCount)
        Implemented as log((docCount+1)/(docFreq+1)) + 1.
        Specified by:
        idf in class TFIDFSimilarity
        Parameters:
        docFreq - the number of documents which contain the term
        docCount - the total number of documents in the collection
        Returns:
        a score factor based on the term's document frequency
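
        The idf formula above can be sketched in standalone Java as shown below. The class name IdfSketch is hypothetical, and the sketch assumes that log denotes the natural logarithm; it mirrors the documented formula and is not the library's source.

```java
// Hypothetical standalone sketch; not part of Lucene.
final class IdfSketch {

    // Mirrors the documented formula: log((docCount+1)/(docFreq+1)) + 1.
    // Assumption: "log" is the natural logarithm (Math.log).
    // The cast to double avoids integer division on the long arguments.
    static float idf(long docFreq, long docCount) {
        return (float) (Math.log((docCount + 1) / (double) (docFreq + 1)) + 1.0);
    }

    public static void main(String[] args) {
        long docCount = 1_000;
        // Rare terms (low docFreq) receive a higher factor than common ones.
        for (long docFreq : new long[] {1, 10, 100, 1_000}) {
            System.out.println("docFreq " + docFreq + " -> idf " + idf(docFreq, docCount));
        }
    }
}
```

        Adding 1 to both counts keeps the ratio defined when docFreq is 0, and the trailing + 1 keeps the factor at 1 rather than 0 for a term that appears in every document.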