Class PassageScorer

java.lang.Object
org.apache.lucene.search.uhighlight.PassageScorer

public class PassageScorer extends Object
Ranks passages found by UnifiedHighlighter.

Each passage is scored as a miniature document within the document. The final score is computed as norm(int) * ∑ (weight(int, int) * tf(int, int)). The default implementation is norm(int) * BM25.
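The formula above can be read as a recipe. Count how often each matched term occurs in the passage. For each distinct term, multiply weight(contentLength, totalTermFreq) by tf(freqInPassage, passageLen). Sum these products, then multiply the sum by norm(passageStart). The sketch below shows only that composition, with the three hooks passed in as functions. The class name PassageScoreSketch and its argument list are hypothetical and are not Lucene's API, which works on a Passage object instead.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.function.IntToDoubleFunction;
import java.util.function.ToDoubleBiFunction;

// Hypothetical sketch of how the three PassageScorer hooks combine; not Lucene code.
final class PassageScoreSketch {

  // score = norm(passageStart) * sum over distinct matched terms of
  //         weight(contentLength, totalTermFreq) * tf(freqInPassage, passageLen)
  static double score(List<String> matchedTerms,
                      Map<String, Integer> totalTermFreqs,
                      int passageStart, int passageLen, int contentLength,
                      ToDoubleBiFunction<Integer, Integer> weight,
                      ToDoubleBiFunction<Integer, Integer> tf,
                      IntToDoubleFunction norm) {
    // Count occurrences of each distinct term inside the passage.
    Map<String, Integer> freqInPassage = new LinkedHashMap<>();
    for (String term : matchedTerms) {
      freqInPassage.merge(term, 1, Integer::sum);
    }
    double sum = 0;
    for (Map.Entry<String, Integer> e : freqInPassage.entrySet()) {
      // Assumes every matched term has an entry with its in-document frequency.
      int totalTermFreq = totalTermFreqs.get(e.getKey());
      sum += weight.applyAsDouble(contentLength, totalTermFreq)
          * tf.applyAsDouble(e.getValue(), passageLen);
    }
    // The position-based boost is applied once to the whole passage.
    return norm.applyAsDouble(passageStart) * sum;
  }

  public static void main(String[] args) {
    // Toy hooks: constant weight, raw frequency as tf, fixed boost of 2.
    double s = score(List.of("a", "b", "a"), Map.of("a", 5, "b", 3),
        0, 40, 1000,
        (len, ttf) -> 1.0,
        (freq, plen) -> freq * 1.0,
        start -> 2.0);
    System.out.println(s); // prints 6.0
  }
}
```

With the toy hooks, term "a" contributes 1.0 * 2 and term "b" contributes 1.0 * 1, so the sum is 3 and the boosted score is 6. Passing the hooks as functions mirrors the fact that weight, tf and norm are separate overridable methods.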

WARNING: This API is experimental and might change in incompatible ways in the next release.
  • Constructor Summary

    Constructors
    Constructor
    Description
    PassageScorer()
    Creates PassageScorer with these default values: k1 = 1.2, b = 0.75, pivot = 87.
    PassageScorer(float k1, float b, float pivot)
    Creates PassageScorer with specified scoring parameters.
  • Method Summary

    Modifier and Type
    Method
    Description
    float
    norm(int passageStart)
    Normalize a passage according to its position in the document.
    float
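    score(Passage passage, int contentLength)
    Computes the passage's score: norm(int) of its start offset times the sum of weight(int, int) * tf(int, int) over the terms matched in the passage.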
    float
    tf(int freq, int passageLen)
    Computes term weight, given the frequency within the passage and the passage's length.
    float
    weight(int contentLength, int totalTermFreq)
    Computes term importance, given its in-document statistics.

    Methods inherited from class java.lang.Object

    clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
  • Constructor Details

    • PassageScorer

      public PassageScorer()
      Creates PassageScorer with these default values:
      • k1 = 1.2
      • b = 0.75
      • pivot = 87
    • PassageScorer

      public PassageScorer(float k1, float b, float pivot)
      Creates PassageScorer with specified scoring parameters.
      Parameters:
      k1 - Controls non-linear term frequency normalization (saturation).
      b - Controls to what degree passage length normalizes tf values.
      pivot - Pivot value for length normalization (some rough idea of average sentence length in characters).
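To make the three parameters concrete, the sketch below assumes the classic BM25 length-normalization factor, with pivot standing in for BM25's average document length: K = k1 * ((1 - b) + b * passageLen / pivot). That Lucene combines k1, b and pivot in exactly this way is an assumption, and the record name ScorerParams is hypothetical. The defaults mirror the no-argument constructor.

```java
// Hypothetical holder for the constructor parameters; defaults mirror PassageScorer().
record ScorerParams(double k1, double b, double pivot) {

  static ScorerParams defaults() {
    return new ScorerParams(1.2, 0.75, 87);
  }

  // Assumed BM25-style factor: a passage exactly pivot characters long gets K = k1;
  // b = 0 disables length normalization, b = 1 makes K proportional to passageLen.
  double lengthNorm(int passageLen) {
    return k1 * ((1 - b) + b * passageLen / pivot);
  }

  public static void main(String[] args) {
    ScorerParams p = defaults();
    for (int len : new int[] {40, 87, 200}) {
      System.out.printf("passageLen=%d K=%.3f%n", len, p.lengthNorm(len));
    }
  }
}
```

Under this assumption, a larger K lowers the tf contribution of each match, so passages longer than pivot are penalized more as b approaches 1.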
  • Method Details

    • weight

      public float weight(int contentLength, int totalTermFreq)
      Computes term importance, given its in-document statistics.
      Parameters:
      contentLength - length of document in characters
      totalTermFreq - number of times the term occurs in the document
      Returns:
      term importance
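The page only states that the default scoring is BM25-based. The following is a minimal BM25-style IDF sketch under one explicit assumption. Because a single document has no corpus statistics, the number of "documents" is approximated as 1 + contentLength / pivot (one per pivot-sized chunk), and totalTermFreq plays the role of document frequency. The (k1 + 1) factor is the usual BM25 numerator. The class name WeightSketch is hypothetical, and the exact expression Lucene uses may differ.

```java
// Hypothetical BM25-style term weight; the approximations are assumptions, not Lucene's code.
final class WeightSketch {

  static double weight(double k1, double pivot, int contentLength, int totalTermFreq) {
    // Treat the document as roughly contentLength / pivot pseudo-documents.
    double numDocs = 1 + contentLength / pivot;
    // Smoothed IDF: terms that occur less often in the document weigh more.
    double idf = Math.log(1 + (numDocs + 0.5) / (totalTermFreq + 0.5));
    return (k1 + 1) * idf;
  }

  public static void main(String[] args) {
    // A rare term versus a frequent one in a 1000-character document, default k1 and pivot.
    System.out.println(weight(1.2, 87, 1000, 1));
    System.out.println(weight(1.2, 87, 1000, 10));
  }
}
```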
    • tf

      public float tf(int freq, int passageLen)
      Computes term weight, given the frequency within the passage and the passage's length.
      Parameters:
      freq - number of occurrences of the term within this passage
      passageLen - length of the passage in characters.
      Returns:
      term weight
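A BM25-style tf grows with the number of matches in the passage but saturates, so a passage that repeats one term many times cannot outscore everything else. The sketch below assumes the standard BM25 saturation curve freq / (freq + K), with K built from k1, b and pivot as the constructor parameters describe. The class name TfSketch is hypothetical, and Lucene's exact formula is not confirmed by this page.

```java
// Hypothetical BM25-style tf saturation; parameter roles as described for the constructor.
final class TfSketch {

  static double tf(double k1, double b, double pivot, int freq, int passageLen) {
    // Length normalization: passages longer than pivot are penalized when b > 0.
    double norm = k1 * ((1 - b) + b * passageLen / pivot);
    // Saturating curve: grows with freq but never reaches 1.
    return freq / (freq + norm);
  }

  public static void main(String[] args) {
    for (int freq = 1; freq <= 4; freq++) {
      System.out.printf("freq=%d tf=%.3f%n", freq, tf(1.2, 0.75, 87, freq, 87));
    }
  }
}
```

With the default parameters and a passage exactly pivot characters long, a single match gives 1 / (1 + 1.2), roughly 0.45. Each additional match adds less than the one before.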
    • norm

      public float norm(int passageStart)
      Normalize a passage according to its position in the document.

      Typically passages towards the beginning of the document are more useful for summarizing the contents.

      The default implementation is 1 + 1/log(pivot + passageStart).

      Parameters:
      passageStart - start offset of the passage
      Returns:
      a boost value multiplied into the passage's score.
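The documented default, 1 + 1/log(pivot + passageStart), is easy to check numerically. The sketch below assumes log is the natural logarithm (Java's Math.log), and the class name NormSketch is hypothetical. Because the logarithm grows slowly, the boost decays gently. It is largest for a passage at offset 0 and approaches 1 for passages far into the document.

```java
// Hypothetical sketch of the documented position boost 1 + 1/log(pivot + passageStart).
final class NormSketch {

  static double norm(double pivot, int passageStart) {
    // Natural log assumed; pivot > 1 keeps the logarithm positive at passageStart = 0.
    return 1 + 1 / Math.log(pivot + passageStart);
  }

  public static void main(String[] args) {
    for (int start : new int[] {0, 100, 1000, 10000}) {
      System.out.printf("passageStart=%d boost=%.4f%n", start, norm(87, start));
    }
  }
}
```

With the default pivot of 87, a passage at the very start of the document gets a boost of about 1.22.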
    • score

      public float score(Passage passage, int contentLength)