org.apache.lucene.search.postingshighlight
Class PassageScorer

java.lang.Object
  extended by org.apache.lucene.search.postingshighlight.PassageScorer

public class PassageScorer
extends Object

Ranks passages found by PostingsHighlighter.

Each passage is scored as a miniature document within the document. The final score is computed as norm(int) * ∑ (weight(int, int) * tf(int, int)), where the sum runs over the query terms found in the passage. The default implementation is norm(int) * BM25.
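
The composition of the final score can be illustrated with a small stand-alone sketch. It is not the Lucene source: the class PassageScoreCompositionSketch and its score method are hypothetical names, and the per-term weight and tf values are made-up inputs rather than values produced by PassageScorer.

```java
// Hypothetical illustration only; not part of Lucene.
final class PassageScoreCompositionSketch {

    /**
     * Combines per-term values into one passage score:
     * norm * sum(weights[i] * tfs[i]), one entry per matched query term.
     */
    static float score(float norm, float[] weights, float[] tfs) {
        if (weights.length != tfs.length) {
            throw new IllegalArgumentException("need one weight and one tf per term");
        }
        float sum = 0f;
        for (int i = 0; i < weights.length; i++) {
            sum += weights[i] * tfs[i];
        }
        return norm * sum;
    }

    public static void main(String[] args) {
        // Two query terms matched in the passage; all numbers are invented.
        float[] weights = {2.0f, 0.5f};
        float[] tfs = {0.5f, 1.0f};
        // 1.5 * (2.0 * 0.5 + 0.5 * 1.0) = 2.25
        System.out.println(score(1.5f, weights, tfs));
    }
}
```

Because norm multiplies the whole sum, it rescales a passage's score without changing the relative contribution of each term.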

WARNING: This API is experimental and might change in incompatible ways in the next release.

Constructor Summary
PassageScorer()
          Creates PassageScorer with these default values: k1 = 1.2, b = 0.75, pivot = 87.
PassageScorer(float k1, float b, float pivot)
          Creates PassageScorer with the specified scoring parameters.
 
Method Summary
 float norm(int passageStart)
          Normalize a passage according to its position in the document.
 float tf(int freq, int passageLen)
          Computes term weight, given the frequency within the passage and the passage's length.
 float weight(int contentLength, int totalTermFreq)
          Computes term importance, given its in-document statistics.
 
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
 

Constructor Detail

PassageScorer

public PassageScorer()
Creates PassageScorer with these default values: k1 = 1.2, b = 0.75, pivot = 87.


PassageScorer

public PassageScorer(float k1,
                     float b,
                     float pivot)
Creates PassageScorer with the specified scoring parameters.

Parameters:
k1 - Controls non-linear term frequency normalization (saturation).
b - Controls to what degree passage length normalizes tf values.
pivot - Pivot value for length normalization (some rough idea of average sentence length in characters).
Method Detail

weight

public float weight(int contentLength,
                    int totalTermFreq)
Computes term importance, given its in-document statistics.

Parameters:
contentLength - length of document in characters
totalTermFreq - number of times the term occurs in the document
Returns:
term importance
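
This page does not give the formula behind weight. As a rough sketch of one BM25-style possibility, the document can be treated as a collection of pivot-sized passages and the term scored with an inverse-frequency logarithm. The class PassageWeightSketch, the exact formula, and the pivot value 87 are assumptions made for illustration, not the confirmed Lucene implementation.

```java
// Hypothetical illustration only; the formula is an assumed BM25-style form.
final class PassageWeightSketch {

    static float weight(float k1, float pivot, int contentLength, int totalTermFreq) {
        // Assumption: approximate the number of pivot-sized "mini documents"
        // that the content would split into.
        float numDocs = 1 + contentLength / pivot;
        // Inverse-frequency term: rarer terms receive a larger weight.
        return (k1 + 1) * (float) Math.log(1 + (numDocs + 0.5D) / (totalTermFreq + 0.5D));
    }

    public static void main(String[] args) {
        float k1 = 1.2f;
        float pivot = 87f; // assumed pivot value
        int contentLength = 8700;
        for (int totalTermFreq : new int[] {1, 5, 50}) {
            System.out.println(totalTermFreq + " -> "
                + weight(k1, pivot, contentLength, totalTermFreq));
        }
    }
}
```

Under these assumptions a term that occurs once in the document weighs more than one that occurs fifty times, which is the behavior the description "term importance" suggests.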

tf

public float tf(int freq,
                int passageLen)
Computes term weight, given the frequency within the passage and the passage's length.

Parameters:
freq - number of occurrences of the term within this passage
passageLen - length of the passage in characters.
Returns:
term weight
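
The roles of k1, b and pivot can be made concrete with a sketch of the usual BM25 term-frequency component. The formula freq / (freq + k1 * ((1 - b) + b * passageLen / pivot)) is assumed here because the page only says the default is BM25; the class PassageTfSketch and the pivot value 87 are likewise illustrative assumptions.

```java
// Hypothetical illustration only; assumes the standard BM25 tf component.
final class PassageTfSketch {

    static float tf(float k1, float b, float pivot, int freq, int passageLen) {
        // Length normalization: passages longer than pivot are penalized,
        // shorter ones are favored, to the degree set by b.
        float norm = k1 * ((1 - b) + b * (passageLen / pivot));
        // Saturating term frequency: approaches 1 as freq grows.
        return freq / (freq + norm);
    }

    public static void main(String[] args) {
        float k1 = 1.2f, b = 0.75f, pivot = 87f; // pivot is an assumed value
        for (int freq = 1; freq <= 4; freq++) {
            System.out.println("freq=" + freq
                + " short=" + tf(k1, b, pivot, freq, 40)
                + " long=" + tf(k1, b, pivot, freq, 400));
        }
    }
}
```

With this form, each extra occurrence of a term adds less than the previous one, and the same frequency counts for less in a longer passage.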

norm

public float norm(int passageStart)
Normalize a passage according to its position in the document.

Typically passages towards the beginning of the document are more useful for summarizing the contents.

The default implementation is 1 + 1/log(pivot + passageStart).

Parameters:
passageStart - start offset of the passage
Returns:
a boost value multiplied into the passage's score.
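
The default formula above can be evaluated directly to see how quickly the boost decays with position. In this sketch the class PassageNormSketch is a hypothetical stand-alone helper, the logarithm is assumed to be the natural log, and the pivot value 87 is an assumption.

```java
// Hypothetical illustration only; evaluates 1 + 1/log(pivot + passageStart).
final class PassageNormSketch {

    static float norm(float pivot, int passageStart) {
        // Assumption: "log" in the description is the natural logarithm.
        return 1 + 1 / (float) Math.log(pivot + passageStart);
    }

    public static void main(String[] args) {
        float pivot = 87f; // assumed pivot value
        for (int start : new int[] {0, 100, 1000, 10000}) {
            System.out.println("passageStart=" + start + " norm=" + norm(pivot, start));
        }
    }
}
```

The boost is largest for a passage at offset 0 and shrinks slowly toward 1 for later passages, so position acts as a mild tie-breaker rather than dominating the term-based score.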


Copyright © 2000-2013 Apache Software Foundation. All Rights Reserved.