org.apache.lucene.search.postingshighlight
Class PassageScorer

java.lang.Object
  extended by org.apache.lucene.search.postingshighlight.PassageScorer

public class PassageScorer
extends Object

Ranks passages found by PostingsHighlighter.

Each passage is scored as a miniature document within the document. The final score is computed as norm(int) * ∑ (weight(int, int) * tf(int, int)). The default implementation is norm(int) * BM25.
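
As a rough illustration of how the three methods combine, the sketch below computes norm * ∑ (weight * tf) over the query terms of one passage. The Hooks interface, the score helper, the TRIVIAL instance, and the array-based inputs are hypothetical names introduced only for this example; they are not part of the PostingsHighlighter API.

```java
public class PassageScoreSketch {
    /** Hypothetical stand-in for the three overridable methods of PassageScorer. */
    interface Hooks {
        float norm(int passageStart);
        float weight(int contentLength, int totalTermFreq);
        float tf(int freq, int passageLen);
    }

    /** Deliberately trivial hooks so the arithmetic is easy to follow. */
    static final Hooks TRIVIAL = new Hooks() {
        public float norm(int passageStart) { return 2f; }
        public float weight(int contentLength, int totalTermFreq) { return 1f; }
        public float tf(int freq, int passageLen) { return freq; }
    };

    /**
     * Computes norm(passageStart) * sum over terms of weight * tf.
     * freqInPassage[i] and totalTermFreq[i] describe the i-th query term.
     */
    static float score(Hooks h, int passageStart, int passageLen, int contentLength,
                       int[] freqInPassage, int[] totalTermFreq) {
        float sum = 0f;
        for (int i = 0; i < freqInPassage.length; i++) {
            sum += h.weight(contentLength, totalTermFreq[i])
                 * h.tf(freqInPassage[i], passageLen);
        }
        return h.norm(passageStart) * sum;
    }

    public static void main(String[] args) {
        // Two terms occurring 3 and 1 times in the passage: 2 * (1*3 + 1*1) = 8
        float s = score(TRIVIAL, 0, 100, 1000, new int[] {3, 1}, new int[] {5, 2});
        System.out.println(s); // prints 8.0
    }
}
```

With real implementations of the three methods in place of TRIVIAL, the same loop would yield the passage score described above.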

WARNING: This API is experimental and might change in incompatible ways in the next release.

Field Summary
static float b
          BM25 b parameter; controls length normalization.
static float k1
          BM25 k1 parameter; controls term frequency normalization.
static float pivot
          A pivot used for length normalization.
 
Constructor Summary
PassageScorer()
           
 
Method Summary
 float norm(int passageStart)
          Normalizes a passage according to its position in the document.
 float tf(int freq, int passageLen)
          Computes term weight, given the frequency within the passage and the passage's length.
 float weight(int contentLength, int totalTermFreq)
          Computes term importance, given its in-document statistics.
 
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
 

Field Detail

k1

public static final float k1
BM25 k1 parameter; controls term frequency normalization.

See Also:
Constant Field Values

b

public static final float b
BM25 b parameter; controls length normalization.

See Also:
Constant Field Values

pivot

public static final float pivot
A pivot used for length normalization. The default value approximates the length of an average English sentence.

See Also:
Constant Field Values

Constructor Detail

PassageScorer

public PassageScorer()

Method Detail

weight

public float weight(int contentLength,
                    int totalTermFreq)
Computes term importance, given its in-document statistics.

Parameters:
contentLength - length of the document in characters
totalTermFreq - number of times the term occurs in the document
Returns:
term importance
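
This page does not state the formula behind weight. One plausible reading, offered as a sketch and not as the library's confirmed implementation, treats the document as a collection of pivot-sized passages and applies a BM25-style inverse-frequency term. The class name WeightSketch, the formula, and the constant values (k1 = 1.2, pivot = 87) are all assumptions; check the Constant Field Values page and the source for the real ones.

```java
public class WeightSketch {
    static final float K1 = 1.2f;    // assumed value, not taken from this page
    static final float PIVOT = 87f;  // assumed value, not taken from this page

    /** Assumed BM25-style importance: rarer terms in the document weigh more. */
    static float weight(int contentLength, int totalTermFreq) {
        // Approximate how many pivot-sized passages the document contains.
        float numPassages = 1 + contentLength / PIVOT;
        return (K1 + 1)
             * (float) Math.log(1 + (numPassages + 0.5d) / (totalTermFreq + 0.5d));
    }

    public static void main(String[] args) {
        // In the same document, a term seen once outweighs one seen 50 times.
        System.out.println(weight(8700, 1) > weight(8700, 50)); // prints true
    }
}
```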

tf

public float tf(int freq,
                int passageLen)
Computes term weight, given the frequency within the passage and the passage's length.

Parameters:
freq - number of occurrences of the term within this passage
passageLen - length of the passage in characters.
Returns:
term weight
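
Since the class summary describes the default scoring as BM25-based, tf can be pictured as the usual BM25 saturation function with passage length normalized against pivot. The sketch below assumes that form and assumes the values k1 = 1.2, b = 0.75, and pivot = 87; neither the formula nor the values are stated on this page, and the class name TfSketch is hypothetical.

```java
public class TfSketch {
    static final float K1 = 1.2f;    // assumed value
    static final float B = 0.75f;    // assumed value
    static final float PIVOT = 87f;  // assumed value

    /** Assumed BM25-style term weight: saturates with freq, shrinks for long passages. */
    static float tf(int freq, int passageLen) {
        float lengthNorm = K1 * ((1 - B) + B * (passageLen / PIVOT));
        return freq / (freq + lengthNorm);
    }

    public static void main(String[] args) {
        System.out.println(tf(1, 87));   // one occurrence in a pivot-length passage
        System.out.println(tf(5, 87));   // more occurrences: higher, but always below 1
        System.out.println(tf(1, 870));  // same frequency in a longer passage: lower
    }
}
```

Under these assumptions, a larger b penalizes long passages more strongly, and a larger k1 makes repeated occurrences saturate more slowly.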

norm

public float norm(int passageStart)
Normalizes a passage according to its position in the document.

Typically, passages toward the beginning of the document are more useful for summarizing its contents.

The default implementation is 1 + 1/log(pivot + passageStart).

Parameters:
passageStart - start offset of the passage
Returns:
a boost value multiplied into the passage's score.
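
To show how the default formula 1 + 1/log(pivot + passageStart) behaves, here is a small standalone sketch. It assumes the natural logarithm and a pivot of 87, neither of which this page states, and the class name NormSketch is hypothetical.

```java
public class NormSketch {
    static final float PIVOT = 87f; // assumed value, not taken from this page

    /** 1 + 1/log(pivot + passageStart), using the natural log (an assumption). */
    static float norm(int passageStart) {
        return 1 + 1 / (float) Math.log(PIVOT + passageStart);
    }

    public static void main(String[] args) {
        // The boost is largest at offset 0 and decays slowly toward 1.
        for (int start : new int[] {0, 100, 1000, 10000}) {
            System.out.println(start + " -> " + norm(start));
        }
    }
}
```

Because the logarithm grows slowly, the boost favors early passages only mildly; a subclass could override norm to return a constant if position should not matter.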


Copyright © 2000-2013 Apache Software Foundation. All Rights Reserved.