public class PassageScorer extends Object
Ranks passages found by UnifiedHighlighter.
Each passage is scored as a miniature document within the document.
The final score is computed as norm(int) * ∑ (weight(int, int) * tf(int, int)).
The default implementation is norm(int) * BM25.
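The formula above can be read as a position-dependent factor multiplied by a sum over the query terms that occur in the passage. The following is a minimal, standalone sketch of that combination step only. The class name `PassageScoreSketch`, the method `score`, and the array-based inputs are illustrative assumptions and are not part of the Lucene API.

```java
/** Illustrative only: combines precomputed factors as norm * sum(weight_i * tf_i). */
final class PassageScoreSketch {

    /**
     * @param norm    position-based factor for the passage (what norm(int) would return)
     * @param weights per-term importance values (what weight(int, int) would return)
     * @param tfs     per-term within-passage values (what tf(int, int) would return)
     */
    static float score(float norm, float[] weights, float[] tfs) {
        if (weights.length != tfs.length) {
            throw new IllegalArgumentException("weights and tfs must have the same length");
        }
        float sum = 0f;
        for (int i = 0; i < weights.length; i++) {
            sum += weights[i] * tfs[i]; // contribution of one matching term
        }
        return norm * sum;
    }

    public static void main(String[] args) {
        // Two hypothetical terms with made-up factor values.
        float s = score(1.2f, new float[] {2.0f, 0.5f}, new float[] {0.4f, 0.6f});
        System.out.println(s);
    }
}
```

Because the three factors are separate methods, a subclass can presumably change one of them (for example, the position factor) without touching the others.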
| Constructor and Description |
|---|
| PassageScorer(): Creates PassageScorer with these default values: k1 = 1.2, b = 0.75, pivot = 87. |
| PassageScorer(float k1, float b, float pivot): Creates PassageScorer with specified scoring parameters. |
| Modifier and Type | Method and Description |
|---|---|
| float | norm(int passageStart): Normalize a passage according to its position in the document. |
| float | tf(int freq, int passageLen): Computes term weight, given the frequency within the passage and the passage's length. |
| float | weight(int contentLength, int totalTermFreq): Computes term importance, given its in-document statistics. |
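To make the roles of tf and weight more concrete, here is a hedged sketch of BM25-style versions of both. This page only states that the default is BM25-based, so the exact formulas below are assumptions, as are the class name `TermFactorSketch` and the variable names; they are meant to show how k1, b, and pivot could enter the computation, not to reproduce the library's code.

```java
/** Illustrative BM25-style factors; the formulas are assumptions, not taken from this page. */
final class TermFactorSketch {
    private final float k1;    // saturation of term frequency
    private final float b;     // strength of passage-length normalization
    private final float pivot; // rough average sentence length in characters

    TermFactorSketch(float k1, float b, float pivot) {
        this.k1 = k1;
        this.b = b;
        this.pivot = pivot;
    }

    /** Saturating term frequency, dampened for passages longer than the pivot. */
    float tf(int freq, int passageLen) {
        float lengthNorm = k1 * ((1 - b) + b * (passageLen / pivot));
        return freq / (freq + lengthNorm);
    }

    /** IDF-like importance: the document is treated as roughly contentLength / pivot passages. */
    float weight(int contentLength, int totalTermFreq) {
        float numPassages = 1 + contentLength / pivot;
        return (k1 + 1) * (float) Math.log(1 + (numPassages + 0.5) / (totalTermFreq + 0.5));
    }

    public static void main(String[] args) {
        TermFactorSketch s = new TermFactorSketch(1.2f, 0.75f, 87f);
        // tf grows with freq but stays below 1; weight shrinks as the term gets more common.
        System.out.println(s.tf(1, 87) + " " + s.tf(5, 87));
        System.out.println(s.weight(870, 1) + " " + s.weight(870, 10));
    }
}
```

Under these assumptions, a term that appears many times in one passage gains less and less from each extra occurrence, and a term that is frequent throughout the document contributes less than a rare one.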
public PassageScorer()
Creates PassageScorer with these default values:
k1 = 1.2,
b = 0.75,
pivot = 87.

public PassageScorer(float k1,
float b,
float pivot)
Creates PassageScorer with specified scoring parameters.
k1 - Controls non-linear term frequency normalization (saturation).
b - Controls to what degree passage length normalizes tf values.
pivot - Pivot value for length normalization (some rough idea of average sentence length in characters).

public float weight(int contentLength,
int totalTermFreq)
Computes term importance, given its in-document statistics.
contentLength - length of the document in characters
totalTermFreq - number of times the term occurs in the document

public float tf(int freq,
int passageLen)
Computes term weight, given the frequency within the passage and the passage's length.
freq - number of occurrences of the term within this passage
passageLen - length of the passage in characters

public float norm(int passageStart)
Typically passages towards the beginning of the document are more useful for summarizing the contents.
The default implementation is 1 + 1/log(pivot + passageStart)
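As a small illustration of how this factor behaves, the sketch below evaluates 1 + 1/log(pivot + passageStart) for a few start offsets. It assumes the natural logarithm and passes pivot as an explicit argument; the class name `PositionNormSketch` is hypothetical and the code is not the library's implementation.

```java
/** Illustrative only: evaluates 1 + 1/log(pivot + passageStart) with the natural log. */
final class PositionNormSketch {

    static float norm(float pivot, int passageStart) {
        // Assumes pivot + passageStart > 1 so that the logarithm is positive.
        return 1 + 1 / (float) Math.log(pivot + passageStart);
    }

    public static void main(String[] args) {
        float pivot = 87f; // default pivot from the no-argument constructor
        for (int start : new int[] {0, 100, 1000, 10000}) {
            System.out.println(start + " -> " + norm(pivot, start));
        }
    }
}
```

The value decreases slowly toward 1 as the offset grows, so passages near the start of the document receive a mild boost rather than a dominant one.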
passageStart - start offset of the passage

Copyright © 2000-2017 Apache Software Foundation. All Rights Reserved.