public class PassageScorer extends Object
Ranks passages found by UnifiedHighlighter.
Each passage is scored as a miniature document within the document.
The final score is computed as norm(int) * ∑ (weight(int, int) * tf(int, int)), where the sum runs over the query terms matched in the passage.
The default implementation is norm(int) * BM25.
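The formula above can be illustrated by a minimal, self-contained Java sketch of the combination step alone, independent of which weight and tf functions are plugged in. The interface TermScoring and the class PassageScoreSketch are hypothetical names invented for illustration and are not Lucene API. Each matched term is also assumed to arrive as a pair of array entries: its frequency within the passage and its total frequency in the document.

```java
/** Hypothetical hooks mirroring the three documented scoring functions (not Lucene API). */
interface TermScoring {
    float weight(int contentLength, int totalTermFreq);
    float tf(int freq, int passageLen);
    float norm(int passageStart);
}

/** Sketch of: norm(passageStart) * sum over matched terms of weight(...) * tf(...). */
final class PassageScoreSketch {
    private PassageScoreSketch() {}

    /**
     * @param freqsInPassage occurrences of each matched term within the passage
     * @param totalTermFreqs occurrences of the same terms in the whole document
     */
    static float score(TermScoring s, int passageStart, int passageLen, int contentLength,
                       int[] freqsInPassage, int[] totalTermFreqs) {
        if (freqsInPassage.length != totalTermFreqs.length) {
            throw new IllegalArgumentException("per-term arrays must have the same length");
        }
        float sum = 0f;
        for (int i = 0; i < freqsInPassage.length; i++) {
            // One term's contribution: document-level importance times passage-level tf.
            sum += s.weight(contentLength, totalTermFreqs[i]) * s.tf(freqsInPassage[i], passageLen);
        }
        // The position norm multiplies the whole sum once, not each term separately.
        return s.norm(passageStart) * sum;
    }

    public static void main(String[] args) {
        // Toy functions: weight = 1, tf = raw frequency, norm = 2.
        TermScoring toy = new TermScoring() {
            @Override public float weight(int contentLength, int totalTermFreq) { return 1f; }
            @Override public float tf(int freq, int passageLen) { return freq; }
            @Override public float norm(int passageStart) { return 2f; }
        };
        System.out.println(score(toy, 0, 50, 1000, new int[] {1, 2, 3}, new int[] {5, 5, 5})); // prints 12.0
    }
}
```

With the toy functions, the three terms contribute 1 + 2 + 3 = 6, and the norm doubles that to 12. Because the norm factors out of the sum, it rescales a passage's score without changing how its terms rank against each other.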
Constructor and Description |
---|
PassageScorer(): Creates PassageScorer with these default values: k1 = 1.2, b = 0.75, pivot = 87. |
PassageScorer(float k1, float b, float pivot): Creates PassageScorer with the specified scoring parameters. |
Modifier and Type | Method and Description |
---|---|
float | norm(int passageStart): Normalizes a passage according to its position in the document. |
float | score(Passage passage, int contentLength): Computes the final score of a passage. |
float | tf(int freq, int passageLen): Computes term weight, given the frequency within the passage and the passage's length. |
float | weight(int contentLength, int totalTermFreq): Computes term importance, given its in-document statistics. |
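The text says only that the default is norm * BM25, and it describes weight and tf informally. The following sketch shows one plausible BM25-style shape for those two functions, parameterized by the constructor's k1, b and pivot. The class name Bm25StyleSketch and both formulas are assumptions made for illustration. This includes approximating a document count as 1 + contentLength / pivot so that an IDF-style ratio can be formed. Lucene's actual default implementation may differ.

```java
/**
 * Hypothetical BM25-style weight and tf, parameterized like the PassageScorer
 * constructor (k1, b, pivot). The formulas are illustrative assumptions.
 */
final class Bm25StyleSketch {
    final float k1;    // term-frequency saturation
    final float b;     // strength of passage-length normalization
    final float pivot; // rough average sentence length, in characters

    Bm25StyleSketch(float k1, float b, float pivot) {
        this.k1 = k1;
        this.b = b;
        this.pivot = pivot;
    }

    /** IDF-like importance: terms that are rare within the document weigh more. */
    float weight(int contentLength, int totalTermFreq) {
        // Assumption: treat the document as ~contentLength / pivot sentence-sized
        // "mini documents" so that an IDF-style ratio can be formed.
        double numDocs = 1 + contentLength / (double) pivot;
        return (float) ((k1 + 1) * Math.log(1 + (numDocs + 0.5) / (totalTermFreq + 0.5)));
    }

    /** Saturating tf, normalized by passage length relative to the pivot. */
    float tf(int freq, int passageLen) {
        double lengthNorm = k1 * ((1 - b) + b * (passageLen / (double) pivot));
        return (float) (freq / (freq + lengthNorm));
    }

    public static void main(String[] args) {
        Bm25StyleSketch s = new Bm25StyleSketch(1.2f, 0.75f, 87f); // the documented defaults
        // Repeating a term helps, but with diminishing returns.
        System.out.println(s.tf(1, 87) + " " + s.tf(2, 87) + " " + s.tf(10, 87));
        // The same frequency counts for less in a longer passage (b > 0).
        System.out.println(s.tf(2, 87) + " vs " + s.tf(2, 300));
        // A term that is rare in the document outweighs a common one.
        System.out.println(s.weight(5000, 2) + " vs " + s.weight(5000, 50));
    }
}
```

When passageLen equals pivot, the length factor (1 - b) + b * passageLen / pivot is exactly 1, whatever the value of b. The pivot therefore marks the "typical" length: shorter passages get a higher tf, longer passages a lower one, and b controls how strongly that happens.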
public PassageScorer()
Creates PassageScorer with these default values: k1 = 1.2, b = 0.75, pivot = 87.
public PassageScorer(float k1, float b, float pivot)
Creates PassageScorer with the specified scoring parameters.
k1 - Controls non-linear term frequency normalization (saturation).
b - Controls to what degree passage length normalizes tf values.
pivot - Pivot value for length normalization (a rough idea of the average sentence length in characters).

public float weight(int contentLength, int totalTermFreq)
Computes term importance, given its in-document statistics.
contentLength - length of the document in characters
totalTermFreq - number of times the term occurs in the document

public float tf(int freq, int passageLen)
Computes term weight, given the frequency within the passage and the passage's length.
freq - number of occurrences of the term within this passage
passageLen - length of the passage in characters

public float norm(int passageStart)
Normalizes a passage according to its position in the document. Typically, passages toward the beginning of the document are more useful for summarizing its contents.
The default implementation is 1 + 1/log(pivot + passageStart).
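The default norm is stated exactly above, so the following sketch adds only one assumption: that "log" is the natural logarithm, which is what Java's Math.log computes. The class name PositionNormSketch is hypothetical. The sketch evaluates the norm at a few start offsets to show how the position boost decays.

```java
/** Sketch of the documented default norm: 1 + 1/log(pivot + passageStart). */
final class PositionNormSketch {
    private PositionNormSketch() {}

    // Assumption: "log" is the natural logarithm (Java's Math.log).
    static float norm(float pivot, int passageStart) {
        return 1f + 1f / (float) Math.log(pivot + passageStart);
    }

    public static void main(String[] args) {
        float pivot = 87f; // default pivot from the no-argument constructor
        for (int start : new int[] {0, 500, 5000, 50000}) {
            System.out.printf("start=%d norm=%.4f%n", start, norm(pivot, start));
        }
    }
}
```

With the default pivot of 87, the factor lies between 1 and about 1.22 for any non-negative start offset, and it shrinks only logarithmically. Position therefore nudges the ranking rather than dominating the term-based part of the score.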
passageStart - start offset of the passage

public float score(Passage passage, int contentLength)
Computes the final score of a passage.
Copyright © 2000-2019 Apache Software Foundation. All Rights Reserved.