public abstract class Axiomatic extends SimilarityBase
A family of axiomatic retrieval models. All of them are based on BM25, Pivoted Document Length Normalization, and the Language Model with Dirichlet prior. Some components of the original models (e.g. Term Frequency, Inverse Document Frequency) are modified so that they satisfy a set of axiomatic constraints.
Similarity.SimScorer
Modifier and Type | Field and Description |
---|---|
protected float | k - hyperparam for the primitive weighting function |
protected int | queryLen - the query length |
protected float | s - hyperparam for the growth function |
Fields inherited from class SimilarityBase: discountOverlaps
Constructor and Description |
---|
Axiomatic() - Default constructor |
Axiomatic(float s) - Constructor setting only s, leaving k and queryLen at their defaults |
Axiomatic(float s, int queryLen) - Constructor setting s and queryLen, leaving k at its default |
Axiomatic(float s, int queryLen, float k) - Constructor setting all Axiomatic hyperparameters |
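The constructor summary above suggests a chain in which each shorter constructor fills in defaults and delegates to the full one. The following is a minimal, self-contained sketch of that pattern, not the Lucene source: the class name `AxiomaticParams` and the three `DEFAULT_*` values are assumptions made for illustration, since this page does not state the actual defaults.

```java
// Hypothetical stand-in that mirrors the four constructors listed above.
// It is NOT the Lucene class; names and default values are illustrative only.
class AxiomaticParams {
    // Assumed placeholder defaults; this page does not state the real ones.
    static final float DEFAULT_S = 0.25f;
    static final int DEFAULT_QUERY_LEN = 1;
    static final float DEFAULT_K = 0.35f;

    final float s;       // hyperparam for the growth function
    final int queryLen;  // the query length
    final float k;       // hyperparam for the primitive weighting function

    // Default constructor: every hyperparameter takes its default.
    AxiomaticParams() {
        this(DEFAULT_S, DEFAULT_QUERY_LEN, DEFAULT_K);
    }

    // Sets only s; queryLen and k keep their defaults.
    AxiomaticParams(float s) {
        this(s, DEFAULT_QUERY_LEN, DEFAULT_K);
    }

    // Sets s and queryLen; k keeps its default.
    AxiomaticParams(float s, int queryLen) {
        this(s, queryLen, DEFAULT_K);
    }

    // Sets all three hyperparameters; the other constructors delegate here.
    AxiomaticParams(float s, int queryLen, float k) {
        this.s = s;
        this.queryLen = queryLen;
        this.k = k;
    }
}
```

Funnelling every constructor into the three-argument one keeps any future validation of the hyperparameters in a single place.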
Modifier and Type | Method and Description |
---|---|
protected Explanation | explain(BasicStats stats, Explanation freq, double docLen) - Explains the score. |
protected void | explain(List<Explanation> subs, BasicStats stats, double freq, double docLen) - Subclasses should implement this method to explain the score. |
protected abstract double | gamma(BasicStats stats, double freq, double docLen) - Compute the gamma component (only for F3EXP and F3LOG) |
protected abstract double | idf(BasicStats stats, double freq, double docLen) - Compute the inverse document frequency component |
protected abstract Explanation | idfExplain(BasicStats stats, double freq, double docLen) - Explain the score of the inverse document frequency component for a single document |
protected abstract double | ln(BasicStats stats, double freq, double docLen) - Compute the document length component |
protected abstract Explanation | lnExplain(BasicStats stats, double freq, double docLen) - Explain the score of the document length component for a single document |
double | score(BasicStats stats, double freq, double docLen) - Scores the document doc. |
protected abstract double | tf(BasicStats stats, double freq, double docLen) - Compute the term frequency component |
protected abstract Explanation | tfExplain(BasicStats stats, double freq, double docLen) - Explain the score of the term frequency component for a single document |
protected abstract double | tfln(BasicStats stats, double freq, double docLen) - Compute the mixed term frequency and document length component |
protected abstract Explanation | tflnExplain(BasicStats stats, double freq, double docLen) - Explain the score of the mixed term frequency and document length component for a single document |
abstract String | toString() - Name of the axiomatic method. |
Methods inherited from class SimilarityBase: computeNorm, fillBasicStats, getDiscountOverlaps, log2, newStats, scorer, setDiscountOverlaps
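To make the role of the abstract component methods concrete, here is a small stand-alone sketch of the template-method shape the summary implies: a base class owns `score` and subclasses supply `tf`, `ln`, `tfln`, `idf` and `gamma`. Everything here is an assumption for illustration: the combination rule in `score` (product of the four components, minus gamma, floored at zero), the `ToyStats` holder standing in for `BasicStats`, and the formulas in `ToyModel`, which do not correspond to any published variant. This page does not specify how Axiomatic combines the components.

```java
// Hypothetical stand-in for the corpus statistics a BasicStats would carry.
final class ToyStats {
    final long numberOfDocuments;
    final long docFreq;
    final double avgFieldLength;

    ToyStats(long numberOfDocuments, long docFreq, double avgFieldLength) {
        this.numberOfDocuments = numberOfDocuments;
        this.docFreq = docFreq;
        this.avgFieldLength = avgFieldLength;
    }
}

// Sketch of the base class: subclasses provide components, score combines them.
abstract class AxiomaticSketch {
    protected final float s;      // hyperparam for the growth function
    protected final int queryLen; // the query length
    protected final float k;      // hyperparam for the primitive weighting function

    AxiomaticSketch(float s, int queryLen, float k) {
        this.s = s;
        this.queryLen = queryLen;
        this.k = k;
    }

    protected abstract double tf(ToyStats stats, double freq, double docLen);
    protected abstract double ln(ToyStats stats, double freq, double docLen);
    protected abstract double tfln(ToyStats stats, double freq, double docLen);
    protected abstract double idf(ToyStats stats, double freq, double docLen);
    protected abstract double gamma(ToyStats stats, double freq, double docLen);

    // ASSUMED combination rule: product of components minus gamma, never negative.
    double score(ToyStats stats, double freq, double docLen) {
        double raw = tf(stats, freq, docLen)
                * ln(stats, freq, docLen)
                * tfln(stats, freq, docLen)
                * idf(stats, freq, docLen)
                - gamma(stats, freq, docLen);
        return Math.max(0.0, raw);
    }
}

// Toy subclass with made-up formulas, used only to exercise the sketch.
final class ToyModel extends AxiomaticSketch {
    ToyModel(float s, float k) {
        super(s, 1, k);
    }

    // Saturating term frequency: grows with freq but stays below 1.
    @Override protected double tf(ToyStats stats, double freq, double docLen) {
        return freq / (freq + k);
    }

    // Length component: equals 1 when docLen == 1, shrinks as docLen grows.
    @Override protected double ln(ToyStats stats, double freq, double docLen) {
        return (stats.avgFieldLength + s) / (stats.avgFieldLength + docLen * s);
    }

    // This toy model has no mixed tf/length component.
    @Override protected double tfln(ToyStats stats, double freq, double docLen) {
        return 1.0;
    }

    // Rarer terms (smaller docFreq) get a larger weight.
    @Override protected double idf(ToyStats stats, double freq, double docLen) {
        return Math.log((stats.numberOfDocuments + 1.0) / stats.docFreq);
    }

    // No gamma correction in this toy model.
    @Override protected double gamma(ToyStats stats, double freq, double docLen) {
        return 0.0;
    }
}
```

With this shape, a new variant only has to override the five component methods; the scoring entry point stays in one place.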
protected final float s
protected final float k
protected final int queryLen
public Axiomatic(float s, int queryLen, float k)
s - hyperparam for the growth function
queryLen - the query length
k - hyperparam for the primitive weighting function
public Axiomatic(float s)
s - hyperparam for the growth function
public Axiomatic(float s, int queryLen)
s - hyperparam for the growth function
queryLen - the query length
public Axiomatic()
public double score(BasicStats stats, double freq, double docLen)
Description copied from class: SimilarityBase
Scores the document doc. Subclasses must apply their scoring formula in this class.
score in class SimilarityBase
stats - the corpus level statistics.
freq - the term frequency.
docLen - the document length.
protected Explanation explain(BasicStats stats, Explanation freq, double docLen)
Description copied from class: SimilarityBase
Explains the score. The implementation here provides a basic explanation and attaches the score (computed via the SimilarityBase.score(BasicStats, double, double) method) and the explanation for the term frequency. Subclasses content with this format may add additional details in SimilarityBase.explain(List, BasicStats, double, double).
explain in class SimilarityBase
stats - the corpus level statistics.
freq - the term frequency and its explanation.
docLen - the document length.
protected void explain(List<Explanation> subs, BasicStats stats, double freq, double docLen)
Description copied from class: SimilarityBase
Subclasses should implement this method to explain the score. expl already contains the score, the name of the class and the doc id, as well as the term frequency and its explanation; subclasses can add additional clauses to explain details of their scoring formulae.
The default implementation does nothing.
explain in class SimilarityBase
subs - the list of details of the explanation to extend
stats - the corpus level statistics.
freq - the term frequency.
docLen - the document length.
public abstract String toString()
toString in class SimilarityBase
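The two explain methods described above split the work: one builds the top-level explanation from the score and the term frequency, the other is a hook that appends per-component details to a list. The sketch below illustrates that split with plain Java only. `Note` is a hypothetical stand-in for `Explanation`, `ExplainSketch` and its method parameters are invented for illustration, and the description strings are assumptions rather than the text Lucene produces.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for Explanation: a value, a description, nested details.
final class Note {
    final double value;
    final String description;
    final List<Note> details;

    Note(double value, String description, List<Note> details) {
        this.value = value;
        this.description = description;
        this.details = List.copyOf(details); // defensive, immutable copy
    }
}

final class ExplainSketch {
    // Plays the role of explain(List<Explanation> subs, ...):
    // appends one entry per scoring component to the caller's list.
    static void explain(List<Note> subs, double tf, double ln, double tfln,
                        double idf, double gamma) {
        subs.add(new Note(tf, "tf, term frequency component", List.of()));
        subs.add(new Note(ln, "ln, document length component", List.of()));
        subs.add(new Note(tfln, "tfln, mixed term frequency and document length component", List.of()));
        subs.add(new Note(idf, "idf, inverse document frequency component", List.of()));
        subs.add(new Note(gamma, "gamma component", List.of()));
    }

    // Plays the role of explain(stats, freq, docLen): wraps the score,
    // the term frequency note, and whatever the hook above contributes.
    static Note explain(String name, double score, Note freq,
                        double tf, double ln, double tfln, double idf, double gamma) {
        List<Note> subs = new ArrayList<>();
        subs.add(freq);
        explain(subs, tf, ln, tfln, idf, gamma);
        return new Note(score, "score(" + name + ")", subs);
    }
}
```

Keeping the hook separate means a subclass can extend the explanation without re-implementing how the score and term frequency are attached.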
protected abstract double tf(BasicStats stats, double freq, double docLen)
protected abstract double ln(BasicStats stats, double freq, double docLen)
protected abstract double tfln(BasicStats stats, double freq, double docLen)
protected abstract double idf(BasicStats stats, double freq, double docLen)
protected abstract double gamma(BasicStats stats, double freq, double docLen)
protected abstract Explanation tfExplain(BasicStats stats, double freq, double docLen)
stats - the corpus level statistics
freq - number of occurrences of term in the document
docLen - the document length
protected abstract Explanation lnExplain(BasicStats stats, double freq, double docLen)
stats - the corpus level statistics
freq - number of occurrences of term in the document
docLen - the document length
protected abstract Explanation tflnExplain(BasicStats stats, double freq, double docLen)
stats - the corpus level statistics
freq - number of occurrences of term in the document
docLen - the document length
protected abstract Explanation idfExplain(BasicStats stats, double freq, double docLen)
stats - the corpus level statistics
freq - number of occurrences of term in the document
docLen - the document length
Copyright © 2000-2019 Apache Software Foundation. All Rights Reserved.