public class IBSimilarity extends SimilarityBase
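The retrieval function is of the form RSV(q, d) = ∑ -x^q_w log Prob(X_w ≥ t^d_w | λ_w), where x^q_w is the query boost, X_w is a random variable that counts the occurrences of word w, t^d_w is the normalized term frequency, and λ_w is a parameter.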
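As a rough illustration of the retrieval function, the standalone Java sketch below sums the per-term contributions (query boost times the negative log of the tail probability) for a toy query. It is not Lucene code: the names IbRsvSketch, rsv and logLogisticSurvival are hypothetical, and the log-logistic tail Prob(X ≥ t | λ) = λ / (t + λ) is assumed here as one possible choice of distribution.

```java
// Standalone sketch of the retrieval function; hypothetical names, not Lucene code.
final class IbRsvSketch {

    // Assumed log-logistic tail probability: Prob(X >= t | lambda) = lambda / (t + lambda).
    static double logLogisticSurvival(double t, double lambda) {
        return lambda / (t + lambda);
    }

    // Sums -boost * log Prob(X >= normalizedTf | lambda) over the query terms.
    // The three arrays are parallel: index w refers to the same query term in each.
    static double rsv(double[] queryBoost, double[] normalizedTf, double[] lambda) {
        double sum = 0.0;
        for (int w = 0; w < queryBoost.length; w++) {
            double tail = logLogisticSurvival(normalizedTf[w], lambda[w]);
            sum += -queryBoost[w] * Math.log(tail);
        }
        return sum;
    }

    public static void main(String[] args) {
        double[] boost = {1.0, 1.0};    // query boost per term
        double[] tfn = {3.0, 0.0};      // normalized term frequency; second term is absent
        double[] lambda = {0.01, 0.2};  // lambda parameter per term
        System.out.println(rsv(boost, tfn, lambda));
    }
}
```

Under this assumed distribution a term that does not occur in the document (normalized frequency 0) has tail probability 1 and so contributes nothing to the sum, while a rare term (small λ) that occurs several times contributes the most.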
The framework described in the paper has many similarities to the DFR framework (see DFRSimilarity). The two Similarities may be merged at some point.
To construct an IBSimilarity, you must specify the implementations for all three components of the Information-Based model.
1. Distribution: Probabilistic distribution used to model term occurrence
   - DistributionLL: Log-logistic
   - DistributionSPL: Smoothed power-law
2. Lambda: λ_w parameter of the probability distribution
3. Normalization: Term frequency normalization
   - Any supported DFR normalization (listed in DFRSimilarity)
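To show how the three components fit together, here is a minimal standalone sketch in plain Java. The interfaces only mirror the roles described above and are not Lucene's classes; the formulas for the log-logistic and smoothed power-law distributions, the document-frequency lambda (N_w / N) and the H2 normalization with c = 1 are assumptions taken from the usual descriptions of these models. With Lucene on the classpath, the corresponding construction would presumably be `new IBSimilarity(new DistributionLL(), new LambdaDF(), new NormalizationH2())`.

```java
// Standalone sketch of the three-component design; hypothetical names, not Lucene code.
final class IbComponentsSketch {

    interface Distribution { double score(double tfn, double lambda); }
    interface Lambda { double lambda(long docFreq, long numDocs); }
    interface Normalization { double tfn(double freq, double docLen, double avgDocLen); }

    // Assumed log-logistic score: -log(lambda / (tfn + lambda)).
    static final Distribution LL =
        (tfn, lambda) -> -Math.log(lambda / (tfn + lambda));

    // Assumed smoothed power-law score: -log((lambda^(tfn/(tfn+1)) - lambda) / (1 - lambda)).
    static final Distribution SPL =
        (tfn, lambda) -> -Math.log((Math.pow(lambda, tfn / (tfn + 1.0)) - lambda) / (1.0 - lambda));

    // Assumed lambda: fraction of documents that contain the term.
    static final Lambda DF = (docFreq, numDocs) -> (double) docFreq / numDocs;

    // Assumed H2 normalization with c = 1: freq * log2(1 + avgDocLen / docLen).
    static final Normalization H2 =
        (freq, docLen, avgDocLen) -> freq * (Math.log(1.0 + avgDocLen / docLen) / Math.log(2.0));

    // Stand-in for "no normalization": the raw frequency is used unchanged.
    static final Normalization NONE = (freq, docLen, avgDocLen) -> freq;

    // Per-term score: boost * distribution(normalized tf, lambda).
    static double score(Distribution d, Lambda l, Normalization n, double boost,
                        double freq, double docLen, double avgDocLen,
                        long docFreq, long numDocs) {
        return boost * d.score(n.tfn(freq, docLen, avgDocLen), l.lambda(docFreq, numDocs));
    }

    public static void main(String[] args) {
        // Same term statistics scored under two different distributions.
        System.out.println(score(LL, DF, H2, 1.0, 3.0, 80.0, 100.0, 10L, 1000L));
        System.out.println(score(SPL, DF, H2, 1.0, 3.0, 80.0, 100.0, 10L, 1000L));
    }
}
```

Because each component is passed in separately, swapping the distribution or the normalization changes the ranking model without touching the scoring code, which is the point of requiring all three in the constructor.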
See Also: DFRSimilarity
Nested classes/interfaces inherited from class Similarity: Similarity.SimScorer, Similarity.SimWeight
Modifier and Type | Field and Description |
---|---|
protected Distribution | distribution: The probabilistic distribution used to model term occurrence. |
protected Lambda | lambda: The lambda (λ_w) parameter. |
protected Normalization | normalization: The term frequency normalization. |
Fields inherited from class SimilarityBase: discountOverlaps
Constructor and Description |
---|
IBSimilarity(Distribution distribution, Lambda lambda, Normalization normalization): Creates IBSimilarity from the three components. |
Modifier and Type | Method and Description |
---|---|
protected void | explain(Explanation expl, BasicStats stats, int doc, float freq, float docLen): Subclasses should implement this method to explain the score. |
Distribution | getDistribution(): Returns the distribution. |
Lambda | getLambda(): Returns the distribution's lambda parameter. |
Normalization | getNormalization(): Returns the term frequency normalization. |
protected float | score(BasicStats stats, float freq, float docLen): Scores the document doc. |
String | toString(): The names of IB methods follow the pattern IB <distribution> <lambda><normalization>. |
Methods inherited from class SimilarityBase: computeNorm, computeWeight, decodeNormValue, encodeNormValue, explain, fillBasicStats, getDiscountOverlaps, log2, newStats, setDiscountOverlaps, simScorer
Methods inherited from class Similarity: coord, queryNorm
protected final Distribution distribution
protected final Lambda lambda
protected final Normalization normalization
public IBSimilarity(Distribution distribution, Lambda lambda, Normalization normalization)
Note that null values are not allowed: if you want no normalization, pass Normalization.NoNormalization instead.
distribution - probabilistic distribution modeling term occurrence
lambda - distribution's λ_w parameter
normalization - term frequency normalization
protected float score(BasicStats stats, float freq, float docLen)
Description copied from class: SimilarityBase
Scores the document doc.
Subclasses must apply their scoring formula in this method.
Specified by: score in class SimilarityBase
stats - the corpus level statistics.
freq - the term frequency.
docLen - the document length.
protected void explain(Explanation expl, BasicStats stats, int doc, float freq, float docLen)
Description copied from class: SimilarityBase
expl already contains the score, the name of the class and the doc id, as well as the term frequency and its explanation; subclasses can add additional clauses to explain details of their scoring formulae.
The default implementation does nothing.
Overrides: explain in class SimilarityBase
expl - the explanation to extend with details.
stats - the corpus level statistics.
doc - the document id.
freq - the term frequency.
docLen - the document length.
public String toString()
The names of IB methods follow the pattern IB <distribution> <lambda><normalization>. The name of the distribution is the same as in the original paper; for the names of lambda parameters, refer to the javadoc of the Lambda classes.
Specified by: toString in class SimilarityBase
public Distribution getDistribution()
public Lambda getLambda()
public Normalization getNormalization()
Copyright © 2000-2014 Apache Software Foundation. All Rights Reserved.