public class IBSimilarity extends SimilarityBase
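The retrieval function is of the form

$$RSV(q, d) = \sum_{w} -x^q_w \log \mathrm{Prob}(X_w \ge t^d_w \mid \lambda_w),$$

where $x^q_w$ is the query boost, $X_w$ is a random variable that counts the occurrences of word $w$, $t^d_w$ is the normalized term frequency, and $\lambda_w$ is a parameter.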
The framework described in the paper has many similarities to the DFR
framework (see DFRSimilarity). It is possible that the two
Similarities will be merged at some point.
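To make the retrieval function concrete, here is a minimal standalone sketch that evaluates the sum for a two-term query. It assumes a log-logistic distribution, for which Prob(X_w ≥ t | λ_w) = λ_w / (t + λ_w); that choice, the class name `RsvExample` and all numeric inputs are illustrative assumptions, not values this class computes.

```java
// Hypothetical standalone sketch of RSV(q, d) = sum_w -x^q_w * log Prob(X_w >= t^d_w | lambda_w),
// assuming a log-logistic distribution: Prob(X >= t | lambda) = lambda / (t + lambda).
public class RsvExample {

    /** Survival probability Prob(X >= t | lambda) of the assumed log-logistic distribution. */
    public static double survival(double t, double lambda) {
        return lambda / (t + lambda);
    }

    /**
     * Sums the per-term contributions. Arrays are indexed by query term w:
     * boost[w] = x^q_w, tfn[w] = t^d_w (normalized term frequency), lambda[w] = lambda_w.
     */
    public static double rsv(double[] boost, double[] tfn, double[] lambda) {
        double sum = 0.0;
        for (int w = 0; w < boost.length; w++) {
            sum += -boost[w] * Math.log(survival(tfn[w], lambda[w]));
        }
        return sum;
    }

    public static void main(String[] args) {
        double[] boost = {1.0, 1.0};   // x^q_w: both query terms unboosted
        double[] tfn = {3.0, 0.0};     // t^d_w: the second term does not occur in the document
        double[] lambda = {1.0, 0.5};  // lambda_w: illustrative values only
        System.out.println(rsv(boost, tfn, lambda)); // a value close to ln 4, about 1.386
    }
}
```

A term with t^d_w = 0 contributes nothing, because its survival probability is 1 and log 1 = 0; larger normalized frequencies are less probable under the distribution and therefore raise the score.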
Nested classes inherited from class Similarity: Similarity.ExactSimScorer, Similarity.SimWeight, Similarity.SloppySimScorer

| Modifier and Type | Field and Description |
|---|---|
| protected Distribution | distribution: The probabilistic distribution used to model term occurrence. |
| protected Lambda | lambda: The lambda ($\lambda_w$) parameter. |
| protected Normalization | normalization: The term frequency normalization. |
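The Lambda and Normalization fields supply two inputs of the retrieval function from collection and document statistics. As a hedged illustration, the sketch below computes them with two formulas assumed for this example: an H2-style length normalization borrowed from the DFR framework, t = tf · log2(1 + c · avgLen / docLen), and a document-frequency estimate λ_w = (n_w + 1) / (N + 1), where n_w is the number of documents containing w and N is the number of documents. `ComponentSketch` and its methods are hypothetical names.

```java
// Hypothetical sketch of two IB inputs; both formulas are assumptions chosen for
// illustration (an H2-style length normalization and a document-frequency lambda).
public class ComponentSketch {

    /** t^d_w = tf * log2(1 + c * avgLen / docLen): rescales tf by relative document length. */
    public static double normalizeH2(double tf, double docLen, double avgLen, double c) {
        return tf * (Math.log(1.0 + c * avgLen / docLen) / Math.log(2.0));
    }

    /** lambda_w = (docFreq + 1) / (numDocs + 1): smoothed fraction of documents containing w. */
    public static double lambdaDF(long docFreq, long numDocs) {
        return (docFreq + 1.0) / (numDocs + 1.0);
    }

    public static void main(String[] args) {
        // Average-length document: log2(1 + 1) = 1, so t^d_w equals the raw tf.
        System.out.println(normalizeH2(3.0, 100.0, 100.0, 1.0)); // 3.0
        // Shorter-than-average document: t^d_w exceeds the raw tf.
        System.out.println(normalizeH2(3.0, 50.0, 100.0, 1.0));
        // A term occurring in 9 of 99 documents.
        System.out.println(lambdaDF(9, 99)); // 0.1
    }
}
```

With c = 1, a document of average length keeps its raw frequency, while a shorter document has its frequency scaled up; c controls how strongly length differences are corrected.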
Fields inherited from class SimilarityBase: discountOverlaps

| Constructor and Description |
|---|
| IBSimilarity(Distribution distribution, Lambda lambda, Normalization normalization) |
| Modifier and Type | Method and Description |
|---|---|
| protected void | explain(Explanation expl, BasicStats stats, int doc, float freq, float docLen): Subclasses should implement this method to explain the score. |
| Distribution | getDistribution() |
| Lambda | getLambda() |
| Normalization | getNormalization() |
| protected float | score(BasicStats stats, float freq, float docLen): Scores the document `doc`. |
| String | toString(): The names of IB methods follow the pattern `IB <distribution> <lambda><normalization>`. |
Methods inherited from class SimilarityBase: computeNorm, computeWeight, decodeNormValue, encodeNormValue, exactSimScorer, explain, fillBasicStats, getDiscountOverlaps, log2, newStats, setDiscountOverlaps, sloppySimScorer

Methods inherited from class Similarity: coord, queryNorm

protected final Distribution distribution
protected final Lambda lambda
protected final Normalization normalization
public IBSimilarity(Distribution distribution, Lambda lambda, Normalization normalization)
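The constructor takes the three components, and each query term's contribution combines them as one summand of the retrieval function: normalize the term frequency to get t^d_w, estimate λ_w, then multiply the distribution's score by the query boost x^q_w. The sketch below mirrors that composition in plain Java under that assumption; `MiniIB`, `TermStats`, `Dist`, `LambdaFn` and `Norm` are hypothetical names invented for illustration, not Lucene types, and the concrete component formulas in `main` are assumptions.

```java
// Hypothetical stand-ins for the three IB components; these are not the Lucene classes.
public class MiniIB {

    /** Simplified, assumed per-term statistics (a stand-in for a richer stats object). */
    public static final class TermStats {
        public final long docFreq, numDocs;
        public final double avgLen; // average document length, for length normalizations
        public final double boost;  // x^q_w, the query boost
        public TermStats(long docFreq, long numDocs, double avgLen, double boost) {
            this.docFreq = docFreq;
            this.numDocs = numDocs;
            this.avgLen = avgLen;
            this.boost = boost;
        }
    }

    public interface Dist { double score(double tfn, double lambda); }              // -log Prob(X >= tfn | lambda)
    public interface LambdaFn { double lambda(TermStats stats); }                    // estimates lambda_w
    public interface Norm { double tfn(TermStats stats, double tf, double docLen); } // computes t^d_w

    private final Dist distribution;
    private final LambdaFn lambda;
    private final Norm normalization;

    public MiniIB(Dist distribution, LambdaFn lambda, Norm normalization) {
        this.distribution = distribution;
        this.lambda = lambda;
        this.normalization = normalization;
    }

    /** One term's contribution: x^q_w * (-log Prob(X_w >= t^d_w | lambda_w)). */
    public double score(TermStats stats, double freq, double docLen) {
        double tfn = normalization.tfn(stats, freq, docLen);
        return stats.boost * distribution.score(tfn, lambda.lambda(stats));
    }

    public static void main(String[] args) {
        MiniIB ib = new MiniIB(
            (tfn, lam) -> -Math.log(lam / (tfn + lam)),   // assumed log-logistic distribution
            s -> (s.docFreq + 1.0) / (s.numDocs + 1.0),   // assumed document-frequency lambda
            (s, tf, len) -> tf);                          // assumed identity normalization
        TermStats stats = new TermStats(9, 99, 100.0, 1.0);
        System.out.println(ib.score(stats, 3.0, 100.0));  // close to ln 31
    }
}
```

Keeping each component behind its own interface is what lets the distribution, the lambda estimate and the normalization be swapped independently, which is the reason for passing them as three separate constructor arguments.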
protected float score(BasicStats stats, float freq, float docLen)
Description copied from class: SimilarityBase
Scores the document doc. Subclasses must apply their scoring formula in this class.
Specified by: score in class SimilarityBase
Parameters:
stats - the corpus level statistics.
freq - the term frequency.
docLen - the document length.

protected void explain(Explanation expl, BasicStats stats, int doc, float freq, float docLen)
Description copied from class: SimilarityBase
Subclasses should implement this method to explain the score. expl
already contains the score, the name of the class and the doc id, as well
as the term frequency and its explanation; subclasses can add additional
clauses to explain details of their scoring formulae.
The default implementation does nothing.
Overrides: explain in class SimilarityBase
Parameters:
expl - the explanation to extend with details.
stats - the corpus level statistics.
doc - the document id.
freq - the term frequency.
docLen - the document length.

public String toString()
The names of IB methods follow the pattern `IB <distribution> <lambda><normalization>`. The name of the
distribution is the same as in the original paper; for the names of lambda
parameters, refer to the javadoc of the Lambda classes.
Specified by: toString in class SimilarityBase

public Distribution getDistribution()
public Lambda getLambda()
public Normalization getNormalization()
Copyright © 2000-2012 Apache Software Foundation. All Rights Reserved.