public class IBSimilarity extends SimilarityBase
Provides a framework for the family of information-based models, as described in Stéphane Clinchant and Eric Gaussier. 2010. Information-based models for ad hoc IR. In Proceedings of the 33rd international ACM SIGIR conference on Research and development in information retrieval (SIGIR '10). ACM, New York, NY, USA, 234-241.
The retrieval function is of the form RSV(q, d) = ∑ -x_w^q log Prob(X_w ≥ t_w^d | λ_w), where
- x_w^q is the query boost;
- X_w is a random variable that counts the occurrences of word w;
- t_w^d is the normalized term frequency;
- λ_w is a parameter.
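As a rough numeric illustration of the retrieval function above, the following sketch sums the per-term contributions for a handful of matching query terms. It assumes a log-logistic tail, Prob(X ≥ t | λ) = λ / (t + λ), as one possible choice of distribution; the class `RsvSketch` and its methods are hypothetical names used only for this example and are not part of Lucene.

```java
/** Hypothetical helper for illustration only; not a Lucene class. */
final class RsvSketch {

    private RsvSketch() {}

    /** Assumed log-logistic tail probability: Prob(X >= tfn | lambda) = lambda / (tfn + lambda). */
    static double logLogisticTail(double tfn, double lambda) {
        return lambda / (tfn + lambda);
    }

    /**
     * Sums -boost * log Prob(X >= tfn | lambda) over the matching query terms.
     * The three arrays are parallel: entry i holds the query boost, the
     * normalized term frequency, and the lambda parameter of term i.
     */
    static double rsv(double[] boosts, double[] tfns, double[] lambdas) {
        if (boosts.length != tfns.length || boosts.length != lambdas.length) {
            throw new IllegalArgumentException("arrays must have the same length");
        }
        double sum = 0.0;
        for (int i = 0; i < boosts.length; i++) {
            sum += -boosts[i] * Math.log(logLogisticTail(tfns[i], lambdas[i]));
        }
        return sum;
    }
}
```

Under this assumed tail, a term with normalized frequency 0 contributes nothing (the tail probability is 1), and the contribution grows as the normalized frequency rises relative to λ.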
The framework described in the paper has many similarities to the DFR framework (see DFRSimilarity). The two Similarities may be merged at some point.
To construct an IBSimilarity, you must specify the implementations for all three components of the Information-Based model.
Distribution: Probabilistic distribution used to model term occurrence
Lambda: λ_w parameter of the probability distribution
Normalization: Term frequency normalization
Any supported DFR normalization (listed in
WARNING: This API is experimental and might change in incompatible ways in the next release.
Fields
distribution - The probabilistic distribution used to model term occurrence.
lambda - The lambda (λ_w) parameter.
normalization - The term frequency normalization.
Methods
explain(List<Explanation> subs, BasicStats stats, double freq, double docLen) - Subclasses should implement this method to explain the score.
explain(BasicStats stats, Explanation freq, double docLen) - Explains the score.
getDistribution() - Returns the distribution.
getLambda() - Returns the distribution's lambda parameter.
getNormalization() - Returns the term frequency normalization.
score(BasicStats stats, double freq, double docLen) - Scores the document.
toString() - The names of IB methods follow the pattern IB <distribution> <lambda><normalization>.
Methods inherited from class org.apache.lucene.search.similarities.SimilarityBase
computeNorm, fillBasicStats, getDiscountOverlaps, log2, newStats, scorer, setDiscountOverlaps
public IBSimilarity(Distribution distribution, Lambda lambda, Normalization normalization)
Creates IBSimilarity from the three components.
null values are not allowed: if you want no normalization, instead pass
Parameters:
distribution - probabilistic distribution modeling term occurrence
lambda - distribution's λ_w parameter
normalization - term frequency normalization
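The way the three components plug together can be sketched with stand-in types. Everything below is hypothetical: `TailDistribution`, `LambdaSource`, `TfNormalizer`, and `IbModelSketch` are illustrative names rather than the Lucene classes, the log-logistic tail and the smoothed document-frequency ratio are assumed example components, and rejecting null with `Objects.requireNonNull` is a choice made for this sketch rather than documented behavior.

```java
import java.util.Objects;

/** Hypothetical stand-in: returns -log Prob(X >= tfn | lambda). Not a Lucene type. */
interface TailDistribution {
    double negLogTail(double tfn, double lambda);
}

/** Hypothetical stand-in: derives the lambda parameter from collection statistics. */
interface LambdaSource {
    double lambda(long docFreq, long numDocs);
}

/** Hypothetical stand-in: maps a raw term frequency to a normalized one. */
interface TfNormalizer {
    double tfn(double freq, double docLen, double avgDocLen);
}

/** Illustrative composition of the three components; not the Lucene implementation. */
final class IbModelSketch {
    // Example components, assumed for illustration only.
    static final TailDistribution LOG_LOGISTIC = (tfn, lam) -> -Math.log(lam / (tfn + lam));
    static final LambdaSource DOC_FREQ_RATIO = (docFreq, numDocs) -> (docFreq + 1.0) / (numDocs + 1.0);
    // Identity normalizer: the explicit "no normalization" object to pass instead of null.
    static final TfNormalizer NO_NORMALIZATION = (freq, docLen, avgDocLen) -> freq;

    private final TailDistribution distribution;
    private final LambdaSource lambda;
    private final TfNormalizer normalization;

    IbModelSketch(TailDistribution distribution, LambdaSource lambda, TfNormalizer normalization) {
        // Sketch-level choice: fail fast on null components.
        this.distribution = Objects.requireNonNull(distribution, "distribution");
        this.lambda = Objects.requireNonNull(lambda, "lambda");
        this.normalization = Objects.requireNonNull(normalization, "normalization");
    }

    /** Per-term score: boost * -log Prob(X >= tfn | lambda). */
    double score(double boost, double freq, double docLen, double avgDocLen,
                 long docFreq, long numDocs) {
        double tfn = normalization.tfn(freq, docLen, avgDocLen);
        double lam = lambda.lambda(docFreq, numDocs);
        return boost * distribution.negLogTail(tfn, lam);
    }
}
```

Passing an identity normalizer rather than null keeps the scoring path free of null checks, which is the design the constructor note points toward.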
protected double score(BasicStats stats, double freq, double docLen)
Scores the document.
Subclasses must apply their scoring formula in this method.
protected void explain(List<Explanation> subs, BasicStats stats, double freq, double docLen)
Subclasses should implement this method to explain the score.
expl already contains the score, the name of the class and the doc id, as well as the term frequency and its explanation; subclasses can add additional clauses to explain details of their scoring formulae.
The default implementation does nothing.
protected Explanation explain(BasicStats stats, Explanation freq, double docLen)
Explains the score. The implementation here provides a basic explanation in the format score(name-of-similarity, doc=doc-id, freq=term-frequency), computed from:, and attaches the score (computed via the SimilarityBase.score(BasicStats, double, double) method) and the explanation for the term frequency. Subclasses content with this format may add additional details in SimilarityBase.explain(List, BasicStats, double, double).
public String toString()
The names of IB methods follow the pattern IB <distribution> <lambda><normalization>. The name of the distribution is the same as in the original paper; for the names of lambda parameters, refer to the javadoc of the
public Distribution getDistribution()
Returns the distribution.
public Lambda getLambda()
Returns the distribution's lambda parameter.
public Normalization getNormalization()
Returns the term frequency normalization.