public class DFRSimilarity extends SimilarityBase
Implements the divergence from randomness (DFR) framework introduced in Gianni Amati and Cornelis Joost Van Rijsbergen. 2002. Probabilistic models of information retrieval based on measuring the divergence from randomness. ACM Trans. Inf. Syst. 20, 4 (October 2002), 357-389.
The DFR scoring formula is composed of three separate components: the basic model, the aftereffect and an additional normalization component, represented by the classes BasicModel, AfterEffect and Normalization, respectively. The names of these classes were chosen to match the names of their counterparts in the Terrier IR engine.
To construct a DFRSimilarity, you must specify the implementations for all three components of DFR:
BasicModel: Basic model of information content:
AfterEffect: First normalization of information gain:
Normalization: Second (length) normalization:
NormalizationH1: Uniform distribution of term frequency
NormalizationH2: term frequency density inversely related to length
NormalizationH3: term frequency normalization provided by Dirichlet prior
NormalizationZ: term frequency normalization provided by a Zipfian relation
Normalization.NoNormalization: no second normalization
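The second (length) normalization maps a raw term frequency to a normalized term frequency (tfn) that accounts for document length. The following is a minimal, self-contained sketch of that idea, not Lucene's source code: the class name DfrNormalizationSketch, the method names, and the free parameter c are assumptions made for illustration, and only the H1-style, H2-style and "no normalization" variants are shown.

```java
/**
 * Illustrative sketch only (hypothetical names, not Lucene's implementation).
 * Shows how a length normalization turns a raw term frequency into tfn.
 */
final class DfrNormalizationSketch {

    /** Base-2 logarithm. */
    static double log2(double x) {
        return Math.log(x) / Math.log(2.0);
    }

    /** H1-style: term frequency scaled linearly by avgDocLen / docLen. */
    static double tfnH1(double tf, double docLen, double avgDocLen, double c) {
        return tf * c * avgDocLen / docLen;
    }

    /** H2-style: term frequency density inversely related to length. */
    static double tfnH2(double tf, double docLen, double avgDocLen, double c) {
        return tf * log2(1.0 + c * avgDocLen / docLen);
    }

    /** No second normalization: the term frequency is used unchanged. */
    static double tfnNone(double tf) {
        return tf;
    }

    public static void main(String[] args) {
        double tf = 3.0;
        double avgDocLen = 150.0;
        // A short document raises tfn; a long document lowers it.
        System.out.println("H1, short doc: " + tfnH1(tf, 50.0, avgDocLen, 1.0));
        System.out.println("H2, short doc: " + tfnH2(tf, 50.0, avgDocLen, 1.0));
        System.out.println("H2, long doc:  " + tfnH2(tf, 300.0, avgDocLen, 1.0));
        System.out.println("none:          " + tfnNone(tf));
    }
}
```

In this sketch a document shorter than average gets a tfn above its raw term frequency and a longer one gets a tfn below it, while the "none" variant leaves the frequency untouched.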
Note that qtf, the multiplicity of term-occurrence in the query, is not handled by this implementation.
Note that basic models BE (Limiting form of Bose-Einstein), P (Poisson approximation of the Binomial) and D (Divergence approximation of the Binomial) are not implemented because their formula couldn't be written in a way that makes scores non-decreasing with the normalized term frequency.
Fields
afterEffect: The first normalization of the information content.
basicModel: The basic model for information content.
normalization: The term frequency normalization.
Methods
explain(List<Explanation> subs, BasicStats stats, double freq, double docLen): Subclasses should implement this method to explain the score.
explain(BasicStats stats, Explanation freq, double docLen): Explains the score.
getAfterEffect(): Returns the first normalization.
getBasicModel(): Returns the basic model of information content.
getNormalization(): Returns the second normalization.
score(BasicStats stats, double freq, double docLen): Scores the document.
toString(): Subclasses must override this method to return the name of the Similarity and preferably the values of parameters (if any) as well.
Methods inherited from class org.apache.lucene.search.similarities.SimilarityBase
computeNorm, fillBasicStats, getDiscountOverlaps, log2, newStats, scorer, setDiscountOverlaps
public DFRSimilarity(BasicModel basicModel, AfterEffect afterEffect, Normalization normalization)
Creates DFRSimilarity from the three components.
null values are not allowed: if you want no normalization, instead pass Normalization.NoNormalization.
basicModel - Basic model of information content
afterEffect - First normalization of information gain
normalization - Second (length) normalization
protected double score(BasicStats stats, double freq, double docLen)
Scores the document.
Subclasses must apply their scoring formula in this method.
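As a rough illustration of how the three DFR components could combine into one score, here is a self-contained sketch. It is not the Lucene implementation: the class DfrScoreSketch and all method names are hypothetical, the collection statistics are passed as plain numbers instead of a BasicStats object, and it assumes an inverse-document-frequency basic model, a Laplace-style aftereffect 1 / (1 + tfn), and an H2-style length normalization with its parameter fixed at 1.

```java
/**
 * Illustrative sketch only (hypothetical names, not Lucene's implementation).
 * score = boost * basicModel(tfn) * afterEffect(tfn), with tfn from a length normalization.
 */
final class DfrScoreSketch {

    /** Base-2 logarithm. */
    static double log2(double x) {
        return Math.log(x) / Math.log(2.0);
    }

    /** Second (length) normalization, H2-style, with the free parameter fixed at 1. */
    static double tfn(double tf, double docLen, double avgDocLen) {
        return tf * log2(1.0 + avgDocLen / docLen);
    }

    /** Basic model of information content, inverse-document-frequency style. */
    static double basicModel(double tfn, long numDocs, long docFreq) {
        return tfn * log2((numDocs + 1.0) / (docFreq + 0.5));
    }

    /** First normalization of information gain, Laplace-style. */
    static double afterEffect(double tfn) {
        return 1.0 / (1.0 + tfn);
    }

    /** Combines the three components into a single score. */
    static double score(double boost, double tf, double docLen, double avgDocLen,
                        long numDocs, long docFreq) {
        double t = tfn(tf, docLen, avgDocLen);
        return boost * basicModel(t, numDocs, docFreq) * afterEffect(t);
    }

    public static void main(String[] args) {
        // The score should not decrease as the term frequency grows.
        for (int tf = 1; tf <= 3; tf++) {
            System.out.println("tf=" + tf + " score="
                    + score(1.0, tf, 100.0, 100.0, 1000L, 10L));
        }
    }
}
```

Under these assumptions the product reduces to log2((N + 1) / (n + 0.5)) * tfn / (1 + tfn), which grows with tfn and saturates, so the score is non-decreasing in the normalized term frequency.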
protected void explain(List<Explanation> subs, BasicStats stats, double freq, double docLen)
Subclasses should implement this method to explain the score.
expl already contains the score, the name of the class and the doc id, as well as the term frequency and its explanation; subclasses can add additional clauses to explain details of their scoring formulae.
The default implementation does nothing.
protected Explanation explain(BasicStats stats, Explanation freq, double docLen)
Explains the score. The implementation here provides a basic explanation in the format score(name-of-similarity, doc=doc-id, freq=term-frequency), computed from:, and attaches the score (computed via the SimilarityBase.score(BasicStats, double, double) method) and the explanation for the term frequency. Subclasses content with this format may add additional details in SimilarityBase.explain(List, BasicStats, double, double).
public String toString()
Subclasses must override this method to return the name of the Similarity and preferably the values of parameters (if any) as well.
public BasicModel getBasicModel()
Returns the basic model of information content.
public AfterEffect getAfterEffect()
Returns the first normalization.
public Normalization getNormalization()
Returns the second normalization.