Class DFRSimilarity
- java.lang.Object
  - org.apache.lucene.search.similarities.Similarity
    - org.apache.lucene.search.similarities.SimilarityBase
      - org.apache.lucene.search.similarities.DFRSimilarity

public class DFRSimilarity extends SimilarityBase

Implements the divergence from randomness (DFR) framework introduced in Gianni Amati and Cornelis Joost Van Rijsbergen. 2002. Probabilistic models of information retrieval based on measuring the divergence from randomness. ACM Trans. Inf. Syst. 20, 4 (October 2002), 357-389.

The DFR scoring formula is composed of three separate components: the basic model, the aftereffect and an additional normalization component, represented by the classes BasicModel, AfterEffect and Normalization, respectively. The names of these classes were chosen to match the names of their counterparts in the Terrier IR engine.

To construct a DFRSimilarity, you must specify the implementations for all three components of DFR:

- BasicModel: Basic model of information content:
  - BasicModelG: Geometric approximation of Bose-Einstein
  - BasicModelIn: Inverse document frequency
  - BasicModelIne: Inverse expected document frequency [mixture of Poisson and IDF]
  - BasicModelIF: Inverse term frequency [approximation of I(ne)]
- AfterEffect: First normalization of information gain:
  - AfterEffectL: Laplace's law of succession
  - AfterEffectB: Ratio of two Bernoulli processes
- Normalization: Second (length) normalization:
  - NormalizationH1: Uniform distribution of term frequency
  - NormalizationH2: Term frequency density inversely related to length
  - NormalizationH3: Term frequency normalization provided by Dirichlet prior
  - NormalizationZ: Term frequency normalization provided by a Zipfian relation
  - Normalization.NoNormalization: No second normalization
 
Note that qtf, the multiplicity of term occurrence in the query, is not handled by this implementation.

Note that the basic models BE (limiting form of Bose-Einstein), P (Poisson approximation of the Binomial) and D (divergence approximation of the Binomial) are not implemented because their formulas could not be written in a way that makes scores non-decreasing with the normalized term frequency.

See Also:
- BasicModel
- AfterEffect
- Normalization

WARNING: This API is experimental and might change in incompatible ways in the next release.
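
To illustrate how the three components fit together, here is a minimal, library-free sketch. It assumes the textbook forms of the In basic model, the Laplace aftereffect and the H2 normalization from the DFR literature; Lucene's own implementations differ in detail, so treat the class name `DfrSketch`, its method names and the formulas as illustrative assumptions, not as the library's code.

```java
final class DfrSketch {
    /** Base-2 logarithm. */
    static double log2(double x) {
        return Math.log(x) / Math.log(2.0);
    }

    /** Second (length) normalization, textbook H2 form: tfn = tf * log2(1 + c * avgdl / docLen). */
    static double tfnH2(double tf, double docLen, double avgDocLen, double c) {
        return tf * log2(1.0 + c * avgDocLen / docLen);
    }

    /** First normalization of information gain, textbook Laplace form: 1 / (tfn + 1). */
    static double afterEffectL(double tfn) {
        return 1.0 / (tfn + 1.0);
    }

    /** Basic model In, textbook form: tfn * log2((N + 1) / (n + 0.5)). */
    static double basicModelIn(double tfn, long numDocs, long docFreq) {
        return tfn * log2((numDocs + 1.0) / (docFreq + 0.5));
    }

    /** Combines the three components: normalize tf, then multiply information content by the gain. */
    static double score(double tf, double docLen, double avgDocLen, double c,
                        long numDocs, long docFreq) {
        double tfn = tfnH2(tf, docLen, avgDocLen, c);
        return basicModelIn(tfn, numDocs, docFreq) * afterEffectL(tfn);
    }

    public static void main(String[] args) {
        // With Lucene on the classpath, the corresponding configuration would be
        // new DFRSimilarity(new BasicModelIn(), new AfterEffectL(), new NormalizationH2()).
        double avgDocLen = 100.0;
        System.out.println(score(3, 100, avgDocLen, 1.0, 1000, 10));
        // Same term frequency in a longer document yields a lower score.
        System.out.println(score(3, 400, avgDocLen, 1.0, 1000, 10));
    }
}
```

In this sketch the score grows with the normalized term frequency, which is the property the note above says a basic model must satisfy to be included.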
 
Nested Class Summary

Nested classes/interfaces inherited from class org.apache.lucene.search.similarities.Similarity:
- Similarity.SimScorer
 
Field Summary

Fields (modifier and type, field, description):
- protected AfterEffect afterEffect: The first normalization of the information content.
- protected BasicModel basicModel: The basic model for information content.
- protected Normalization normalization: The term frequency normalization.

Constructor Summary

- DFRSimilarity(BasicModel basicModel, AfterEffect afterEffect, Normalization normalization): Creates DFRSimilarity from the three components, using the default discountOverlaps value.
- DFRSimilarity(BasicModel basicModel, AfterEffect afterEffect, Normalization normalization, boolean discountOverlaps): Creates DFRSimilarity from the three components, with the specified discountOverlaps value.

Method Summary

- protected void explain(List<Explanation> subs, BasicStats stats, double freq, double docLen): Subclasses should implement this method to explain the score.
- protected Explanation explain(BasicStats stats, Explanation freq, double docLen): Explains the score.
- AfterEffect getAfterEffect(): Returns the first normalization.
- BasicModel getBasicModel(): Returns the basic model of information content.
- Normalization getNormalization(): Returns the second normalization.
- protected double score(BasicStats stats, double freq, double docLen): Scores the document doc.
- String toString(): Subclasses must override this method to return the name of the Similarity and preferably the values of parameters (if any) as well.

Methods inherited from class org.apache.lucene.search.similarities.SimilarityBase: fillBasicStats, log2, newStats, scorer

Methods inherited from class org.apache.lucene.search.similarities.Similarity: computeNorm, getDiscountOverlaps
 
Field Detail

basicModel
protected final BasicModel basicModel
The basic model for information content.

afterEffect
protected final AfterEffect afterEffect
The first normalization of the information content.

normalization
protected final Normalization normalization
The term frequency normalization.
 
Constructor Detail

DFRSimilarity
public DFRSimilarity(BasicModel basicModel, AfterEffect afterEffect, Normalization normalization)
Creates DFRSimilarity from the three components, using the default discountOverlaps value.
Note that null values are not allowed: if you want no normalization, pass Normalization.NoNormalization instead.
Parameters:
- basicModel - Basic model of information content
- afterEffect - First normalization of information gain
- normalization - Second (length) normalization
 
DFRSimilarity
public DFRSimilarity(BasicModel basicModel, AfterEffect afterEffect, Normalization normalization, boolean discountOverlaps)
Creates DFRSimilarity from the three components, with the specified discountOverlaps value.
Note that null values are not allowed: if you want no normalization, pass Normalization.NoNormalization instead.
Parameters:
- basicModel - Basic model of information content
- afterEffect - First normalization of information gain
- normalization - Second (length) normalization
- discountOverlaps - True if overlap tokens (tokens with a position increment of zero) are discounted from the document's length.
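
The effect of discountOverlaps on document length can be pictured with a small, library-free sketch. The class `OverlapLengthSketch` and its `docLength` method are hypothetical names introduced for illustration; this is not how Lucene computes the length internally, only a model of the rule stated in the parameter description.

```java
final class OverlapLengthSketch {
    /**
     * Returns the document length implied by a sequence of token position increments.
     * When discountOverlaps is true, tokens with a position increment of zero
     * (for example, synonyms stacked on the same position) are not counted.
     */
    static int docLength(int[] positionIncrements, boolean discountOverlaps) {
        int length = 0;
        for (int increment : positionIncrements) {
            if (discountOverlaps && increment == 0) {
                continue; // overlap token: skipped when discounting
            }
            length++;
        }
        return length;
    }

    public static void main(String[] args) {
        // Hypothetical token stream: "fast" (+1), "quick" (+0, synonym of "fast"), "car" (+1).
        int[] increments = {1, 0, 1};
        System.out.println(docLength(increments, true));  // prints 2
        System.out.println(docLength(increments, false)); // prints 3
    }
}
```

Because the second (length) normalization depends on document length, the choice changes the normalized term frequency for fields whose analyzers emit stacked tokens.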
 
 
Method Detail

score
protected double score(BasicStats stats, double freq, double docLen)
Description copied from class: SimilarityBase
Scores the document doc. Subclasses must apply their scoring formula in this method.
Specified by:
- score in class SimilarityBase
Parameters:
- stats - the corpus level statistics.
- freq - the term frequency.
- docLen - the document length.
Returns:
- the score.
 
explain
protected void explain(List<Explanation> subs, BasicStats stats, double freq, double docLen)
Description copied from class: SimilarityBase
Subclasses should implement this method to explain the score. expl already contains the score, the name of the class and the doc id, as well as the term frequency and its explanation; subclasses can add additional clauses to explain details of their scoring formulae. The default implementation does nothing.
Overrides:
- explain in class SimilarityBase
Parameters:
- subs - the list of details of the explanation to extend
- stats - the corpus level statistics.
- freq - the term frequency.
- docLen - the document length.
 
explain
protected Explanation explain(BasicStats stats, Explanation freq, double docLen)
Description copied from class: SimilarityBase
Explains the score. The implementation here provides a basic explanation in the format score(name-of-similarity, doc=doc-id, freq=term-frequency), computed from:, and attaches the score (computed via the SimilarityBase.score(BasicStats, double, double) method) and the explanation for the term frequency. Subclasses content with this format may add additional details in SimilarityBase.explain(List, BasicStats, double, double).
Overrides:
- explain in class SimilarityBase
Parameters:
- stats - the corpus level statistics.
- freq - the term frequency and its explanation.
- docLen - the document length.
Returns:
- the explanation.
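
The relationship between the two explain methods (one builds the header and attaches the score, the other is a hook for extra clauses) can be sketched without Lucene. Everything here is hypothetical: `Node` stands in for an explanation, `Base` for the base class, and the doc id is left out of the header for brevity.

```java
import java.util.ArrayList;
import java.util.List;

final class ExplainSketch {
    /** Hypothetical stand-in for an explanation node: value, description, sub-details. */
    static final class Node {
        final double value;
        final String description;
        final List<Node> details;

        Node(double value, String description, List<Node> details) {
            this.value = value;
            this.description = description;
            this.details = details;
        }
    }

    /** Hypothetical base class: builds the header and delegates extra clauses to a hook. */
    abstract static class Base {
        abstract double score(double freq, double docLen);

        /** Hook for subclasses; the default adds nothing. */
        void explain(List<Node> subs, double freq, double docLen) {}

        Node explain(Node freq, double docLen) {
            List<Node> subs = new ArrayList<>();
            subs.add(freq);                    // term frequency and its explanation
            explain(subs, freq.value, docLen); // subclass-specific clauses
            String header = "score(" + getClass().getSimpleName()
                + ", freq=" + freq.value + "), computed from:";
            return new Node(score(freq.value, docLen), header, subs);
        }
    }

    /** Toy subclass that adds one clause describing the document length. */
    static final class Toy extends Base {
        @Override
        double score(double freq, double docLen) {
            return freq / docLen;
        }

        @Override
        void explain(List<Node> subs, double freq, double docLen) {
            subs.add(new Node(docLen, "docLen, length of the document", new ArrayList<>()));
        }
    }

    public static void main(String[] args) {
        Node freq = new Node(2.0, "freq, occurrences of term within document", new ArrayList<>());
        Node result = new Toy().explain(freq, 10.0);
        System.out.println(result.value + " = " + result.description);
        for (Node detail : result.details) {
            System.out.println("  " + detail.value + " = " + detail.description);
        }
    }
}
```

The design keeps the output format in one place while letting each similarity describe its own formula.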
 
toString
public String toString()
Description copied from class: SimilarityBase
Subclasses must override this method to return the name of the Similarity and preferably the values of parameters (if any) as well.
Specified by:
- toString in class SimilarityBase
 
getBasicModel
public BasicModel getBasicModel()
Returns the basic model of information content.

getAfterEffect
public AfterEffect getAfterEffect()
Returns the first normalization.

getNormalization
public Normalization getNormalization()
Returns the second normalization.
 