Class IBSimilarity
- java.lang.Object
  - org.apache.lucene.search.similarities.Similarity
    - org.apache.lucene.search.similarities.SimilarityBase
      - org.apache.lucene.search.similarities.IBSimilarity

public class IBSimilarity extends SimilarityBase

Provides a framework for the family of information-based models, as described in Stéphane Clinchant and Eric Gaussier. 2010. Information-based models for ad hoc IR. In Proceedings of the 33rd international ACM SIGIR conference on Research and development in information retrieval (SIGIR '10). ACM, New York, NY, USA, 234-241.

The retrieval function is of the form $RSV(q, d) = \sum -x_w^q \log \mathrm{Prob}(X_w \ge t_w^d \mid \lambda_w)$, where
- $x_w^q$ is the query boost;
- $X_w$ is a random variable that counts the occurrences of word w;
- $t_w^d$ is the normalized term frequency;
- $\lambda_w$ is a parameter.
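
The retrieval function above can be sketched in plain Java, without any Lucene dependency. This is only an illustration under stated assumptions: the class name `RsvSketch`, its method names, and the sample numbers are hypothetical, and the log-logistic tail probability Prob(X ≥ t | λ) = λ / (t + λ) is taken from the cited paper rather than from this page.

```java
public class RsvSketch {
    /** Log-logistic tail probability (assumed form): Prob(X >= t | lambda) = lambda / (t + lambda). */
    public static double probLogLogistic(double tfn, double lambda) {
        return lambda / (tfn + lambda);
    }

    /**
     * RSV(q, d) = sum over query terms of -boost * log Prob(X >= tfn | lambda).
     * The three arrays are parallel: entry i describes the i-th query term.
     */
    public static double rsv(double[] boosts, double[] tfns, double[] lambdas) {
        double sum = 0.0;
        for (int i = 0; i < boosts.length; i++) {
            sum += -boosts[i] * Math.log(probLogLogistic(tfns[i], lambdas[i]));
        }
        return sum;
    }

    public static void main(String[] args) {
        // Two hypothetical query terms: a rare one (small lambda) that occurs
        // often in the document, and a common one that occurs rarely.
        double[] boosts = {1.0, 1.0};
        double[] tfns = {3.0, 0.5};
        double[] lambdas = {0.01, 0.2};
        System.out.println(rsv(boosts, tfns, lambdas));
    }
}
```

A term whose normalized frequency is 0 has tail probability 1 and therefore contributes nothing to the sum, which matches the intuition that only terms occurring more often than expected raise the score.
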
The framework described in the paper has many similarities to the DFR framework (see DFRSimilarity). It is possible that the two Similarities will be merged at some point.

To construct an IBSimilarity, you must specify the implementations for all three components of the Information-Based model.
- Distribution: Probabilistic distribution used to model term occurrence
  - DistributionLL: Log-logistic
  - DistributionSPL: Smoothed power-law
- Lambda: $\lambda_w$ parameter of the probability distribution
- Normalization: Term frequency normalization
  - Any supported DFR normalization (listed in DFRSimilarity)

See Also:
- DFRSimilarity

WARNING: This API is experimental and might change in incompatible ways in the next release.
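
As a minimal sketch of the construction described above, the following picks one implementation for each of the three components. It assumes Lucene core is on the classpath; `IbSimilarityExample` and `build()` are hypothetical names, and the particular choice of `DistributionLL`, `LambdaDF` and `NormalizationH2` is only one of several valid combinations.

```java
import org.apache.lucene.search.similarities.DistributionLL;
import org.apache.lucene.search.similarities.IBSimilarity;
import org.apache.lucene.search.similarities.LambdaDF;
import org.apache.lucene.search.similarities.NormalizationH2;

public class IbSimilarityExample {
    /** Builds an IBSimilarity from one choice of distribution, lambda and normalization. */
    public static IBSimilarity build() {
        return new IBSimilarity(
            new DistributionLL(),    // log-logistic distribution
            new LambdaDF(),          // lambda derived from document frequency
            new NormalizationH2());  // one of the DFR term frequency normalizations
    }

    public static void main(String[] args) {
        IBSimilarity similarity = build();
        // toString() reports the chosen components in abbreviated form.
        System.out.println(similarity);
    }
}
```

In typical use the same instance would be set on both `IndexWriterConfig` and `IndexSearcher` through their `setSimilarity` methods, so that document lengths are encoded at index time with the same settings the scorer expects at query time.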
 
Nested Class Summary

Nested classes/interfaces inherited from class org.apache.lucene.search.similarities.Similarity
- Similarity.SimScorer

Field Summary

| Modifier and Type | Field | Description |
|---|---|---|
| protected Distribution | distribution | The probabilistic distribution used to model term occurrence. |
| protected Lambda | lambda | The lambda ($\lambda_w$) parameter. |
| protected Normalization | normalization | The term frequency normalization. |

Constructor Summary

| Constructor | Description |
|---|---|
| IBSimilarity(Distribution distribution, Lambda lambda, Normalization normalization) | Creates IBSimilarity from the three components, using the default discountOverlaps value. |
| IBSimilarity(Distribution distribution, Lambda lambda, Normalization normalization, boolean discountOverlaps) | Creates IBSimilarity from the three components, with the specified discountOverlaps value. |

Method Summary

| Modifier and Type | Method | Description |
|---|---|---|
| protected void | explain(List<Explanation> subs, BasicStats stats, double freq, double docLen) | Subclasses should implement this method to explain the score. |
| protected Explanation | explain(BasicStats stats, Explanation freq, double docLen) | Explains the score. |
| Distribution | getDistribution() | Returns the distribution. |
| Lambda | getLambda() | Returns the distribution's lambda parameter. |
| Normalization | getNormalization() | Returns the term frequency normalization. |
| protected double | score(BasicStats stats, double freq, double docLen) | Scores the document doc. |
| String | toString() | The names of IB methods follow the pattern IB <distribution> <lambda><normalization>. |

Methods inherited from class org.apache.lucene.search.similarities.SimilarityBase
- fillBasicStats, log2, newStats, scorer

Methods inherited from class org.apache.lucene.search.similarities.Similarity
- computeNorm, getDiscountOverlaps

Field Detail

distribution
protected final Distribution distribution
The probabilistic distribution used to model term occurrence.

lambda
protected final Lambda lambda
The lambda ($\lambda_w$) parameter.

normalization
protected final Normalization normalization
The term frequency normalization.

Constructor Detail

IBSimilarity
public IBSimilarity(Distribution distribution, Lambda lambda, Normalization normalization)
Creates IBSimilarity from the three components, using the default discountOverlaps value.
Note that null values are not allowed: if you want no normalization, pass Normalization.NoNormalization instead.
Parameters:
- distribution - probabilistic distribution modeling term occurrence
- lambda - distribution's $\lambda_w$ parameter
- normalization - term frequency normalization

IBSimilarity
public IBSimilarity(Distribution distribution, Lambda lambda, Normalization normalization, boolean discountOverlaps)
Creates IBSimilarity from the three components, with the specified discountOverlaps value.
Note that null values are not allowed: if you want no normalization, pass Normalization.NoNormalization instead.
Parameters:
- distribution - probabilistic distribution modeling term occurrence
- lambda - distribution's $\lambda_w$ parameter
- normalization - term frequency normalization
- discountOverlaps - true if overlap tokens should not impact document length for scoring.
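
The note about null values can be illustrated with the four-argument constructor. This is a sketch that assumes Lucene core is on the classpath; `IbNoNormalizationExample` and `build()` are hypothetical names, and the component choices are arbitrary.

```java
import org.apache.lucene.search.similarities.DistributionSPL;
import org.apache.lucene.search.similarities.IBSimilarity;
import org.apache.lucene.search.similarities.LambdaTTF;
import org.apache.lucene.search.similarities.Normalization;

public class IbNoNormalizationExample {
    /** Builds an IBSimilarity that skips term frequency normalization. */
    public static IBSimilarity build() {
        return new IBSimilarity(
            new DistributionSPL(),                // smoothed power-law distribution
            new LambdaTTF(),                      // lambda derived from total term frequency
            new Normalization.NoNormalization(),  // explicit "no normalization"; null is not allowed
            false);                               // overlap tokens do count toward document length
    }

    public static void main(String[] args) {
        IBSimilarity similarity = build();
        System.out.println(similarity.getDiscountOverlaps());
    }
}
```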
 
 
Method Detail

score
protected double score(BasicStats stats, double freq, double docLen)
Description copied from class: SimilarityBase
Scores the document doc. Subclasses must apply their scoring formula in this method.
Specified by:
- score in class SimilarityBase
Parameters:
- stats - the corpus level statistics.
- freq - the term frequency.
- docLen - the document length.
Returns:
- the score.
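
One way to picture how the three components could combine into a per-term score is the plain-Java sketch below. It is not the library's code. It assumes, following the retrieval function in the class description, that the score is the boost times the distribution's value at the normalized term frequency and λ. The formulas for `tfnH2` and `lambdaDf` are illustrative stand-ins for a normalization and a lambda implementation, the smoothed power-law form is taken from the cited paper, and all names and numbers are hypothetical. The sketch also assumes docFreq < numDocs, so that λ < 1.

```java
public class IbScoreSketch {
    static double log2(double x) {
        return Math.log(x) / Math.log(2.0);
    }

    /** Assumed H2-style normalization: tf * log2(1 + c * avgLen / docLen). */
    public static double tfnH2(double freq, double docLen, double avgLen, double c) {
        return freq * log2(1.0 + c * avgLen / docLen);
    }

    /** Assumed document-frequency lambda: (docFreq + 1) / (numDocs + 1). */
    public static double lambdaDf(long docFreq, long numDocs) {
        return (docFreq + 1.0) / (numDocs + 1.0);
    }

    /** Smoothed power-law: -log((lambda^(tfn / (tfn + 1)) - lambda) / (1 - lambda)); requires lambda < 1. */
    public static double scoreSpl(double tfn, double lambda) {
        double prob = (Math.pow(lambda, tfn / (tfn + 1.0)) - lambda) / (1.0 - lambda);
        return -Math.log(prob);
    }

    /** Per-term score: normalize the frequency, derive lambda, then apply the distribution and boost. */
    public static double score(double boost, double freq, double docLen,
                               double avgLen, long docFreq, long numDocs) {
        double tfn = tfnH2(freq, docLen, avgLen, 1.0);
        double lambda = lambdaDf(docFreq, numDocs);
        return boost * scoreSpl(tfn, lambda);
    }

    public static void main(String[] args) {
        // Hypothetical statistics: the term occurs 3 times in a 120-token document,
        // the average length is 100 tokens, and 50 of 10,000 documents contain the term.
        System.out.println(score(1.0, 3.0, 120.0, 100.0, 50, 10_000));
    }
}
```

Under these assumptions the score is 0 when the term frequency is 0 and grows as the frequency rises, because a larger normalized frequency makes the tail probability smaller.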
 
explain
protected void explain(List<Explanation> subs, BasicStats stats, double freq, double docLen)
Description copied from class: SimilarityBase
Subclasses should implement this method to explain the score. expl already contains the score, the name of the class and the doc id, as well as the term frequency and its explanation; subclasses can add additional clauses to explain details of their scoring formulae.
The default implementation does nothing.
Overrides:
- explain in class SimilarityBase
Parameters:
- subs - the list of details of the explanation to extend
- stats - the corpus level statistics.
- freq - the term frequency.
- docLen - the document length.
 
explain
protected Explanation explain(BasicStats stats, Explanation freq, double docLen)
Description copied from class: SimilarityBase
Explains the score. The implementation here provides a basic explanation in the format score(name-of-similarity, doc=doc-id, freq=term-frequency), computed from:, and attaches the score (computed via the SimilarityBase.score(BasicStats, double, double) method) and the explanation for the term frequency. Subclasses content with this format may add additional details in SimilarityBase.explain(List, BasicStats, double, double).
Overrides:
- explain in class SimilarityBase
Parameters:
- stats - the corpus level statistics.
- freq - the term frequency and its explanation.
- docLen - the document length.
Returns:
- the explanation.
 
toString
public String toString()
The names of IB methods follow the pattern IB <distribution> <lambda><normalization>. The name of the distribution is the same as in the original paper; for the names of lambda parameters, refer to the javadoc of the Lambda classes.
Specified by:
- toString in class SimilarityBase
 
getDistribution
public Distribution getDistribution()
Returns the distribution.

getLambda
public Lambda getLambda()
Returns the distribution's lambda parameter.

getNormalization
public Normalization getNormalization()
Returns the term frequency normalization.