Class IBSimilarity
java.lang.Object
org.apache.lucene.search.similarities.Similarity
org.apache.lucene.search.similarities.SimilarityBase
org.apache.lucene.search.similarities.IBSimilarity
Provides a framework for the family of information-based models, as described in Stéphane
Clinchant and Eric Gaussier. 2010. Information-based models for ad hoc IR. In Proceedings of the
33rd international ACM SIGIR conference on Research and development in information retrieval
(SIGIR '10). ACM, New York, NY, USA, 234-241.
The retrieval function is of the form RSV(q, d) = ∑ -x_w^q log Prob(X_w ≥ t_w^d | λ_w), where
- x_w^q is the query boost;
- X_w is a random variable that counts the occurrences of word w;
- t_w^d is the normalized term frequency;
- λ_w is a parameter.
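As a rough illustration of the retrieval function above, the following self-contained sketch sums the per-term contributions for one document. It assumes a log-logistic tail of the form Prob(X ≥ t | λ) = λ / (t + λ); that closed form, the class name `LogLogisticRsvSketch`, and the sample values are assumptions made for this example and are not taken from this page.

```java
// Illustrative sketch only; this is not Lucene's implementation.
final class LogLogisticRsvSketch {

    // Assumed log-logistic tail: Prob(X >= tfn | lambda) = lambda / (tfn + lambda).
    static double tailProbability(double tfn, double lambda) {
        return lambda / (tfn + lambda);
    }

    // Contribution of one term: -(query boost) * log Prob(X >= tfn | lambda).
    static double termScore(double queryBoost, double tfn, double lambda) {
        return -queryBoost * Math.log(tailProbability(tfn, lambda));
    }

    // RSV(q, d): sum of the per-term contributions over the query terms.
    static double rsv(double[] queryBoosts, double[] tfns, double[] lambdas) {
        double sum = 0.0;
        for (int i = 0; i < queryBoosts.length; i++) {
            sum += termScore(queryBoosts[i], tfns[i], lambdas[i]);
        }
        return sum;
    }

    public static void main(String[] args) {
        double[] boosts = {1.0, 2.0};   // hypothetical query boosts
        double[] tfns = {3.0, 0.0};     // hypothetical normalized term frequencies
        double[] lambdas = {0.5, 0.1};  // hypothetical lambda values
        System.out.println(rsv(boosts, tfns, lambdas));
    }
}
```

Under this assumption a term that does not occur in the document (normalized frequency 0) has tail probability 1 and contributes nothing, while a frequency that is large relative to λ makes the event less probable and raises the score.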
The framework described in the paper has many similarities to the DFR framework (see DFRSimilarity). It is possible that the two Similarities will be merged at some point.
To construct an IBSimilarity, you must specify the implementations for all three components of the Information-Based model.
- Distribution: Probabilistic distribution used to model term occurrence
  - DistributionLL: Log-logistic
  - DistributionSPL: Smoothed power-law
- Lambda: λ_w parameter of the probability distribution
- Normalization: Term frequency normalization
  - Any supported DFR normalization (listed in DFRSimilarity)
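A minimal sketch of choosing one implementation per component might look like the following. It needs Lucene on the classpath and assumes that `DistributionSPL` (the smoothed power-law distribution), `LambdaTTF`, and `NormalizationH2` are available in your Lucene version with no-argument constructors; the class and method names `IbSimilarityExample` and `smoothedPowerLaw` are hypothetical.

```java
import org.apache.lucene.search.similarities.DistributionSPL;
import org.apache.lucene.search.similarities.IBSimilarity;
import org.apache.lucene.search.similarities.LambdaTTF;
import org.apache.lucene.search.similarities.NormalizationH2;
import org.apache.lucene.search.similarities.Similarity;

// Hypothetical helper class, for illustration only.
final class IbSimilarityExample {

    // Builds an information-based similarity from one choice per component.
    static Similarity smoothedPowerLaw() {
        return new IBSimilarity(
            new DistributionSPL(),   // distribution: smoothed power-law
            new LambdaTTF(),         // lambda estimate (see the Lambda classes)
            new NormalizationH2());  // a DFR term frequency normalization
    }
}
```

The resulting object would typically be passed to both `IndexWriterConfig.setSimilarity` and `IndexSearcher.setSimilarity`, so that indexing and searching use the same model.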
WARNING: This API is experimental and might change in incompatible ways in the next release.
Nested Class Summary
Nested classes/interfaces inherited from class org.apache.lucene.search.similarities.Similarity
Similarity.SimScorer
Field Summary
| Modifier and Type | Field | Description |
|---|---|---|
| protected final Distribution | distribution | The probabilistic distribution used to model term occurrence. |
| protected final Lambda | lambda | The lambda (λ_w) parameter. |
| protected final Normalization | normalization | The term frequency normalization. |

Fields inherited from class org.apache.lucene.search.similarities.SimilarityBase
discountOverlaps
Constructor Summary
| Constructor | Description |
|---|---|
| IBSimilarity(Distribution distribution, Lambda lambda, Normalization normalization) | Creates IBSimilarity from the three components. |
Method Summary
| Modifier and Type | Method | Description |
|---|---|---|
| protected void | explain(List<Explanation> subs, BasicStats stats, double freq, double docLen) | Subclasses should implement this method to explain the score. |
| protected Explanation | explain(BasicStats stats, Explanation freq, double docLen) | Explains the score. |
| Distribution | getDistribution() | Returns the distribution. |
| Lambda | getLambda() | Returns the distribution's lambda parameter. |
| Normalization | getNormalization() | Returns the term frequency normalization. |
| protected double | score(BasicStats stats, double freq, double docLen) | Scores the document doc. |
| String | toString() | The names of IB methods follow the pattern IB <distribution> <lambda><normalization>. |

Methods inherited from class org.apache.lucene.search.similarities.SimilarityBase
computeNorm, fillBasicStats, getDiscountOverlaps, log2, newStats, scorer, setDiscountOverlaps
Field Details
distribution
The probabilistic distribution used to model term occurrence.
lambda
The lambda (λ_w) parameter.
normalization
The term frequency normalization.
Constructor Details
IBSimilarity
Creates IBSimilarity from the three components.
Note that null values are not allowed: if you want no normalization, pass Normalization.NoNormalization instead.
- Parameters:
  distribution - probabilistic distribution modeling term occurrence
  lambda - distribution's λ_w parameter
  normalization - term frequency normalization
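The rule about null values can be illustrated with a short sketch that passes the explicit no-op normalization. It needs Lucene on the classpath and assumes that `DistributionLL`, `LambdaDF`, and the nested class `Normalization.NoNormalization` have public no-argument constructors in your Lucene version; `NoNormalizationExample` and `withoutNormalization` are hypothetical names.

```java
import org.apache.lucene.search.similarities.DistributionLL;
import org.apache.lucene.search.similarities.IBSimilarity;
import org.apache.lucene.search.similarities.LambdaDF;
import org.apache.lucene.search.similarities.Normalization;

// Hypothetical helper class, for illustration only.
final class NoNormalizationExample {

    static IBSimilarity withoutNormalization() {
        // Pass the explicit no-op normalization rather than null.
        return new IBSimilarity(
            new DistributionLL(),
            new LambdaDF(),
            new Normalization.NoNormalization());
    }
}
```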
Method Details
score
Description copied from class: SimilarityBase
Scores the document doc.
Subclasses must apply their scoring formula in this class.
- Specified by: score in class SimilarityBase
- Parameters:
  stats - the corpus level statistics.
  freq - the term frequency.
  docLen - the document length.
- Returns: the score.
explain
Description copied from class: SimilarityBase
Subclasses should implement this method to explain the score. The explanation already contains the score, the name of the class and the doc id, as well as the term frequency and its explanation; subclasses can add additional clauses to explain details of their scoring formulae.
The default implementation does nothing.
- Overrides: explain in class SimilarityBase
- Parameters:
  subs - the list of details of the explanation to extend.
  stats - the corpus level statistics.
  freq - the term frequency.
  docLen - the document length.
explain
Description copied from class: SimilarityBase
Explains the score. The implementation here provides a basic explanation in the format score(name-of-similarity, doc=doc-id, freq=term-frequency), computed from:, and attaches the score (computed via the SimilarityBase.score(BasicStats, double, double) method) and the explanation for the term frequency. Subclasses content with this format may add additional details in SimilarityBase.explain(List, BasicStats, double, double).
- Overrides: explain in class SimilarityBase
- Parameters:
  stats - the corpus level statistics.
  freq - the term frequency and its explanation.
  docLen - the document length.
- Returns: the explanation.
toString
The names of IB methods follow the pattern IB <distribution> <lambda><normalization>. The name of the distribution is the same as in the original paper; for the names of lambda parameters, refer to the javadoc of the Lambda classes.
- Specified by: toString in class SimilarityBase
getDistribution
Returns the distribution.
getLambda
Returns the distribution's lambda parameter.
getNormalization
Returns the term frequency normalization.