org.apache.lucene.search.similarities
Class IBSimilarity

java.lang.Object
  extended by org.apache.lucene.search.similarities.Similarity
      extended by org.apache.lucene.search.similarities.SimilarityBase
          extended by org.apache.lucene.search.similarities.IBSimilarity

public class IBSimilarity
extends SimilarityBase

Provides a framework for the family of information-based models, as described in Stéphane Clinchant and Eric Gaussier. 2010. Information-based models for ad hoc IR. In Proceedings of the 33rd international ACM SIGIR conference on Research and development in information retrieval (SIGIR '10). ACM, New York, NY, USA, 234-241.

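The retrieval function is of the form RSV(q, d) = ∑ -x^q_w log Prob(X_w ≥ t^d_w | λ_w), where

  x^q_w is the query boost;
  X_w is a random variable that counts the occurrences of word w;
  t^d_w is the normalized term frequency;
  λ_w is a parameter.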

The framework described in the paper has many similarities to the DFR framework (see DFRSimilarity). It is possible that the two Similarities will be merged at some point.
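
As a worked illustration of the retrieval function above, the following standalone sketch sums the per-term information over the query terms. It assumes the log-logistic model from the cited paper, for which Prob(X ≥ t | λ) = λ / (t + λ). The class and method names are hypothetical and are not part of the Lucene API.

```java
/** Hypothetical, standalone illustration of the RSV formula; not part of Lucene. */
final class RsvSketch {
  private RsvSketch() {}

  /** Assumed log-logistic tail probability: Prob(X >= tfn | lambda) = lambda / (tfn + lambda). */
  static double tailProbability(double tfn, double lambda) {
    return lambda / (tfn + lambda);
  }

  /** Contribution of one term: -boost * log Prob(X >= tfn | lambda). */
  static double termScore(double boost, double tfn, double lambda) {
    return -boost * Math.log(tailProbability(tfn, lambda));
  }

  /** RSV(q, d): sum of the per-term contributions over the query terms. */
  static double rsv(double[] boosts, double[] tfns, double[] lambdas) {
    if (boosts.length != tfns.length || tfns.length != lambdas.length) {
      throw new IllegalArgumentException("one boost, tfn and lambda is required per query term");
    }
    double sum = 0.0;
    for (int i = 0; i < boosts.length; i++) {
      sum += termScore(boosts[i], tfns[i], lambdas[i]);
    }
    return sum;
  }
}
```

Under this assumption, a term with normalized frequency 0 has tail probability 1 and contributes nothing to the sum; the contribution grows as the normalized frequency rises above what λ predicts.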

To construct an IBSimilarity, you must specify the implementations for all three components of the Information-Based model.

  1. Distribution: Probabilistic distribution used to model term occurrence
  2. Lambda: λ_w parameter of the probability distribution
  3. Normalization: Term frequency normalization
    Any supported DFR normalization (listed in DFRSimilarity)
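
The way the three components combine can be sketched without Lucene on the classpath. The interfaces below are hypothetical stand-ins for Distribution, Lambda and Normalization, and the formulas in example() (log-logistic distribution, document-frequency ratio for λ, simple length-ratio scaling) are assumptions chosen for illustration. With Lucene itself, the equivalent wiring would be along the lines of new IBSimilarity(new DistributionLL(), new LambdaDF(), new NormalizationH2()), assuming those classes from the same package are available.

```java
/** Stand-in for Distribution: returns -log Prob(X >= tfn | lambda). Not the Lucene type. */
interface DistributionSketch {
  double score(double tfn, double lambda);
}

/** Stand-in for Lambda: estimates lambda for a term from corpus statistics. */
interface LambdaSketch {
  double lambda(long docFreq, long numDocs);
}

/** Stand-in for Normalization: maps the raw term frequency to a normalized one. */
interface NormalizationSketch {
  double tfn(double tf, double docLen, double avgDocLen);
}

/** Hypothetical composition of the three components into one scoring function. */
final class IbSketch {
  private final DistributionSketch distribution;
  private final LambdaSketch lambda;
  private final NormalizationSketch normalization;

  IbSketch(DistributionSketch distribution, LambdaSketch lambda,
           NormalizationSketch normalization) {
    this.distribution = distribution;
    this.lambda = lambda;
    this.normalization = normalization;
  }

  /** Normalizes the term frequency, estimates lambda, then applies the distribution. */
  double score(double boost, double tf, double docLen, double avgDocLen,
               long docFreq, long numDocs) {
    double tfn = normalization.tfn(tf, docLen, avgDocLen);
    double lam = lambda.lambda(docFreq, numDocs);
    return boost * distribution.score(tfn, lam);
  }

  /** Example wiring; every formula here is an assumption made for this sketch. */
  static IbSketch example() {
    return new IbSketch(
        (tfn, lam) -> -Math.log(lam / (tfn + lam)),          // log-logistic
        (docFreq, numDocs) -> (double) docFreq / numDocs,    // document-frequency ratio
        (tf, docLen, avgDocLen) -> tf * avgDocLen / docLen); // length-ratio scaling
  }
}
```

Because each component is passed in separately, swapping one (for example a different λ estimate) leaves the other two untouched, which is the point of specifying all three at construction time.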

See Also:
DFRSimilarity
WARNING: This API is experimental and might change in incompatible ways in the next release.

Nested Class Summary
 
Nested classes/interfaces inherited from class org.apache.lucene.search.similarities.Similarity
Similarity.ExactSimScorer, Similarity.SimWeight, Similarity.SloppySimScorer
 
Field Summary
protected  Distribution distribution
          The probabilistic distribution used to model term occurrence.
protected  Lambda lambda
          The lambda (λ_w) parameter.
protected  Normalization normalization
          The term frequency normalization.
 
Fields inherited from class org.apache.lucene.search.similarities.SimilarityBase
discountOverlaps
 
Constructor Summary
IBSimilarity(Distribution distribution, Lambda lambda, Normalization normalization)
          Creates IBSimilarity from the three components.
 
Method Summary
protected  void explain(Explanation expl, BasicStats stats, int doc, float freq, float docLen)
          Subclasses should implement this method to explain the score.
 Distribution getDistribution()
          Returns the distribution.
 Lambda getLambda()
          Returns the distribution's lambda parameter.
 Normalization getNormalization()
          Returns the term frequency normalization.
protected  float score(BasicStats stats, float freq, float docLen)
          Scores the document doc.
 String toString()
          The names of IB methods follow the pattern IB <distribution> <lambda><normalization>.
 
Methods inherited from class org.apache.lucene.search.similarities.SimilarityBase
computeNorm, computeWeight, decodeNormValue, encodeNormValue, exactSimScorer, explain, fillBasicStats, getDiscountOverlaps, log2, newStats, setDiscountOverlaps, sloppySimScorer
 
Methods inherited from class org.apache.lucene.search.similarities.Similarity
coord, queryNorm
 
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, wait, wait, wait
 

Field Detail

distribution

protected final Distribution distribution
The probabilistic distribution used to model term occurrence.


lambda

protected final Lambda lambda
The lambda (λ_w) parameter.


normalization

protected final Normalization normalization
The term frequency normalization.

Constructor Detail

IBSimilarity

public IBSimilarity(Distribution distribution,
                    Lambda lambda,
                    Normalization normalization)
Creates IBSimilarity from the three components.

Note that null values are not allowed: if you want no normalization, instead pass Normalization.NoNormalization.

Parameters:
distribution - probabilistic distribution modeling term occurrence
lambda - distribution's λ_w parameter
normalization - term frequency normalization
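
The "no null, pass a no-op instead" convention can be illustrated with a small standalone sketch. The types below are hypothetical and only mirror the idea behind Normalization.NoNormalization; the source does not say how a null argument is handled, so the explicit check here is an assumption of this sketch.

```java
/** Hypothetical stand-in for a term frequency normalization; not the Lucene type. */
interface TfNormalizer {
  double tfn(double tf, double docLen);

  /** Pass-through implementation: returns the raw term frequency unchanged. */
  TfNormalizer NONE = (tf, docLen) -> tf;
}

/** Hypothetical holder that rejects null instead of treating it as "no normalization". */
final class NonNullComponents {
  final TfNormalizer normalization;

  NonNullComponents(TfNormalizer normalization) {
    if (normalization == null) {
      throw new NullPointerException("normalization must not be null; use TfNormalizer.NONE");
    }
    this.normalization = normalization;
  }
}
```

Using an explicit no-op object keeps the scoring code free of null checks: it can always call the normalization, whether or not one was requested.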
Method Detail

score

protected float score(BasicStats stats,
                      float freq,
                      float docLen)
Description copied from class: SimilarityBase
Scores the document doc.

Subclasses must apply their scoring formula in this method.

Specified by:
score in class SimilarityBase
Parameters:
stats - the corpus level statistics.
freq - the term frequency.
docLen - the document length.
Returns:
the score.

explain

protected void explain(Explanation expl,
                       BasicStats stats,
                       int doc,
                       float freq,
                       float docLen)
Description copied from class: SimilarityBase
Subclasses should implement this method to explain the score. expl already contains the score, the name of the class and the doc id, as well as the term frequency and its explanation; subclasses can add additional clauses to explain details of their scoring formulae.

The default implementation does nothing.

Overrides:
explain in class SimilarityBase
Parameters:
expl - the explanation to extend with details.
stats - the corpus level statistics.
doc - the document id.
freq - the term frequency.
docLen - the document length.

toString

public String toString()
The names of IB methods follow the pattern IB <distribution> <lambda><normalization>. The name of the distribution is the same as in the original paper; for the names of lambda parameters, refer to the javadoc of the Lambda classes.

Specified by:
toString in class SimilarityBase

getDistribution

public Distribution getDistribution()
Returns the distribution.


getLambda

public Lambda getLambda()
Returns the distribution's lambda parameter.


getNormalization

public Normalization getNormalization()
Returns the term frequency normalization.



Copyright © 2000-2013 Apache Software Foundation. All Rights Reserved.