org.apache.lucene.search.similarities
Class DFRSimilarity

java.lang.Object
  extended by org.apache.lucene.search.similarities.Similarity
      extended by org.apache.lucene.search.similarities.SimilarityBase
          extended by org.apache.lucene.search.similarities.DFRSimilarity

public class DFRSimilarity
extends SimilarityBase

Implements the divergence from randomness (DFR) framework introduced in Gianni Amati and Cornelis Joost Van Rijsbergen. 2002. Probabilistic models of information retrieval based on measuring the divergence from randomness. ACM Trans. Inf. Syst. 20, 4 (October 2002), 357-389.

The DFR scoring formula is composed of three separate components: the basic model, the after-effect, and an additional normalization component, represented by the classes BasicModel, AfterEffect and Normalization, respectively. The names of these classes were chosen to match the names of their counterparts in the Terrier IR engine.

To construct a DFRSimilarity, you must specify the implementations for all three components of DFR:

  1. BasicModel: Basic model of information content.
  2. AfterEffect: First normalization of information gain.
  3. Normalization: Second (length) normalization.

Note that qtf, the multiplicity of term-occurrence in the query, is not handled by this implementation.
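
The way the three components combine can be illustrated with a minimal stand-alone sketch. It uses no Lucene classes: DfrSketch, Stats and the method names are hypothetical, and the three formulas (inverse document frequency model "In", Laplace after-effect "L", length normalization "H2" with parameter c) are assumptions taken from the DFR paper cited above, not a copy of this class's implementation. The sketch first normalizes the term frequency, then multiplies the basic model by the after-effect. Like the class itself, it ignores qtf.

```java
// Stand-alone illustration only; none of these types are Lucene classes.
final class DfrSketch {

    /** Hypothetical holder for the corpus-level statistics used below. */
    static final class Stats {
        final long numberOfDocuments;   // N: documents in the collection
        final long docFreq;             // n: documents containing the term
        final double avgFieldLength;    // average document (field) length
        final double boost;             // query-time boost

        Stats(long numberOfDocuments, long docFreq, double avgFieldLength, double boost) {
            this.numberOfDocuments = numberOfDocuments;
            this.docFreq = docFreq;
            this.avgFieldLength = avgFieldLength;
            this.boost = boost;
        }
    }

    static double log2(double x) {
        return Math.log(x) / Math.log(2.0);
    }

    /** Second (length) normalization, H2 style: rescales freq by document length. */
    static double normalizeH2(Stats s, double freq, double docLen, double c) {
        return freq * log2(1.0 + c * s.avgFieldLength / docLen);
    }

    /** Basic model, In style: normalized frequency times an idf-like term. */
    static double basicModelIn(Stats s, double tfn) {
        return tfn * log2((s.numberOfDocuments + 1.0) / (s.docFreq + 0.5));
    }

    /** First normalization (after-effect), Laplace style. */
    static double afterEffectL(double tfn) {
        return 1.0 / (tfn + 1.0);
    }

    /** Composition: normalize the frequency, then multiply the two information terms. */
    static double score(Stats s, double freq, double docLen, double c) {
        double tfn = normalizeH2(s, freq, docLen, c);
        return s.boost * basicModelIn(s, tfn) * afterEffectL(tfn);
    }

    public static void main(String[] args) {
        Stats s = new Stats(11, 1, 100.0, 1.0);
        // Same term frequency, average-length versus long document.
        System.out.println(score(s, 3.0, 100.0, 1.0));
        System.out.println(score(s, 3.0, 400.0, 1.0));
    }
}
```

In this sketch a document four times longer than average receives a lower score for the same raw term frequency, which is the purpose of the second normalization. With Lucene on the classpath, the corresponding similarity would presumably be built as new DFRSimilarity(new BasicModelIn(), new AfterEffectL(), new NormalizationH2()).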

See Also:
BasicModel, AfterEffect, Normalization
WARNING: This API is experimental and might change in incompatible ways in the next release.

Nested Class Summary
 
Nested classes/interfaces inherited from class org.apache.lucene.search.similarities.Similarity
Similarity.ExactSimScorer, Similarity.SimWeight, Similarity.SloppySimScorer
 
Field Summary
protected  AfterEffect afterEffect
          The first normalization of the information content.
protected  BasicModel basicModel
          The basic model for information content.
protected  Normalization normalization
          The term frequency normalization.
 
Fields inherited from class org.apache.lucene.search.similarities.SimilarityBase
discountOverlaps
 
Constructor Summary
DFRSimilarity(BasicModel basicModel, AfterEffect afterEffect, Normalization normalization)
          Creates DFRSimilarity from the three components.
 
Method Summary
protected  void explain(Explanation expl, BasicStats stats, int doc, float freq, float docLen)
          Subclasses should implement this method to explain the score.
 AfterEffect getAfterEffect()
          Returns the first normalization (the after-effect).
 BasicModel getBasicModel()
          Returns the basic model of information content.
 Normalization getNormalization()
          Returns the second (length) normalization.
protected  float score(BasicStats stats, float freq, float docLen)
          Scores the document.
 String toString()
          Subclasses must override this method to return the name of the Similarity and preferably the values of parameters (if any) as well.
 
Methods inherited from class org.apache.lucene.search.similarities.SimilarityBase
computeNorm, computeWeight, decodeNormValue, encodeNormValue, exactSimScorer, explain, fillBasicStats, getDiscountOverlaps, log2, newStats, setDiscountOverlaps, sloppySimScorer
 
Methods inherited from class org.apache.lucene.search.similarities.Similarity
coord, queryNorm
 
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, wait, wait, wait
 

Field Detail

basicModel

protected final BasicModel basicModel
The basic model for information content.


afterEffect

protected final AfterEffect afterEffect
The first normalization of the information content.


normalization

protected final Normalization normalization
The term frequency normalization.

Constructor Detail

DFRSimilarity

public DFRSimilarity(BasicModel basicModel,
                     AfterEffect afterEffect,
                     Normalization normalization)
Creates DFRSimilarity from the three components.

Note that null values are not allowed: if you want no normalization or after-effect, instead pass Normalization.NoNormalization or AfterEffect.NoAfterEffect respectively.

Parameters:
basicModel - Basic model of information content
afterEffect - First normalization of information gain
normalization - Second (length) normalization
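
The "no nulls, pass a no-op component instead" rule can be sketched in isolation. Everything in the block below is hypothetical (ComponentSketch, its nested interfaces, and the NO_AFTER_EFFECT and NO_NORMALIZATION constants are not Lucene types), and the basic model is reduced to the identity for brevity. It assumes that a no-op after-effect contributes a factor of 1 and a no-op normalization returns the term frequency unchanged, which is one reasonable reading of "no normalization or after-effect".

```java
import java.util.Objects;

// Stand-alone illustration only; these are not the Lucene classes of the same name.
final class ComponentSketch {

    /** Hypothetical after-effect: a factor applied to the basic-model score. */
    interface AfterEffect {
        double score(double tfn);
    }

    /** Hypothetical normalization: maps raw frequency to normalized frequency. */
    interface Normalization {
        double tfn(double freq, double docLen);
    }

    /** No-op after-effect: multiplies the score by 1. */
    static final AfterEffect NO_AFTER_EFFECT = tfn -> 1.0;

    /** No-op normalization: returns the frequency unchanged. */
    static final Normalization NO_NORMALIZATION = (freq, docLen) -> freq;

    private final AfterEffect afterEffect;
    private final Normalization normalization;

    ComponentSketch(AfterEffect afterEffect, Normalization normalization) {
        // Reject nulls up front so scoring never has to branch on a missing component.
        this.afterEffect = Objects.requireNonNull(afterEffect, "afterEffect");
        this.normalization = Objects.requireNonNull(normalization, "normalization");
    }

    double score(double freq, double docLen) {
        double tfn = normalization.tfn(freq, docLen);
        // The basic model is the identity here; only the composition is of interest.
        return tfn * afterEffect.score(tfn);
    }

    public static void main(String[] args) {
        ComponentSketch plain = new ComponentSketch(NO_AFTER_EFFECT, NO_NORMALIZATION);
        System.out.println(plain.score(3.0, 50.0));
    }
}
```

Using explicit no-op objects keeps the scoring path free of null checks: every component is always called, and "nothing" is just another implementation.
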
Method Detail

score

protected float score(BasicStats stats,
                      float freq,
                      float docLen)
Description copied from class: SimilarityBase
Scores the document.

Subclasses must apply their scoring formula in this method.

Specified by:
score in class SimilarityBase
Parameters:
stats - the corpus level statistics.
freq - the term frequency.
docLen - the document length.
Returns:
the score.

explain

protected void explain(Explanation expl,
                       BasicStats stats,
                       int doc,
                       float freq,
                       float docLen)
Description copied from class: SimilarityBase
Subclasses should implement this method to explain the score. expl already contains the score, the name of the class and the doc id, as well as the term frequency and its explanation; subclasses can add additional clauses to explain details of their scoring formulae.

The default implementation does nothing.

Overrides:
explain in class SimilarityBase
Parameters:
expl - the explanation to extend with details.
stats - the corpus level statistics.
doc - the document id.
freq - the term frequency.
docLen - the document length.

toString

public String toString()
Description copied from class: SimilarityBase
Subclasses must override this method to return the name of the Similarity and preferably the values of parameters (if any) as well.

Specified by:
toString in class SimilarityBase

getBasicModel

public BasicModel getBasicModel()
Returns the basic model of information content.


getAfterEffect

public AfterEffect getAfterEffect()
Returns the first normalization (the after-effect).


getNormalization

public Normalization getNormalization()
Returns the second (length) normalization.



Copyright © 2000-2013 Apache Software Foundation. All Rights Reserved.