Class IndriDirichletSimilarity
- java.lang.Object
  - org.apache.lucene.search.similarities.Similarity
    - org.apache.lucene.search.similarities.SimilarityBase
      - org.apache.lucene.search.similarities.LMSimilarity
        - org.apache.lucene.search.similarities.IndriDirichletSimilarity
public class IndriDirichletSimilarity extends LMSimilarity
Bayesian smoothing using Dirichlet priors as implemented in the Indri search engine (http://www.lemurproject.org/indri.php). Indri Dirichlet smoothing:
P(t|E) = (tf_E + mu * P(t|D)) / (documentLength + documentMu)
where
P(t|D) = (mu * P(t|C) + tf_D) / (doclen + mu)
A larger value of mu produces more smoothing. Smoothing matters most for short documents, where the probabilities are more granular.
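To make the effect of mu concrete, the P(t|D) formula can be evaluated by hand. The following is a minimal plain-Java sketch of that arithmetic only; it does not call Lucene, and the class name, method name, and all input values (collection probability, term frequency, document length) are hypothetical choices for illustration.

```java
import java.util.Locale;

/** Illustrative only: plain-Java arithmetic, not the Lucene implementation. */
final class DirichletSmoothingSketch {

  /**
   * Dirichlet-smoothed term probability:
   * P(t|D) = (tf_D + mu * P(t|C)) / (docLen + mu).
   */
  static double smoothedProbability(double tf, double docLen, double mu, double pCollection) {
    return (tf + mu * pCollection) / (docLen + mu);
  }

  public static void main(String[] args) {
    double pCollection = 0.001; // hypothetical collection probability P(t|C)
    double tf = 1.0;            // the term occurs once ...
    double docLen = 10.0;       // ... in a short, 10-token document

    // mu = 0 reduces to the unsmoothed estimate tf / docLen;
    // larger mu pulls the estimate toward P(t|C).
    for (double mu : new double[] {0.0, 100.0, 2000.0}) {
      System.out.printf(Locale.ROOT, "mu=%.0f -> P(t|D)=%.6f%n",
          mu, smoothedProbability(tf, docLen, mu, pCollection));
    }
  }
}
```

With these assumed inputs the estimate starts at the unsmoothed value 0.1 for mu = 0 and moves toward the collection probability 0.001 as mu grows, which is the sense in which a larger mu produces more smoothing.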
-
-
Nested Class Summary
Nested Classes
static class IndriDirichletSimilarity.IndriCollectionModel
Models p(w|C) as the number of occurrences of the term in the collection, divided by the total number of tokens + 1.
-
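The p(w|C) estimate described for IndriCollectionModel is a single division. The sketch below follows the wording of this page, not the Lucene source, so the actual implementation may differ; the class name, method name, and counts are hypothetical.

```java
/** Illustrative only: follows the documented description, not the Lucene source. */
final class CollectionModelSketch {

  /** p(w|C) = occurrences of the term in the collection / (total number of tokens + 1). */
  static double collectionProbability(long totalTermFreq, long numberOfTokens) {
    return (double) totalTermFreq / (numberOfTokens + 1.0);
  }

  public static void main(String[] args) {
    // Hypothetical counts: the term occurs 50 times among 999,999 tokens.
    System.out.println(collectionProbability(50L, 999_999L));

    // With the + 1 in the denominator, an empty collection yields 0.0
    // instead of a division by zero.
    System.out.println(collectionProbability(0L, 0L));
  }
}
```

One visible effect of the + 1 in this arithmetic is that the denominator is never zero, even when no tokens have been indexed.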
Nested classes/interfaces inherited from class org.apache.lucene.search.similarities.LMSimilarity
LMSimilarity.CollectionModel, LMSimilarity.DefaultCollectionModel, LMSimilarity.LMStats
-
Nested classes/interfaces inherited from class org.apache.lucene.search.similarities.Similarity
Similarity.SimScorer
-
-
Field Summary
-
Fields inherited from class org.apache.lucene.search.similarities.LMSimilarity
collectionModel
-
Fields inherited from class org.apache.lucene.search.similarities.SimilarityBase
discountOverlaps
-
-
Constructor Summary
Constructors
IndriDirichletSimilarity()
Instantiates the similarity with the default μ value of 2000.
IndriDirichletSimilarity(float mu)
Instantiates the similarity with the provided μ parameter.
IndriDirichletSimilarity(LMSimilarity.CollectionModel collectionModel)
Instantiates the similarity with the default μ value of 2000.
IndriDirichletSimilarity(LMSimilarity.CollectionModel collectionModel, float mu)
Instantiates the similarity with the provided μ parameter.
-
Method Summary
protected void explain(List<Explanation> subs, BasicStats stats, double freq, double docLen)
Subclasses should implement this method to explain the score.
float getMu()
Returns the μ parameter.
String getName()
Returns the name of the LM method.
protected double score(BasicStats stats, double freq, double docLen)
Scores the document doc.
-
Methods inherited from class org.apache.lucene.search.similarities.LMSimilarity
fillBasicStats, newStats, toString
-
Methods inherited from class org.apache.lucene.search.similarities.SimilarityBase
computeNorm, explain, getDiscountOverlaps, log2, scorer, setDiscountOverlaps
-
Constructor Detail
-
IndriDirichletSimilarity
public IndriDirichletSimilarity(LMSimilarity.CollectionModel collectionModel, float mu)
Instantiates the similarity with the provided μ parameter.
-
IndriDirichletSimilarity
public IndriDirichletSimilarity(float mu)
Instantiates the similarity with the provided μ parameter.
-
IndriDirichletSimilarity
public IndriDirichletSimilarity(LMSimilarity.CollectionModel collectionModel)
Instantiates the similarity with the default μ value of 2000.
-
IndriDirichletSimilarity
public IndriDirichletSimilarity()
Instantiates the similarity with the default μ value of 2000.
-
-
Method Detail
-
score
protected double score(BasicStats stats, double freq, double docLen)
Description copied from class: SimilarityBase
Scores the document doc. Subclasses must apply their scoring formula in this method.
- Specified by: score in class SimilarityBase
- Parameters:
stats - the corpus level statistics.
freq - the term frequency.
docLen - the document length.
- Returns: the score.
-
explain
protected void explain(List<Explanation> subs, BasicStats stats, double freq, double docLen)
Description copied from class: SimilarityBase
Subclasses should implement this method to explain the score. expl already contains the score, the name of the class and the doc id, as well as the term frequency and its explanation; subclasses can add additional clauses to explain details of their scoring formulae. The default implementation does nothing.
- Overrides: explain in class LMSimilarity
- Parameters:
subs - the list of details of the explanation to extend.
stats - the corpus level statistics.
freq - the term frequency.
docLen - the document length.
-
getMu
public float getMu()
Returns the μ parameter.
-
getName
public String getName()
Description copied from class: LMSimilarity
Returns the name of the LM method. The values of the parameters should be included as well. Used in LMSimilarity.toString().
- Specified by: getName in class LMSimilarity
-
-