Class LMSimilarity
java.lang.Object
org.apache.lucene.search.similarities.Similarity
org.apache.lucene.search.similarities.SimilarityBase
org.apache.lucene.search.similarities.LMSimilarity
- Direct Known Subclasses:
IndriDirichletSimilarity, LMDirichletSimilarity, LMJelinekMercerSimilarity
Abstract superclass for language modeling Similarities. The following inner types are introduced:
- LMSimilarity.LMStats, which defines a new statistic, the probability that the collection language model generates the current term;
- LMSimilarity.CollectionModel, which is a strategy interface for objects that compute the collection language model p(w|C);
- LMSimilarity.DefaultCollectionModel, an implementation of that interface, which computes the term probability as the number of occurrences of the term in the collection, divided by the total number of tokens.
- WARNING: This API is experimental and might change in incompatible ways in the next release.
-
Nested Class Summary
Modifier and Type  Class                                Description
static interface   LMSimilarity.CollectionModel         A strategy for computing the collection language model.
static class       LMSimilarity.DefaultCollectionModel  Models p(w|C) as the number of occurrences of the term in the collection, divided by the total number of tokens + 1.
static class       LMSimilarity.LMStats                 Stores the collection distribution of the current term.
Nested classes/interfaces inherited from class org.apache.lucene.search.similarities.Similarity
Similarity.SimScorer
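The collection model described above can be illustrated with a small standalone sketch that does not depend on Lucene. The names CollectionModelSketch, CollectionModel (a local interface) and DefaultModel are hypothetical stand-ins, not the Lucene types, and the formula assumes the "total number of tokens + 1" denominator stated in the summary.

```java
/** Standalone sketch; none of these types are the Lucene classes. */
final class CollectionModelSketch {

    /** Hypothetical strategy interface: computes p(w|C) for the current term. */
    interface CollectionModel {
        double computeProbability(long totalTermFreq, long numberOfFieldTokens);
    }

    /** Assumed default: occurrences of the term divided by (total tokens + 1). */
    static final class DefaultModel implements CollectionModel {
        @Override
        public double computeProbability(long totalTermFreq, long numberOfFieldTokens) {
            // The + 1.0 forces floating-point division and keeps the
            // denominator non-zero for an empty collection.
            return totalTermFreq / (numberOfFieldTokens + 1.0);
        }
    }

    public static void main(String[] args) {
        CollectionModel model = new DefaultModel();
        // A term seen 25 times in a collection of 999 tokens.
        System.out.println(model.computeProbability(25, 999));
    }
}
```

Because the hypothetical interface has a single method, an alternative strategy could also be supplied as a lambda in this sketch, which is the main appeal of modeling the computation as a strategy.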
-
Field Summary
Modifier and Type                             Field            Description
protected final LMSimilarity.CollectionModel  collectionModel  The collection model.
Fields inherited from class org.apache.lucene.search.similarities.SimilarityBase
discountOverlaps
-
Constructor Summary
Constructor                                                 Description
LMSimilarity()                                              Creates a new instance with the default collection language model.
LMSimilarity(LMSimilarity.CollectionModel collectionModel)  Creates a new instance with the specified collection language model.
-
Method Summary
Modifier and Type     Method                                                                                            Description
protected void        explain(List<Explanation> subExpls, BasicStats stats, double freq, double docLen)                 Subclasses should implement this method to explain the score.
protected void        fillBasicStats(BasicStats stats, CollectionStatistics collectionStats, TermStatistics termStats)  Computes the collection probability of the current term in addition to the usual statistics.
abstract String       getName()                                                                                         Returns the name of the LM method.
protected BasicStats  newStats()                                                                                        Factory method to return a custom stats object.
String                toString()                                                                                        Returns the name of the LM method.
Methods inherited from class org.apache.lucene.search.similarities.SimilarityBase
computeNorm, explain, getDiscountOverlaps, log2, score, scorer, setDiscountOverlaps
-
Field Details
-
collectionModel
protected final LMSimilarity.CollectionModel collectionModel
The collection model.
-
-
Constructor Details
-
LMSimilarity
public LMSimilarity(LMSimilarity.CollectionModel collectionModel)
Creates a new instance with the specified collection language model.
-
LMSimilarity
public LMSimilarity()
Creates a new instance with the default collection language model.
-
-
Method Details
-
newStats
protected BasicStats newStats()
Description copied from class: SimilarityBase
Factory method to return a custom stats object.
- Overrides:
newStats in class SimilarityBase
-
fillBasicStats
protected void fillBasicStats(BasicStats stats, CollectionStatistics collectionStats, TermStatistics termStats)
Computes the collection probability of the current term in addition to the usual statistics.
- Overrides:
fillBasicStats in class SimilarityBase
-
explain
protected void explain(List<Explanation> subExpls, BasicStats stats, double freq, double docLen)
Description copied from class: SimilarityBase
Subclasses should implement this method to explain the score. expl already contains the score, the name of the class and the doc id, as well as the term frequency and its explanation; subclasses can add additional clauses to explain details of their scoring formulae.
The default implementation does nothing.
- Overrides:
explain in class SimilarityBase
- Parameters:
subExpls - the list of details of the explanation to extend.
stats - the corpus level statistics.
freq - the term frequency.
docLen - the document length.
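As a rough illustration of how an implementation might add clauses to the list it receives, the standalone sketch below appends two entries. Detail is a hypothetical stand-in for an explanation entry, not the Lucene Explanation class, and the description strings and the choice of clauses are assumptions made for the example.

```java
import java.util.ArrayList;
import java.util.List;

/** Standalone sketch; Detail is a hypothetical stand-in for an explanation entry. */
final class ExplainSketch {

    /** Hypothetical value/description pair, not the Lucene Explanation class. */
    static final class Detail {
        final double value;
        final String description;

        Detail(double value, String description) {
            this.value = value;
            this.description = description;
        }
    }

    /**
     * Appends extra clauses to an existing list of details. The collection
     * probability is passed in directly here instead of being read from a
     * stats object, to keep the sketch self-contained.
     */
    static void explain(List<Detail> subExpls, double collectionProbability,
                        double freq, double docLen) {
        subExpls.add(new Detail(collectionProbability, "collection probability"));
        subExpls.add(new Detail(freq / docLen, "freq / docLen, relative term frequency"));
    }

    public static void main(String[] args) {
        List<Detail> subExpls = new ArrayList<>();
        explain(subExpls, 0.025, 3.0, 120.0);
        for (Detail d : subExpls) {
            System.out.println(d.value + " = " + d.description);
        }
    }
}
```

The method only extends the list it is given and never replaces it, which matches the contract that earlier entries are already present when subclasses are called.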
-
getName
public abstract String getName()
Returns the name of the LM method. The values of the parameters should be included as well.
Used in toString().
-
toString
public String toString()
Returns the name of the LM method. If a custom collection model strategy is used, its name is included as well.
- Specified by:
toString in class SimilarityBase
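The relationship between getName() and toString() can be sketched as follows. This is a standalone illustration, not the Lucene implementation: the "LM " prefix, the " - " separator, the Example-LM name and the use of null to signal the default collection model are all assumptions, since the description only states that a custom model's name is included.

```java
/** Standalone sketch of the naming convention; not the Lucene classes. */
final class NamingSketch {

    /** Hypothetical LM method name that includes its single parameter value. */
    static String getName(double lambda) {
        return "Example-LM(" + lambda + ")";
    }

    /**
     * Builds the full description. A null collectionModelName is taken to
     * mean that the default collection model is in use (an assumption of
     * this sketch), in which case only the method name is reported.
     */
    static String describe(double lambda, String collectionModelName) {
        String name = getName(lambda);
        if (collectionModelName == null) {
            return "LM " + name;
        }
        return "LM " + name + " - " + collectionModelName;
    }

    public static void main(String[] args) {
        System.out.println(describe(0.7, null));
        System.out.println(describe(0.7, "custom"));
    }
}
```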