Class DFISimilarity
- java.lang.Object
  - org.apache.lucene.search.similarities.Similarity
    - org.apache.lucene.search.similarities.SimilarityBase
      - org.apache.lucene.search.similarities.DFISimilarity
public class DFISimilarity extends SimilarityBase
Implements the Divergence from Independence (DFI) model based on chi-square statistics (i.e., the standardized chi-squared distance from independence in term frequency tf). DFI is both parameter-free and non-parametric:
- parameter-free: it does not require any parameter tuning or training.
- non-parametric: it does not make any assumptions about word frequency distributions on document collections.
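The model can be summarized per query term t and document d as below. This is a sketch that assumes the formulation used by Lucene's implementation (expected frequency smoothed with +1 terms, divergence damped by log2, query boost omitted); the symbols TTF_t (total frequency of t in the field), N (total number of tokens in the field), |d| (document length) and I (the chosen independence measure) are introduced here for illustration only.

```latex
e_{t,d} = \frac{(\mathrm{TTF}_t + 1)\,\lvert d \rvert}{N + 1}
\qquad
w(t,d) =
\begin{cases}
  \log_2\!\bigl(I(\mathit{tf}_{t,d},\, e_{t,d}) + 1\bigr) & \text{if } \mathit{tf}_{t,d} > e_{t,d},\\
  0 & \text{otherwise.}
\end{cases}
```

Under this formulation a term that occurs at or below its expected rate in a document contributes nothing to that document's score.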
It is highly recommended not to remove stopwords (very common terms such as the, of, and, to, a, in, for, is, on, that, etc.) when using this similarity.
For more information, see: "A nonparametric term weighting method for information retrieval based on measuring the divergence from independence".
- See Also: IndependenceStandardized, IndependenceSaturated, IndependenceChiSquared
- WARNING: This API is experimental and might change in incompatible ways in the next release.
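The three measures listed under See Also differ only in how the excess of observed over expected frequency is normalized. The sketch below assumes the formulas (tf - e) / sqrt(e) for the standardized measure, (tf - e) / e for the saturated measure and (tf - e)^2 / e for the chi-squared measure; the interface and class names are hypothetical stand-ins, not Lucene's Independence types.

```java
/** Hypothetical stand-in for an independence measure: maps (observed tf, expected tf) to a divergence. */
interface IndependenceMeasure {
    double score(double freq, double expected);
}

/** Assumed standardized measure: (f - e) / sqrt(e). */
final class Standardized implements IndependenceMeasure {
    @Override public double score(double freq, double expected) {
        return (freq - expected) / Math.sqrt(expected);
    }
    @Override public String toString() { return "Standardized"; }
}

/** Assumed saturated measure: (f - e) / e. */
final class Saturated implements IndependenceMeasure {
    @Override public double score(double freq, double expected) {
        return (freq - expected) / expected;
    }
    @Override public String toString() { return "Saturated"; }
}

/** Assumed chi-squared measure: (f - e)^2 / e. */
final class ChiSquared implements IndependenceMeasure {
    @Override public double score(double freq, double expected) {
        double diff = freq - expected;
        return diff * diff / expected;
    }
    @Override public String toString() { return "ChiSquared"; }
}

public final class IndependenceDemo {
    public static void main(String[] args) {
        IndependenceMeasure[] measures = { new Standardized(), new Saturated(), new ChiSquared() };
        for (IndependenceMeasure m : measures) {
            // Observed tf of 9 against an expected tf of 4.
            System.out.println(m + ": " + m.score(9, 4));
        }
        // Prints:
        // Standardized: 2.5
        // Saturated: 1.25
        // ChiSquared: 6.25
    }
}
```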
Nested Class Summary
- Nested classes/interfaces inherited from class org.apache.lucene.search.similarities.Similarity:
  Similarity.SimScorer, Similarity.SimWeight
Field Summary
- Fields inherited from class org.apache.lucene.search.similarities.SimilarityBase:
  discountOverlaps
Constructor Summary
- DFISimilarity(Independence independenceMeasure)
  Create DFI with the specified divergence from independence measure.
Method Summary
- Independence getIndependence()
  Returns the measure of independence.
- protected float score(BasicStats stats, float freq, float docLen)
  Scores the document doc.
- String toString()
  Subclasses must override this method to return the name of the Similarity and preferably the values of parameters (if any) as well.
- Methods inherited from class org.apache.lucene.search.similarities.SimilarityBase:
  computeNorm, computeWeight, explain, explain, fillBasicStats, getDiscountOverlaps, log2, newStats, setDiscountOverlaps, simScorer
Constructor Detail
- DFISimilarity
  public DFISimilarity(Independence independenceMeasure)
  Create DFI with the specified divergence from independence measure.
  Parameters:
  independenceMeasure - measure of divergence from independence
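DFISimilarity is configured solely by the Independence object passed to its constructor and exposed through getIndependence(). The sketch below mirrors that shape without depending on Lucene: the measure is modeled as a java.util.function.DoubleBinaryOperator mapping (freq, expected) to a score. The class name DfiSketch, the extra measureName argument, the null checks and the "DFI(...)" toString format are all hypothetical.

```java
import java.util.Objects;
import java.util.function.DoubleBinaryOperator;

/** Hypothetical sketch: a DFI-style similarity whose only configuration is its independence measure. */
public final class DfiSketch {
    private final String measureName;
    private final DoubleBinaryOperator measure;

    public DfiSketch(String measureName, DoubleBinaryOperator measure) {
        // Fail fast on missing arguments (an assumption of this sketch).
        this.measureName = Objects.requireNonNull(measureName, "measureName");
        this.measure = Objects.requireNonNull(measure, "measure");
    }

    /** Counterpart of getIndependence(): returns the configured measure. */
    public DoubleBinaryOperator getIndependence() {
        return measure;
    }

    /** Name of the similarity plus its only "parameter", following the toString() contract. */
    @Override
    public String toString() {
        return "DFI(" + measureName + ")";
    }

    public static void main(String[] args) {
        // Standardized measure (f - e) / sqrt(e), supplied as a lambda.
        DfiSketch dfi = new DfiSketch("Standardized", (f, e) -> (f - e) / Math.sqrt(e));
        System.out.println(dfi);                                        // DFI(Standardized)
        System.out.println(dfi.getIndependence().applyAsDouble(9, 4)); // 2.5
    }
}
```

Passing the measure in as a strategy object keeps the similarity itself parameter-free: swapping measures changes the weighting without introducing any tunable constant.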
Method Detail
- score
  protected float score(BasicStats stats, float freq, float docLen)
  Description copied from class: SimilarityBase
  Scores the document doc. Subclasses must apply their scoring formula in this method.
  Specified by:
  score in class SimilarityBase
  Parameters:
  stats - the corpus-level statistics.
  freq - the term frequency.
  docLen - the document length.
  Returns:
  the score.
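A self-contained sketch of what score(stats, freq, docLen) plausibly computes, assuming the expected frequency is (totalTermFreq + 1) * docLen / (totalFieldTokens + 1), that terms at or below that expectation score zero, and that the divergence is damped as boost * log2(measure + 1). BasicStats is replaced here by plain numbers; the class DfiScore and its parameter names are hypothetical, and the standardized measure is hard-coded for brevity.

```java
/** Hypothetical, Lucene-free sketch of a DFI per-term score using the standardized measure. */
public final class DfiScore {
    private DfiScore() {}

    /** Base-2 logarithm. */
    static double log2(double x) {
        return Math.log(x) / Math.log(2);
    }

    /**
     * @param boost            query-time boost (assumed to multiply the score)
     * @param totalTermFreq    occurrences of the term in the whole field
     * @param totalFieldTokens number of tokens in the field across all documents
     * @param freq             observed term frequency in this document
     * @param docLen           length of this document in tokens
     */
    public static double score(double boost, long totalTermFreq, long totalFieldTokens,
                               double freq, double docLen) {
        // Expected tf if occurrences were spread in proportion to document length.
        double expected = (totalTermFreq + 1) * docLen / (totalFieldTokens + 1);
        // No credit for terms that occur no more often than independence predicts.
        if (freq <= expected) {
            return 0.0;
        }
        double measure = (freq - expected) / Math.sqrt(expected); // standardized divergence
        return boost * log2(measure + 1);
    }

    public static void main(String[] args) {
        // Term occurs 399 times among 9,999 field tokens; the document has 100 tokens,
        // so expected = 400 * 100 / 10,000 = 4.
        System.out.println(score(1.0, 399, 9_999, 4, 100));  // 0.0 (exactly at expectation)
        System.out.println(score(1.0, 399, 9_999, 10, 100)); // log2((10 - 4) / 2 + 1), about 2
    }
}
```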
- getIndependence
  public Independence getIndependence()
  Returns the measure of independence.
- toString
  public String toString()
  Description copied from class: SimilarityBase
  Subclasses must override this method to return the name of the Similarity and preferably the values of parameters (if any) as well.
  Specified by:
  toString in class SimilarityBase