public class DFISimilarity extends SimilarityBase
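Implements the Divergence from Independence (DFI) term weighting model.
DFI is both parameter-free (it requires no parameter tuning or training) and non-parametric (it makes no assumptions about word frequency distributions in document collections).
It is highly recommended not to remove stopwords (very common terms such as the, of, and, to, a, in, for, is, on, that) with this similarity.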
For more information see: A nonparametric term weighting method for information retrieval based on measuring the divergence from independence
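To make the idea concrete, the following is a minimal stand-alone sketch of how a divergence-from-independence score can be computed. It is not the Lucene class itself. It assumes the expected term frequency in a document is estimated as (totalTermFreq + 1) * docLen / (numberOfFieldTokens + 1), that terms occurring no more often than expected score zero, and that the three measures are defined as noted in the comments. The class name DfiSketch and its fields are hypothetical and chosen only for illustration.

```java
import java.util.function.DoubleBinaryOperator;

/** Stand-alone sketch of DFI-style scoring; hypothetical, not part of Lucene. */
final class DfiSketch {

    // Assumed definitions of the three independence measures.
    // Each takes (observed frequency, expected frequency).
    static final DoubleBinaryOperator STANDARDIZED =
        (freq, expected) -> (freq - expected) / Math.sqrt(expected);
    static final DoubleBinaryOperator SATURATED =
        (freq, expected) -> (freq - expected) / expected;
    static final DoubleBinaryOperator CHI_SQUARED =
        (freq, expected) -> (freq - expected) * (freq - expected) / expected;

    static double log2(double x) {
        return Math.log(x) / Math.log(2.0);
    }

    /**
     * Scores one term in one document.
     * totalTermFreq and numberOfFieldTokens are corpus-level counts;
     * freq and docLen describe the document being scored.
     */
    static double score(DoubleBinaryOperator measure, double boost,
                        long totalTermFreq, long numberOfFieldTokens,
                        double freq, double docLen) {
        // Frequency expected if the term were independent of the document.
        double expected = (totalTermFreq + 1) * docLen / (numberOfFieldTokens + 1);
        // A term that is not over-represented contributes nothing.
        if (freq <= expected) {
            return 0.0;
        }
        return boost * log2(measure.applyAsDouble(freq, expected) + 1.0);
    }

    public static void main(String[] args) {
        // expected = 400 * 100 / 10000 = 4.0 for these inputs
        long totalTermFreq = 399;
        long numberOfFieldTokens = 9999;
        double docLen = 100.0;
        double freq = 10.0;

        System.out.println("standardized: "
            + score(STANDARDIZED, 1.0, totalTermFreq, numberOfFieldTokens, freq, docLen));
        System.out.println("saturated:    "
            + score(SATURATED, 1.0, totalTermFreq, numberOfFieldTokens, freq, docLen));
        System.out.println("chi-squared:  "
            + score(CHI_SQUARED, 1.0, totalTermFreq, numberOfFieldTokens, freq, docLen));
    }
}
```

In Lucene, the measure is chosen through the constructor argument, for example by passing an IndependenceStandardized instance to DFISimilarity(Independence independenceMeasure). The sketch also suggests why stopword removal is discouraged: the expected frequency is derived from corpus-level token counts, so dropping very common terms would presumably distort that estimate.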
See also: IndependenceStandardized, IndependenceSaturated, IndependenceChiSquared

Nested classes inherited from class Similarity: Similarity.SimScorer

Fields inherited from class SimilarityBase: discountOverlaps

| Constructor and Description |
|---|
| DFISimilarity(Independence independenceMeasure): Creates DFI with the specified divergence from independence measure. |

| Modifier and Type | Method and Description |
|---|---|
| protected Explanation | explain(BasicStats stats, Explanation freq, double docLen): Explains the score. |
| Independence | getIndependence(): Returns the measure of independence. |
| protected double | score(BasicStats stats, double freq, double docLen): Scores the document doc. |
| String | toString(): Subclasses must override this method to return the name of the Similarity and preferably the values of parameters (if any) as well. |

Methods inherited from class SimilarityBase: computeNorm, explain, fillBasicStats, getDiscountOverlaps, log2, newStats, scorer, setDiscountOverlaps

public DFISimilarity(Independence independenceMeasure)
Creates DFI with the specified divergence from independence measure.
Parameters:
independenceMeasure - measure of divergence from independence

protected double score(BasicStats stats, double freq, double docLen)
Description copied from class: SimilarityBase
Scores the document doc. Subclasses must apply their scoring formula in this class.
Specified by: score in class SimilarityBase
Parameters:
stats - the corpus level statistics.
freq - the term frequency.
docLen - the document length.

public Independence getIndependence()
Returns the measure of independence.

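protected Explanation explain(BasicStats stats, Explanation freq, double docLen)
Description copied from class: SimilarityBase
Explains the score. The explanation attaches the score (computed via the SimilarityBase.score(BasicStats, double, double) method) and the explanation for the term frequency. Subclasses content with this format may add additional details in SimilarityBase.explain(List, BasicStats, double, double).
Overrides: explain in class SimilarityBase
Parameters:
stats - the corpus level statistics.
freq - the term frequency and its explanation.
docLen - the document length.

public String toString()
Description copied from class: SimilarityBase
Subclasses must override this method to return the name of the Similarity and preferably the values of parameters (if any) as well.
Specified by: toString in class SimilarityBase

Copyright © 2000-2019 Apache Software Foundation. All Rights Reserved.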