public class DFISimilarity extends SimilarityBase
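DFI is both parameter-free (it does not require any parameter tuning or training) and non-parametric (it does not make any assumptions about word frequency distributions on document collections).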
It is highly recommended not to remove stopwords (very common terms: the, of, and, to, a, in, for, is, on, that, etc.) with this similarity.
For more information see: A nonparametric term weighting method for information retrieval based on measuring the divergence from independence
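To make the idea of "divergence from independence" concrete, here is a minimal, self-contained sketch of how such a score could be computed. It is not the Lucene source: the class `DfiSketch`, its method names, the `+ 1` smoothing of the expected frequency, and the `log2(measure + 1)` damping are assumptions made for illustration. The three measures correspond in spirit to the standardized, saturated, and chi-squared variants listed on this page.

```java
import java.util.function.DoubleBinaryOperator;

// Hypothetical illustration only; names and smoothing choices are assumptions,
// not the actual Lucene implementation.
public class DfiSketch {

    // Term frequency we would expect in a document of length docLen if the
    // term occurred independently of documents (smoothed by +1, an assumption).
    public static double expected(long totalTermFreq, long numberOfFieldTokens, double docLen) {
        return (totalTermFreq + 1) * docLen / (numberOfFieldTokens + 1);
    }

    // Standardized distance from independence.
    public static double standardized(double freq, double expected) {
        return (freq - expected) / Math.sqrt(expected);
    }

    // Saturated (relative) distance from independence.
    public static double saturated(double freq, double expected) {
        return (freq - expected) / expected;
    }

    // Chi-squared distance from independence.
    public static double chiSquared(double freq, double expected) {
        return (freq - expected) * (freq - expected) / expected;
    }

    // A term contributes only when it occurs more often than expected;
    // the divergence is then damped with log2(measure + 1).
    public static double score(double freq, double expected, DoubleBinaryOperator measure) {
        if (freq <= expected) {
            return 0.0;
        }
        double divergence = measure.applyAsDouble(freq, expected);
        return Math.log(divergence + 1) / Math.log(2);
    }

    public static void main(String[] args) {
        // 999 occurrences in a field of 99,999 tokens; document of 100 tokens.
        double e = expected(999L, 99_999L, 100.0);
        System.out.println("expected     = " + e);
        System.out.println("standardized = " + score(4.0, e, DfiSketch::standardized));
        System.out.println("saturated    = " + score(4.0, e, DfiSketch::saturated));
        System.out.println("chi-squared  = " + score(4.0, e, DfiSketch::chiSquared));
    }
}
```

Note that the sketch has no tunable constants, which is what "parameter-free" refers to. It also suggests why keeping stopwords is reasonable here: a very common term has a high expected frequency, so it scores zero unless a document uses it unusually often.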
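See Also: IndependenceStandardized, IndependenceSaturated, IndependenceChiSquared
Nested classes/interfaces inherited from class Similarity: Similarity.SimScorer, Similarity.SimWeight
Fields inherited from class SimilarityBase: discountOverlaps

| Constructor and Description |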
|---|
| DFISimilarity(Independence independenceMeasure): Create DFI with the specified divergence from independence measure. |

| Modifier and Type | Method and Description |
|---|---|
| Independence | getIndependence(): Returns the measure of independence. |
| protected float | score(BasicStats stats, float freq, float docLen): Scores the document doc. |
| String | toString(): Subclasses must override this method to return the name of the Similarity and preferably the values of parameters (if any) as well. |

Methods inherited from class SimilarityBase: computeNorm, computeWeight, decodeNormValue, encodeNormValue, explain, explain, fillBasicStats, getDiscountOverlaps, log2, newStats, setDiscountOverlaps, simScorer
Methods inherited from class Similarity: coord, queryNorm

public DFISimilarity(Independence independenceMeasure)
Parameters: independenceMeasure - measure of divergence from independence

protected float score(BasicStats stats, float freq, float docLen)
Description copied from class SimilarityBase: Scores the document doc. Subclasses must apply their scoring formula in this method.
Specified by: score in class SimilarityBase
Parameters: stats - the corpus level statistics. freq - the term frequency. docLen - the document length.

public Independence getIndependence()
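
public String toString()
Description copied from class SimilarityBase: Subclasses must override this method to return the name of the Similarity and preferably the values of parameters (if any) as well.
Specified by: toString in class SimilarityBase

Copyright © 2000-2017 Apache Software Foundation. All Rights Reserved.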