public class DFISimilarity extends SimilarityBase
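DFI is both parameter-free (it does not require any parameter tuning or training) and non-parametric (it does not make any assumptions about word frequency distributions in document collections).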
It is highly recommended not to remove stopwords (very common terms such as the, of, and, to, a, in, for, is, on, that) when using this similarity.
For more information see: A nonparametric term weighting method for information retrieval based on measuring the divergence from independence
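To make the idea of "divergence from independence" concrete, the following is a minimal, self-contained sketch that does not use any Lucene classes. The class name `DfiSketch` and its method names are hypothetical. The formulas are also assumptions made for illustration, not a copy of this class's implementation: the expected frequency is taken as the term's collection-wide rate times the document length (with add-one smoothing), the divergence is the standardized form (observed minus expected, divided by the square root of expected), and the final score is log2 of the divergence plus one.

```java
/** Hypothetical, Lucene-free sketch of a divergence-from-independence score. */
final class DfiSketch {

    /** Term frequency expected in a document if the term occurred independently of it. */
    static double expectedFreq(double docLen, long totalTermFreq, long numberOfFieldTokens) {
        // Assumption: add-one smoothing on both collection-level counts.
        return (totalTermFreq + 1.0) * docLen / (numberOfFieldTokens + 1.0);
    }

    /** Standardized divergence: (observed - expected) / sqrt(expected). */
    static double standardized(double freq, double expected) {
        return (freq - expected) / Math.sqrt(expected);
    }

    /** Score for one term in one document; zero when the term is not over-represented. */
    static double score(double freq, double docLen, long totalTermFreq, long numberOfFieldTokens) {
        double expected = expectedFreq(docLen, totalTermFreq, numberOfFieldTokens);
        if (freq <= expected) {
            return 0.0; // observed frequency does not exceed what independence predicts
        }
        double measure = standardized(freq, expected);
        return Math.log(measure + 1.0) / Math.log(2.0); // log base 2
    }
}
```

Under these assumptions, a term that occurs in a document no more often than its collection-wide rate predicts receives a score of zero. For example, `DfiSketch.score(5, 100, 6000, 100000)` returns zero because the expected frequency is about 6, while `DfiSketch.score(10, 100, 1000, 100000)` is positive because the expected frequency is about 1. This is consistent with the advice above to leave stopwords in place.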
See also: IndependenceStandardized, IndependenceSaturated, IndependenceChiSquared

Nested classes inherited from class Similarity: Similarity.SimScorer, Similarity.SimWeight

Fields inherited from class SimilarityBase: discountOverlaps

| Constructor and Description |
|---|
| DFISimilarity(Independence independenceMeasure): Create DFI with the specified divergence from independence measure |

| Modifier and Type | Method and Description |
|---|---|
| Independence | getIndependence(): Returns the measure of independence |
| protected float | score(BasicStats stats, float freq, float docLen): Scores the document doc. |
| String | toString(): Subclasses must override this method to return the name of the Similarity and preferably the values of parameters (if any) as well. |

Methods inherited from class SimilarityBase: computeNorm, computeWeight, decodeNormValue, encodeNormValue, explain, explain, fillBasicStats, getDiscountOverlaps, log2, newStats, setDiscountOverlaps, simScorer

Methods inherited from class Similarity: coord, queryNorm

public DFISimilarity(Independence independenceMeasure)
Parameters:
independenceMeasure - measure of divergence from independence

protected float score(BasicStats stats, float freq, float docLen)
Description copied from class SimilarityBase: Scores the document doc. Subclasses must apply their scoring formula in this class.
Specified by: score in class SimilarityBase
Parameters:
stats - the corpus level statistics.
freq - the term frequency.
docLen - the document length.

public Independence getIndependence()

public String toString()
Specified by: toString in class SimilarityBase

Copyright © 2000-2017 Apache Software Foundation. All Rights Reserved.