public class DFISimilarity extends SimilarityBase
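DFI is both parameter-free and non-parametric: it requires no parameter tuning or training (parameter-free), and it makes no assumptions about word frequency distributions in document collections (non-parametric).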
It is highly recommended not to remove stopwords (very common terms such as the, of, and, to, a, in, for, is, on, that) with this similarity.
For more information see: A nonparametric term weighting method for information retrieval based on measuring the divergence from independence
|Constructor|Description|
|DFISimilarity(Independence independenceMeasure)|Create DFI with the specified divergence from independence measure|
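|Modifier and Type|Method|Description|
|Independence|getIndependence()|Returns the measure of independence|
|protected float|score(BasicStats stats, float freq, float docLen)|Scores the document|
|String|toString()|Subclasses must override this method to return the name of the Similarity and preferably the values of parameters (if any) as well.|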
Methods inherited from class SimilarityBase: computeNorm, computeWeight, explain, explain, fillBasicStats, getDiscountOverlaps, log2, newStats, setDiscountOverlaps, simScorer
public DFISimilarity(Independence independenceMeasure)
independenceMeasure - measure of divergence from independence
protected float score(BasicStats stats, float freq, float docLen)
Subclasses must apply their scoring formula in this method.
stats - the corpus level statistics.
freq - the term frequency.
docLen - the document length.
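The following is a minimal, library-free sketch of how a divergence-from-independence score can be computed from these three inputs. It is an illustration under stated assumptions, not the Lucene source: the class name DfiScoreSketch, the Measure interface, and the plain numeric arguments standing in for BasicStats are all hypothetical, and the chi-squared form of the measure is assumed. The idea is to compare the observed term frequency with the frequency expected if the term were spread independently across the collection, and to score only the excess.

```java
public class DfiScoreSketch {

    /** Hypothetical stand-in for the Independence measure passed to the constructor. */
    interface Measure {
        double score(double freq, double expected);
    }

    /** Assumed chi-squared form: squared deviation from the expected frequency, normalized. */
    static final Measure CHI_SQUARED =
            (freq, expected) -> (freq - expected) * (freq - expected) / expected;

    /** Base-2 logarithm. */
    static double log2(double x) {
        return Math.log(x) / Math.log(2.0);
    }

    /**
     * Sketch of a DFI-style score.
     *
     * @param totalTermFreq       occurrences of the term in the whole field (corpus level)
     * @param numberOfFieldTokens total number of tokens in the field (corpus level)
     * @param boost               query-time boost
     * @param freq                term frequency in the document
     * @param docLen              document length
     */
    static double score(Measure measure, long totalTermFreq, long numberOfFieldTokens,
                        double boost, double freq, double docLen) {
        // Frequency expected under independence; +1 on both counts avoids division by zero.
        double expected = (totalTermFreq + 1) * docLen / (numberOfFieldTokens + 1);

        // A term that occurs no more often than expected carries no evidence.
        if (freq <= expected) {
            return 0.0;
        }

        // Dampen the divergence with a logarithm and apply the boost.
        return boost * log2(measure.score(freq, expected) + 1);
    }

    public static void main(String[] args) {
        // Rare term: expected frequency is 1.0, observed frequency is 2.
        System.out.println(score(CHI_SQUARED, 99, 9999, 1.0, 2.0, 100.0));

        // Very common term: expected frequency is 10.0, observed frequency is only 8.
        System.out.println(score(CHI_SQUARED, 999, 9999, 1.0, 8.0, 100.0));
    }
}
```

In this sketch, a term observed at or below its expected frequency scores zero, so very common terms tend to be discounted by the formula itself rather than by a stopword filter.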
public Independence getIndependence()
public String toString()
Copyright © 2000-2018 Apache Software Foundation. All Rights Reserved.