public abstract class SimilarityBase extends Similarity
A subclass of Similarity that provides a simplified API for its descendants. Subclasses are only required to implement the score(BasicStats, double, double) and toString() methods. Implementing explain(List, BasicStats, double, double) is optional, since SimilarityBase already provides a basic explanation of the score and the term frequency. However, implementers of a subclass are encouraged to include as much detail about the scoring method as possible.
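To make this division of labor concrete, here is a plain-Java sketch of the same template-method contract. It does not use Lucene. SketchStats, SketchSimilarityBase and ToyTfIdf are hypothetical names, explanations are modeled as lists of strings instead of Explanation objects, and the toy formula (which assumes freq >= 1 and applies the boost inside score) is chosen only for illustration.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for a per-term stats object (not Lucene's BasicStats).
class SketchStats {
    final long numberOfDocuments;
    final long docFreq;
    final double boost;

    SketchStats(long numberOfDocuments, long docFreq, double boost) {
        this.numberOfDocuments = numberOfDocuments;
        this.docFreq = docFreq;
        this.boost = boost;
    }
}

// Hypothetical base class mirroring the contract described above:
// score() and toString() are mandatory, the detailed explain() hook is optional.
abstract class SketchSimilarityBase {
    protected abstract double score(SketchStats stats, double freq, double docLen);

    @Override
    public abstract String toString();

    // Optional hook; the default adds no details.
    protected void explain(List<String> subExpls, SketchStats stats, double freq, double docLen) {
    }

    // Basic explanation: score, class name and term frequency, then subclass details.
    public List<String> explain(SketchStats stats, double freq, double docLen) {
        List<String> lines = new ArrayList<>();
        lines.add("score(" + getClass().getSimpleName() + ", freq=" + freq + ") = "
                + score(stats, freq, docLen));
        explain(lines, stats, freq, docLen);
        return lines;
    }
}

// Toy subclass: boost * (1 + ln(freq)) * ln(N / df) / docLen, assuming freq >= 1.
class ToyTfIdf extends SketchSimilarityBase {
    @Override
    protected double score(SketchStats stats, double freq, double docLen) {
        double idf = Math.log((double) stats.numberOfDocuments / stats.docFreq);
        return stats.boost * (1 + Math.log(freq)) * idf / docLen;
    }

    @Override
    protected void explain(List<String> subExpls, SketchStats stats, double freq, double docLen) {
        subExpls.add("  idf = ln(" + stats.numberOfDocuments + " / " + stats.docFreq + ")");
        subExpls.add("  docLen = " + docLen);
    }

    @Override
    public String toString() {
        return "ToyTfIdf";
    }

    public static void main(String[] args) {
        SketchStats stats = new SketchStats(100, 10, 1.0);
        for (String line : new ToyTfIdf().explain(stats, 1.0, 10.0)) {
            System.out.println(line);
        }
    }
}
```

The subclass only supplies the formula, its name and, optionally, extra explanation lines; the base class assembles the top-level explanation.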
Note: multi-word queries such as phrase queries are scored differently from Lucene's default ranking algorithm: whereas the default algorithm "fakes" an IDF value for the phrase as a whole (since it does not know it), this class scores a phrase as the sum of the individual term scores.
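The summation can be sketched in plain Java. This is not Lucene code: the per-term score is a toy formula, freq * log2(N / df), and the sketch assumes that every term of the phrase is scored with the same phrase frequency. PhraseSumSketch and its methods are hypothetical names.

```java
// Sketch, not Lucene code. The per-term score below is a toy formula
// (freq * log2(N / df)) chosen only to make the summation visible.
class PhraseSumSketch {
    static double log2(double x) {
        return Math.log(x) / Math.log(2);
    }

    // Toy per-term score.
    static double termScore(long numDocs, long docFreq, double freq) {
        return freq * log2((double) numDocs / docFreq);
    }

    // Phrase score = sum of the per-term scores. No IDF is estimated for the
    // phrase as a whole; every term is scored with the same phrase frequency
    // (an assumption of this sketch).
    static double phraseScore(long numDocs, long[] termDocFreqs, double phraseFreq) {
        double sum = 0;
        for (long df : termDocFreqs) {
            sum += termScore(numDocs, df, phraseFreq);
        }
        return sum;
    }

    public static void main(String[] args) {
        // Two-term phrase in a 1024-document collection, occurring once in the document.
        double s = phraseScore(1024, new long[] {4, 64}, 1.0);
        System.out.println(s); // about 8 + 4 = 12
    }
}
```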
Nested classes/interfaces inherited from class Similarity: Similarity.SimScorer
| Modifier and Type | Field and Description |
|---|---|
| protected boolean | discountOverlaps: True if overlap tokens (tokens with a position increment of zero) are discounted from the document's length. |
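The effect of discountOverlaps on the document length can be sketched in plain Java. The sketch assumes, as a hypothetical simplification, that a field is represented only by the position increments of its tokens; an overlap token, such as a synonym stacked at the same position as the previous token, has increment 0. OverlapLengthSketch is not a Lucene class.

```java
// Sketch, not Lucene code: compute a field's length from the position
// increments of its tokens, optionally discounting overlap tokens
// (tokens whose position increment is zero).
class OverlapLengthSketch {
    static int fieldLength(int[] positionIncrements, boolean discountOverlaps) {
        int length = positionIncrements.length; // every token counts once
        int numOverlap = 0;
        for (int inc : positionIncrements) {
            if (inc == 0) {
                numOverlap++; // token stacked on the previous position
            }
        }
        return discountOverlaps ? length - numOverlap : length;
    }

    public static void main(String[] args) {
        // Tokens "fast", "quick" (synonym stacked on "fast"), "car".
        int[] incs = {1, 0, 1};
        System.out.println(fieldLength(incs, true));  // 2
        System.out.println(fieldLength(incs, false)); // 3
    }
}
```

Discounting overlaps keeps synonym injection from making a document look longer, and therefore from lowering its length-normalized scores.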
| Constructor and Description |
|---|
| SimilarityBase(): Sole constructor. |
| Modifier and Type | Method and Description |
|---|---|
| long | computeNorm(FieldInvertState state): Encodes the document length in the same way as BM25Similarity. |
| protected Explanation | explain(BasicStats stats, Explanation freq, double docLen): Explains the score. |
| protected void | explain(List<Explanation> subExpls, BasicStats stats, double freq, double docLen): Subclasses should implement this method to explain the score. |
| protected void | fillBasicStats(BasicStats stats, CollectionStatistics collectionStats, TermStatistics termStats): Fills all member fields defined in BasicStats in stats. |
| boolean | getDiscountOverlaps(): Returns true if overlap tokens are discounted from the document's length. |
| static double | log2(double x): Returns the base two logarithm of x. |
| protected BasicStats | newStats(String field, double boost): Factory method to return a custom stats object. |
| protected abstract double | score(BasicStats stats, double freq, double docLen): Scores a document, given the term frequency and the document length. |
| Similarity.SimScorer | scorer(float boost, CollectionStatistics collectionStats, TermStatistics... termStats): Computes any collection-level weight (for example, IDF or average document length) needed for scoring a query. |
| void | setDiscountOverlaps(boolean v): Determines whether overlap tokens (tokens with 0 position increment) are ignored when computing the norm. |
| abstract String | toString(): Subclasses must override this method to return the name of the Similarity and, preferably, the values of its parameters (if any). |
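The log2 helper listed above is a change-of-base computation, log2(x) = ln(x) / ln(2). A minimal standalone sketch follows; Log2Sketch is a hypothetical name and the code is not copied from Lucene. Base-two logarithms are common in information-theoretic scoring, for example the information content -log2(p) of an event with probability p.

```java
// Sketch of a base-two logarithm via the change-of-base identity
// log2(x) = ln(x) / ln(2). Not copied from Lucene.
class Log2Sketch {
    private static final double LN_2 = Math.log(2);

    static double log2(double x) {
        return Math.log(x) / LN_2;
    }

    public static void main(String[] args) {
        // Information content, in bits, of an event with probability 1/8.
        System.out.println(-log2(1.0 / 8)); // about 3.0
    }
}
```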
protected boolean discountOverlaps
public SimilarityBase()
public void setDiscountOverlaps(boolean v)
See Also: computeNorm(FieldInvertState)
public boolean getDiscountOverlaps()
See Also: setDiscountOverlaps(boolean)
public final Similarity.SimScorer scorer(float boost, CollectionStatistics collectionStats, TermStatistics... termStats)
Specified by: scorer in class Similarity
Parameters:
boost - a multiplicative factor to apply to the produced scores
collectionStats - collection-level statistics, such as the number of tokens in the collection.
termStats - term-level statistics, such as the document frequency of a term across the collection.

protected BasicStats newStats(String field, double boost)
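newStats is a factory hook, and it pairs with fillBasicStats (documented next): a subclass that needs extra corpus statistics can return its own stats type from newStats and fill the extra fields in an overridden fillBasicStats. The sketch below mirrors that pattern in plain Java. BaseStats, ExtendedStats, their fields, and the flattened long parameters (used in place of CollectionStatistics and TermStatistics) are hypothetical; the average field length is assumed to be the total number of field tokens divided by the document count.

```java
// Hypothetical stand-in for the basic per-term stats object.
class BaseStats {
    final String field;
    final double boost;
    long numberOfDocuments;
    long numberOfFieldTokens;
    double avgFieldLength;
    long docFreq;
    long totalTermFreq;

    BaseStats(String field, double boost) {
        this.field = field;
        this.boost = boost;
    }
}

// Hypothetical extended stats carrying one extra value for a subclass.
class ExtendedStats extends BaseStats {
    double collectionProbability; // totalTermFreq / numberOfFieldTokens

    ExtendedStats(String field, double boost) {
        super(field, boost);
    }
}

class StatsFactorySketch {
    // Factory hook: the base version returns plain stats.
    protected BaseStats newStats(String field, double boost) {
        return new BaseStats(field, boost);
    }

    // Fill the common fields from collection- and term-level counts.
    protected void fillBasicStats(BaseStats stats, long docCount, long sumTotalTermFreq,
                                  long docFreq, long totalTermFreq) {
        stats.numberOfDocuments = docCount;
        stats.numberOfFieldTokens = sumTotalTermFreq;
        stats.avgFieldLength = sumTotalTermFreq / (double) docCount;
        stats.docFreq = docFreq;
        stats.totalTermFreq = totalTermFreq;
    }
}

// A subclass that needs an extra statistic overrides both hooks.
class ExtendedStatsSketch extends StatsFactorySketch {
    @Override
    protected BaseStats newStats(String field, double boost) {
        return new ExtendedStats(field, boost);
    }

    @Override
    protected void fillBasicStats(BaseStats stats, long docCount, long sumTotalTermFreq,
                                  long docFreq, long totalTermFreq) {
        super.fillBasicStats(stats, docCount, sumTotalTermFreq, docFreq, totalTermFreq);
        // Safe cast: newStats above always returns ExtendedStats.
        ((ExtendedStats) stats).collectionProbability =
                totalTermFreq / (double) sumTotalTermFreq;
    }

    public static void main(String[] args) {
        ExtendedStatsSketch sim = new ExtendedStatsSketch();
        BaseStats stats = sim.newStats("body", 1.0);
        sim.fillBasicStats(stats, 100, 5000, 10, 50);
        System.out.println(stats.avgFieldLength);                          // 50.0
        System.out.println(((ExtendedStats) stats).collectionProbability); // 0.01
    }
}
```

Overriding both hooks in the same subclass keeps the cast in fillBasicStats safe, because that subclass also controls which stats type newStats creates.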
protected void fillBasicStats(BasicStats stats, CollectionStatistics collectionStats, TermStatistics termStats)
Fills all member fields defined in BasicStats in stats. Subclasses can override this method to fill additional stats.

protected abstract double score(BasicStats stats, double freq, double docLen)
Scores a document, given the term frequency and the document length. Subclasses must apply their scoring formula in this method.
Parameters:
stats - the corpus level statistics.
freq - the term frequency.
docLen - the document length.

protected void explain(List<Explanation> subExpls, BasicStats stats, double freq, double docLen)
Subclasses should implement this method to explain the score. The explanation being built already contains the score, the name of the class, and the term frequency with its explanation; subclasses can add further clauses to subExpls to explain the details of their scoring formulae.
The default implementation does nothing.
Parameters:
subExpls - the list of details of the explanation to extend
stats - the corpus level statistics.
freq - the term frequency.
docLen - the document length.

protected Explanation explain(BasicStats stats, Explanation freq, double docLen)
Explains the score. The default implementation builds a basic explanation that attaches the score (computed via the score(BasicStats, double, double) method) and the explanation for the term frequency. Subclasses that are satisfied with this format can add further details in explain(List, BasicStats, double, double).
Parameters:
stats - the corpus level statistics.
freq - the term frequency and its explanation.
docLen - the document length.

public abstract String toString()
public final long computeNorm(FieldInvertState state)
Encodes the document length in the same way as BM25Similarity.
Specified by: computeNorm in class Similarity
Parameters:
state - current processing state for this field

public static double log2(double x)
Returns the base two logarithm of x.

Copyright © 2000-2019 Apache Software Foundation. All Rights Reserved.