java.lang.Object
  org.apache.lucene.search.similarities.Similarity
    org.apache.lucene.search.similarities.SimilarityBase
public abstract class SimilarityBase
extends Similarity
A subclass of Similarity that provides a simplified API for its descendants. Subclasses are only required to implement the score(org.apache.lucene.search.similarities.BasicStats, float, float) and toString() methods. Implementing explain(Explanation, BasicStats, int, float, float) is optional, inasmuch as SimilarityBase already provides a basic explanation of the score and the term frequency. However, implementers of a subclass are encouraged to include as much detail about the scoring method as possible.
Note: multi-word queries such as phrase queries are scored differently from Lucene's default ranking algorithm. Whereas the default algorithm "fakes" an IDF value for the phrase as a whole (since it does not know it), this class scores a phrase as the sum of the individual term scores.
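The subclassing contract and the phrase note above can be illustrated without a Lucene dependency. The sketch below is hypothetical throughout: `TermStatsSketch` stands in for `BasicStats`, `SimilarityBaseSketch` mirrors only the two abstract methods plus a phrase-summing helper, and the IDF-style formula is invented for illustration. As a simplification, every term of the phrase is scored with the phrase's own match frequency, and the per-term scores are added up.

```java
import java.util.List;

/** Hypothetical stand-in for BasicStats, holding only what this sketch needs. */
final class TermStatsSketch {
    final long numberOfDocuments; // documents in the collection
    final long docFreq;           // documents that contain the term

    TermStatsSketch(long numberOfDocuments, long docFreq) {
        this.numberOfDocuments = numberOfDocuments;
        this.docFreq = docFreq;
    }
}

/** Mirrors the SimilarityBase contract: subclasses supply score() and toString(). */
abstract class SimilarityBaseSketch {
    protected abstract float score(TermStatsSketch stats, float freq, float docLen);

    @Override
    public abstract String toString();

    /** Scores a phrase match as the sum of the individual term scores. */
    final float scorePhrase(List<TermStatsSketch> termStats, float phraseFreq, float docLen) {
        float sum = 0f;
        for (TermStatsSketch stats : termStats) {
            sum += score(stats, phraseFreq, docLen);
        }
        return sum;
    }
}

/** Hypothetical subclass: term frequency times a smoothed log IDF (docLen unused). */
final class SimpleTfIdfSketch extends SimilarityBaseSketch {
    @Override
    protected float score(TermStatsSketch stats, float freq, float docLen) {
        double idf = Math.log((stats.numberOfDocuments + 1.0) / (stats.docFreq + 1.0));
        return (float) (freq * idf);
    }

    @Override
    public String toString() {
        return "SimpleTfIdfSketch";
    }
}
```

A subclass only writes `score` and `toString`; the summation over terms lives in the base, which is the division of labor the description above sets out.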
Nested Class Summary
| Nested classes/interfaces inherited from class org.apache.lucene.search.similarities.Similarity |
|---|
| Similarity.ExactSimScorer, Similarity.SimWeight, Similarity.SloppySimScorer |
Field Summary
| Modifier and Type | Field and Description |
|---|---|
| protected boolean | discountOverlaps: True if overlap tokens (tokens with a position increment of zero) are discounted from the document's length. |
Constructor Summary
| Constructor and Description |
|---|
| SimilarityBase(): Sole constructor. |
Method Summary
| Modifier and Type | Method and Description |
|---|---|
| long | computeNorm(FieldInvertState state): Encodes the document length in the same way as TFIDFSimilarity. |
| Similarity.SimWeight | computeWeight(float queryBoost, CollectionStatistics collectionStats, TermStatistics... termStats): Compute any collection-level weight (e.g. IDF, average document length, etc.) needed for scoring a query. |
| protected float | decodeNormValue(byte norm): Decodes a normalization factor (document length) stored in an index. |
| protected byte | encodeNormValue(float boost, float length): Encodes the length to a byte via SmallFloat. |
| Similarity.ExactSimScorer | exactSimScorer(Similarity.SimWeight stats, AtomicReaderContext context): Creates a new Similarity.ExactSimScorer to score matching documents from a segment of the inverted index. |
| protected Explanation | explain(BasicStats stats, int doc, Explanation freq, float docLen): Explains the score. |
| protected void | explain(Explanation expl, BasicStats stats, int doc, float freq, float docLen): Subclasses should implement this method to explain the score. |
| protected void | fillBasicStats(BasicStats stats, CollectionStatistics collectionStats, TermStatistics termStats): Fills all member fields defined in BasicStats in stats. |
| boolean | getDiscountOverlaps(): Returns true if overlap tokens are discounted from the document's length. |
| static double | log2(double x): Returns the base two logarithm of x. |
| protected BasicStats | newStats(String field, float queryBoost): Factory method to return a custom stats object. |
| protected abstract float | score(BasicStats stats, float freq, float docLen): Scores the document doc. |
| void | setDiscountOverlaps(boolean v): Determines whether overlap tokens (tokens with 0 position increment) are ignored when computing norm. |
| Similarity.SloppySimScorer | sloppySimScorer(Similarity.SimWeight stats, AtomicReaderContext context): Creates a new Similarity.SloppySimScorer to score matching documents from a segment of the inverted index. |
| abstract String | toString(): Subclasses must override this method to return the name of the Similarity and preferably the values of parameters (if any) as well. |
| Methods inherited from class org.apache.lucene.search.similarities.Similarity |
|---|
| coord, queryNorm |

| Methods inherited from class java.lang.Object |
|---|
| clone, equals, finalize, getClass, hashCode, notify, notifyAll, wait, wait, wait |
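Field Detail
protected boolean discountOverlaps
True if overlap tokens (tokens with a position increment of zero) are discounted from the document's length.
Constructor Detail
public SimilarityBase()
Sole constructor.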
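Method Detail
public void setDiscountOverlaps(boolean v)
Determines whether overlap tokens (tokens with 0 position increment) are ignored when computing norm. By default this is true, meaning overlap tokens do not count when computing norms.
See Also: computeNorm(org.apache.lucene.index.FieldInvertState)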
public boolean getDiscountOverlaps()
Returns true if overlap tokens are discounted from the document's length.
See Also: setDiscountOverlaps(boolean)
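To make "overlap tokens" concrete: a token stacked on the previous token's position (position increment 0, for example an injected synonym) is an overlap. The hypothetical helper below counts such tokens and returns the length a norm would be based on, with and without discounting. It deliberately ignores index-time boosts, which the real computeNorm(FieldInvertState) also passes on when encoding.

```java
/** Hypothetical helper: field length as seen by a SimilarityBase-style norm. */
final class OverlapLengthSketch {
    private OverlapLengthSketch() {}

    /**
     * @param positionIncrements one entry per token produced for the field
     * @param discountOverlaps   mirrors the setDiscountOverlaps(boolean) switch
     */
    static int effectiveLength(int[] positionIncrements, boolean discountOverlaps) {
        int length = positionIncrements.length;
        int numOverlap = 0;
        for (int increment : positionIncrements) {
            if (increment == 0) {
                numOverlap++; // stacked on the previous token's position
            }
        }
        return discountOverlaps ? length - numOverlap : length;
    }
}
```

For "fast car" with the synonyms "quick" and "auto" stacked on the two words, the increments are {1, 0, 1, 0}: the effective length is 2 with discounting and 4 without, so synonym expansion does not make the document look longer.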
public final Similarity.SimWeight computeWeight(float queryBoost, CollectionStatistics collectionStats, TermStatistics... termStats)
Description copied from class: Similarity
Compute any collection-level weight (e.g. IDF, average document length, etc.) needed for scoring a query.
Specified by: computeWeight in class Similarity
Parameters:
queryBoost - the query-time boost.
collectionStats - collection-level statistics, such as the number of tokens in the collection.
termStats - term-level statistics, such as the document frequency of a term across the collection.
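As a sketch of what "collection-level weight" can cover, the hypothetical code below builds one stats object per query term from collection-level numbers (document count, total tokens in the field) and term-level numbers (document frequency, total term frequency), deriving the average field length along the way. The class names and the `long[][]` input shape are illustrative stand-ins, not Lucene's CollectionStatistics/TermStatistics API.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical per-term statistics, a simplified stand-in for BasicStats. */
final class PerTermStatsSketch {
    final float queryBoost;
    final long numberOfDocuments;
    final long numberOfFieldTokens;
    final float avgFieldLength;
    final long docFreq;
    final long totalTermFreq;

    PerTermStatsSketch(float queryBoost, long numberOfDocuments, long numberOfFieldTokens,
                       long docFreq, long totalTermFreq) {
        this.queryBoost = queryBoost;
        this.numberOfDocuments = numberOfDocuments;
        this.numberOfFieldTokens = numberOfFieldTokens;
        // Average field length = total tokens in the field / number of documents.
        this.avgFieldLength = (float) numberOfFieldTokens / numberOfDocuments;
        this.docFreq = docFreq;
        this.totalTermFreq = totalTermFreq;
    }
}

final class ComputeWeightSketch {
    private ComputeWeightSketch() {}

    /**
     * One stats object per query term; a single-term query yields a one-element list.
     * Each row of termStats is {docFreq, totalTermFreq}.
     */
    static List<PerTermStatsSketch> computeWeight(float queryBoost, long numberOfDocuments,
                                                  long numberOfFieldTokens, long[][] termStats) {
        List<PerTermStatsSketch> weights = new ArrayList<>();
        for (long[] term : termStats) {
            weights.add(new PerTermStatsSketch(queryBoost, numberOfDocuments,
                                               numberOfFieldTokens, term[0], term[1]));
        }
        return weights;
    }
}
```

With 4 documents and 40 tokens in the field, every term gets an average field length of 10. A production version would also need a fallback for empty fields or codecs that do not record frequencies, which this sketch does not attempt.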
protected BasicStats newStats(String field, float queryBoost)
Factory method to return a custom stats object.
protected void fillBasicStats(BasicStats stats, CollectionStatistics collectionStats, TermStatistics termStats)
Fills all member fields defined in BasicStats in stats. Subclasses can override this method to fill additional stats.
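The newStats and fillBasicStats hooks are designed to work together: the factory supplies a (possibly custom) stats object and the fill method populates it, so a subclass can precompute per-term values once per query rather than once per scored document. The hypothetical sketch below mirrors that pattern with stand-in classes; `prepare` plays the role of the final computeWeight, and the precomputed IDF is an illustrative choice.

```java
/** Hypothetical stand-in for BasicStats with two basic fields. */
class BaseStatsSketch {
    long numberOfDocuments;
    long docFreq;
}

/** Custom stats object carrying one extra, precomputed value. */
class IdfStatsSketch extends BaseStatsSketch {
    double idf;
}

class BaseSimSketch {
    /** Factory hook, analogous to newStats(String, float). */
    protected BaseStatsSketch newStats() {
        return new BaseStatsSketch();
    }

    /** Fill hook, analogous to fillBasicStats(...). */
    protected void fillBasicStats(BaseStatsSketch stats, long maxDoc, long docFreq) {
        stats.numberOfDocuments = maxDoc;
        stats.docFreq = docFreq;
    }

    /** Driver, analogous to the final computeWeight: create, then fill. */
    final BaseStatsSketch prepare(long maxDoc, long docFreq) {
        BaseStatsSketch stats = newStats();
        fillBasicStats(stats, maxDoc, docFreq);
        return stats;
    }
}

class IdfSimSketch extends BaseSimSketch {
    @Override
    protected BaseStatsSketch newStats() {
        return new IdfStatsSketch();
    }

    @Override
    protected void fillBasicStats(BaseStatsSketch stats, long maxDoc, long docFreq) {
        super.fillBasicStats(stats, maxDoc, docFreq); // keep the basic fields
        // Computed once per query term instead of once per scored document.
        ((IdfStatsSketch) stats).idf = Math.log((double) (maxDoc + 1) / (docFreq + 1));
    }
}
```

Calling `super.fillBasicStats` first keeps the base fields intact, so the subclass only adds to the stats rather than re-deriving them.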
protected abstract float score(BasicStats stats, float freq, float docLen)
Scores the document doc. Subclasses must apply their scoring formula in this method.
Parameters:
stats - the corpus level statistics.
freq - the term frequency.
docLen - the document length.
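A score implementation may combine the corpus-level statistics with freq and docLen in any way. As a hedged illustration, the standalone sketch below applies a BM25-style term weight (an IDF factor times a saturating, length-normalized term frequency) and reports its parameters from toString(), as the toString() contract recommends. `CorpusStatsSketch` is a hypothetical stand-in for BasicStats, and the class does not extend Lucene's SimilarityBase.

```java
/** Hypothetical subset of BasicStats used by the formula below. */
final class CorpusStatsSketch {
    final long numberOfDocuments; // N: documents in the collection
    final long docFreq;           // n: documents containing the term
    final float avgFieldLength;   // average field length in tokens

    CorpusStatsSketch(long numberOfDocuments, long docFreq, float avgFieldLength) {
        this.numberOfDocuments = numberOfDocuments;
        this.docFreq = docFreq;
        this.avgFieldLength = avgFieldLength;
    }
}

/** Standalone BM25-style scorer with the same score(stats, freq, docLen) shape. */
final class BM25StyleSketch {
    private final float k1; // term-frequency saturation
    private final float b;  // strength of length normalization, 0..1

    BM25StyleSketch(float k1, float b) {
        this.k1 = k1;
        this.b = b;
    }

    float score(CorpusStatsSketch stats, float freq, float docLen) {
        double n = stats.docFreq;
        // IDF with 1 added inside the log so that it stays non-negative for common terms.
        double idf = Math.log(1.0 + (stats.numberOfDocuments - n + 0.5) / (n + 0.5));
        // Longer-than-average documents get a larger denominator, hence a lower score.
        double lengthNorm = 1.0 - b + b * (docLen / stats.avgFieldLength);
        double tf = freq * (k1 + 1.0) / (freq + k1 * lengthNorm);
        return (float) (idf * tf);
    }

    @Override
    public String toString() {
        // Name plus parameter values.
        return "BM25StyleSketch(k1=" + k1 + ",b=" + b + ")";
    }
}
```

With k1 = 1.2 and b = 0.75, raising freq from 1 to 3 increases the score, while tripling docLen at the same freq lowers it; including k1 and b in toString() makes otherwise identical-looking configurations distinguishable in explanations and logs.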
protected void explain(Explanation expl, BasicStats stats, int doc, float freq, float docLen)
Subclasses should implement this method to explain the score. expl already contains the score, the name of the class and the doc id, as well as the term frequency and its explanation; subclasses can add additional clauses to explain details of their scoring formulae.
The default implementation does nothing.
Parameters:
expl - the explanation to extend with details.
stats - the corpus level statistics.
doc - the document id.
freq - the term frequency.
docLen - the document length.
protected Explanation explain(BasicStats stats, int doc, Explanation freq, float docLen)
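Explains the score. The default implementation builds a basic explanation containing the name of the Similarity, the document id, the score (computed via the score(BasicStats, float, float) method) and the explanation for the term frequency. Subclasses content with this format may add additional details in explain(Explanation, BasicStats, int, float, float).
Parameters:
stats - the corpus level statistics.
doc - the document id.
freq - the term frequency and its explanation.
docLen - the document length.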
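The two explain methods form a template: one builds the basic explanation and then calls the other, a hook that subclasses override to append detail clauses. The sketch below reproduces that control flow with a hypothetical `ExplSketch` node in place of Lucene's Explanation class; the description format and the frequency-over-length formula are invented for illustration.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical explanation node: a value, a description and child details. */
final class ExplSketch {
    final float value;
    final String description;
    final List<ExplSketch> details = new ArrayList<>();

    ExplSketch(float value, String description) {
        this.value = value;
        this.description = description;
    }

    void addDetail(ExplSketch detail) {
        details.add(detail);
    }
}

abstract class ExplainingSimSketch {
    protected abstract float score(float freq, float docLen);

    @Override
    public abstract String toString();

    /** Hook: the default adds nothing, matching the documented default. */
    protected void explain(ExplSketch expl, int doc, float freq, float docLen) {
    }

    /** Builds the basic explanation, then lets the hook append detail clauses. */
    final ExplSketch explain(int doc, ExplSketch freq, float docLen) {
        ExplSketch result = new ExplSketch(score(freq.value, docLen),
                "score(" + this + ", doc=" + doc + ", freq=" + freq.value + ")");
        result.addDetail(freq);                   // term frequency and its explanation
        explain(result, doc, freq.value, docLen); // subclass-specific clauses
        return result;
    }
}

/** Hypothetical subclass: frequency divided by length, with docLen explained. */
final class LengthNormalizedTfSketch extends ExplainingSimSketch {
    @Override
    protected float score(float freq, float docLen) {
        return freq / docLen;
    }

    @Override
    protected void explain(ExplSketch expl, int doc, float freq, float docLen) {
        expl.addDetail(new ExplSketch(docLen, "docLen, field length in tokens"));
    }

    @Override
    public String toString() {
        return "LengthNormalizedTfSketch";
    }
}
```

A subclass that is satisfied with the basic format never touches the builder; it only appends its own clauses in the hook, so the score value and the explained value cannot drift apart.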
public Similarity.ExactSimScorer exactSimScorer(Similarity.SimWeight stats, AtomicReaderContext context) throws IOException
Description copied from class: Similarity
Creates a new Similarity.ExactSimScorer to score matching documents from a segment of the inverted index.
Specified by: exactSimScorer in class Similarity
Parameters:
stats - collection information from Similarity.computeWeight(float, CollectionStatistics, TermStatistics...)
context - segment of the inverted index to be scored.
Returns: an ExactSimScorer for scoring documents across context
Throws:
IOException - if there is a low-level I/O error
public Similarity.SloppySimScorer sloppySimScorer(Similarity.SimWeight stats, AtomicReaderContext context) throws IOException
Description copied from class: Similarity
Creates a new Similarity.SloppySimScorer to score matching documents from a segment of the inverted index.
Specified by: sloppySimScorer in class Similarity
Parameters:
stats - collection information from Similarity.computeWeight(float, CollectionStatistics, TermStatistics...)
context - segment of the inverted index to be scored.
Returns: a SloppySimScorer for scoring documents across context
Throws:
IOException - if there is a low-level I/O error
public abstract String toString()
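Subclasses must override this method to return the name of the Similarity and preferably the values of parameters (if any) as well.
Overrides: toString in class Object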
public long computeNorm(FieldInvertState state)
Encodes the document length in the same way as TFIDFSimilarity.
Specified by: computeNorm in class Similarity
Parameters:
state - current processing state for this field
protected float decodeNormValue(byte norm)
Decodes a normalization factor (document length) stored in an index.
See Also: encodeNormValue(float,float)
protected byte encodeNormValue(float boost, float length)
Encodes the length to a byte via SmallFloat.
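Taken together, computeNorm, encodeNormValue and decodeNormValue round-trip a field length through a single byte, and a per-segment scorer can then hand the decoded length to score as docLen. The standalone sketch below assumes, as in TFIDFSimilarity's default length norm, that the encoded quantity is boost / sqrt(length) and that decoding inverts it as 1 / (f * f). The byte codec is our own reimplementation modeled on SmallFloat's floatToByte315/byte315ToFloat pair and should be checked against SmallFloat before being relied on; the `LengthScorer` interface and `scoreDoc` method are hypothetical.

```java
/** Hypothetical sketch of the length-norm round trip and its use at search time. */
final class NormSketch {
    private NormSketch() {}

    private static final int FZERO = (63 - 15) << 3; // 384: offset for zero exponent 15

    /** Float to byte, modeled on SmallFloat.floatToByte315 (lossy). */
    static byte floatToByte315(float f) {
        int bits = Float.floatToRawIntBits(f);
        int smallfloat = bits >> (24 - 3);
        if (smallfloat <= FZERO) {
            return (bits <= 0) ? (byte) 0 : (byte) 1; // zero/negative -> 0, tiny positive -> 1
        }
        if (smallfloat >= FZERO + 0x100) {
            return (byte) -1; // overflow: clamp to the largest representable value
        }
        return (byte) (smallfloat - FZERO);
    }

    /** Byte back to float, modeled on SmallFloat.byte315ToFloat. */
    static float byte315ToFloat(byte b) {
        if (b == 0) {
            return 0.0f;
        }
        int bits = (b & 0xff) << (24 - 3);
        bits += (63 - 15) << 24;
        return Float.intBitsToFloat(bits);
    }

    /** Assumed encoding: boost / sqrt(length), stored as one byte. */
    static byte encodeNormValue(float boost, float length) {
        return floatToByte315(boost / (float) Math.sqrt(length));
    }

    /** Assumed decoding: invert 1 / sqrt(length) back to an approximate length. */
    static float decodeNormValue(byte norm) {
        float f = byte315ToFloat(norm);
        return 1.0f / (f * f);
    }

    /** Hypothetical stand-in for score(stats, freq, docLen) with stats already bound. */
    interface LengthScorer {
        float score(float freq, float docLen);
    }

    /** Per-document scoring: look up the norm byte, decode the length, then score. */
    static float scoreDoc(LengthScorer scorer, byte[] norms, int doc, float freq) {
        return scorer.score(freq, decodeNormValue(norms[doc]));
    }
}
```

Under these assumptions a length of 4 round-trips exactly, while a length of 10 decodes to roughly 10.24, which is why docLen as seen by score is only an approximation of the true field length.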
public static double log2(double x)
Returns the base two logarithm of x.
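log2 is a change-of-base helper. A minimal standalone equivalent, assuming the straightforward ln(x) / ln(2) formulation (the class name is hypothetical), is:

```java
final class Log2Sketch {
    private Log2Sketch() {}

    private static final double LN_2 = Math.log(2.0);

    /** Base-two logarithm via change of base: log2(x) = ln(x) / ln(2). */
    static double log2(double x) {
        return Math.log(x) / LN_2;
    }
}
```

Because it follows Math.log, log2(0) is negative infinity and a negative argument yields NaN; log2(8) is 3 up to floating-point rounding, so comparisons should use a tolerance.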