org.apache.lucene.search.similarities.Similarity

org.apache.lucene.search.similarities.SimilarityBase

Direct Known Subclasses: Axiomatic, DFISimilarity, DFRSimilarity, IBSimilarity, LMSimilarity

public abstract class SimilarityBase extends Similarity

A subclass of Similarity that provides a simplified API for its descendants. Subclasses are only required to implement the score(org.apache.lucene.search.similarities.BasicStats, double, double) and toString() methods. Implementing explain(List, BasicStats, double, double) is optional, inasmuch as SimilarityBase already provides a basic explanation of the score and the term frequency. However, implementers of a subclass are encouraged to include as much detail about the scoring method as possible.

Note: multi-word queries such as phrase queries are scored differently than in Lucene's default ranking algorithm: whereas the default algorithm "fakes" an IDF value for the phrase as a whole (since the true value is not known), this class scores a phrase as the sum of the individual term scores.
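To make the summation concrete, here is a minimal sketch in plain Java that does not use any Lucene types. The class name, the per-term formula, and the parameter names are all hypothetical; the sketch also assumes that each term is scored with the same frequency and document length, and that only the per-term statistics differ.

```java
// Sketch only: plain Java, no Lucene types. All names here are hypothetical.
class PhraseSumSketch {
    // Hypothetical per-term formula standing in for a subclass's score(...):
    // a length-normalized frequency weighted by how rare the term is.
    static double termScore(double freq, double docLen, long docFreq, long numDocs) {
        double rarity = Math.log((double) (numDocs + 1) / docFreq);
        return rarity * freq / (freq + docLen);
    }

    // Phrase score as the sum of the individual term scores.
    // Assumption: every term is scored with the same (phrase) frequency
    // and document length, and only the per-term statistics differ.
    static double phraseScore(double freq, double docLen, long[] docFreqs, long numDocs) {
        double sum = 0.0;
        for (long docFreq : docFreqs) {
            sum += termScore(freq, docLen, docFreq, numDocs);
        }
        return sum;
    }
}
```

With this shape, a rarer term (smaller docFreq) contributes more to the phrase score than a common one, without any phrase-level IDF being invented.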

WARNING: This API is experimental and might change in incompatible ways in the next release.

Nested Class Summary

Nested classes/interfaces inherited from class org.apache.lucene.search.similarities.Similarity
Similarity.SimScorer
Field Summary

Fields

Modifier and Type

Field

Description

protected boolean

discountOverlaps

True if overlap tokens (tokens with a position increment of zero) are discounted from the document's length.
Constructor Summary

Constructors

Constructor

Description

SimilarityBase()

Sole constructor.
Method Summary

Modifier and Type

Method

Description

final long

computeNorm(FieldInvertState state)

Encodes the document length in the same way as BM25Similarity.

protected void

explain(List<Explanation> subExpls, BasicStats stats, double freq, double docLen)

Subclasses should implement this method to explain the score.

protected Explanation

explain(BasicStats stats, Explanation freq, double docLen)

Explains the score.

protected void

fillBasicStats(BasicStats stats, CollectionStatistics collectionStats, TermStatistics termStats)

Fills all member fields defined in BasicStats in stats.

boolean

getDiscountOverlaps()

Returns true if overlap tokens are discounted from the document's length.

static double

log2(double x)

Returns the base two logarithm of x.

protected BasicStats

newStats(String field, double boost)

Factory method that returns a custom stats object.

protected abstract double

score(BasicStats stats, double freq, double docLen)

Scores the document.

final Similarity.SimScorer

scorer(float boost, CollectionStatistics collectionStats, TermStatistics... termStats)

Compute any collection-level weight (e.g. IDF, average document length, etc) needed for scoring a query.

void

setDiscountOverlaps(boolean v)

Determines whether overlap tokens (tokens with a position increment of zero) are ignored when computing the norm.

abstract String

toString()

Subclasses must override this method to return the name of the Similarity and preferably the values of parameters (if any) as well.

Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, wait, wait, wait

Field Details
- discountOverlaps
  
  protected boolean discountOverlaps
  
  True if overlap tokens (tokens with a position increment of zero) are discounted from the document's length.
Constructor Details
- SimilarityBase
  
  public SimilarityBase()
  
  Sole constructor. (For invocation by subclass constructors, typically implicit.)
Method Details
- setDiscountOverlaps
  
  public void setDiscountOverlaps(boolean v)
  
  Determines whether overlap tokens (tokens with a position increment of zero) are ignored when computing the norm. By default this is true, meaning overlap tokens do not count when computing norms.
  See Also:
  
  computeNorm(org.apache.lucene.index.FieldInvertState)
  
  WARNING: This API is experimental and might change in incompatible ways in the next release.
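The effect of this flag on the document length can be illustrated with a small sketch in plain Java. It is not Lucene's implementation: the class name, the method name, and the array of position increments are hypothetical stand-ins for what the indexing chain tracks internally.

```java
// Sketch only: plain Java, no Lucene types. All names here are hypothetical.
class OverlapLengthSketch {
    // positionIncrements[i] is the position increment of the i-th token.
    // An increment of 0 marks an overlap token, for example a synonym
    // injected at the same position as the original token.
    static int documentLength(int[] positionIncrements, boolean discountOverlaps) {
        int length = 0;
        for (int increment : positionIncrements) {
            if (discountOverlaps && increment == 0) {
                continue; // overlap token: not counted toward the length
            }
            length++;
        }
        return length;
    }
}
```

For a field with four tokens of which one is an overlap, the sketch yields a length of 3 when overlaps are discounted and 4 when they are not.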
- getDiscountOverlaps
  
  public boolean getDiscountOverlaps()
  
  Returns true if overlap tokens are discounted from the document's length.
  See Also:
  
  setDiscountOverlaps(boolean)
- scorer
  
  public final Similarity.SimScorer scorer(float boost, CollectionStatistics collectionStats, TermStatistics... termStats)
  
  Description copied from class: Similarity
  
  Compute any collection-level weight (e.g. IDF, average document length, etc) needed for scoring a query.
  
  Specified by:
  
  scorer in class Similarity
  
  Parameters:
  
  boost - a multiplicative factor to apply to the produced scores
  
  collectionStats - collection-level statistics, such as the number of tokens in the collection.
  
  termStats - term-level statistics, such as the document frequency of a term across the collection.
  
  Returns:
  
  A SimScorer object holding the information this Similarity needs to score a query.
- newStats
  
  protected BasicStats newStats(String field, double boost)
  
  Factory method that returns a custom stats object.
- fillBasicStats
  
  protected void fillBasicStats(BasicStats stats, CollectionStatistics collectionStats, TermStatistics termStats)
  
  Fills all member fields defined in BasicStats in stats. Subclasses can override this method to fill additional stats.
- score
  
  protected abstract double score(BasicStats stats, double freq, double docLen)
  
  Scores the document.
  Subclasses must apply their scoring formula in this method.
  
  Parameters:
  
  stats - the corpus level statistics.
  
  freq - the term frequency.
  
  docLen - the document length.
  
  Returns:
  
  the score.
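As an illustration of the kind of formula a subclass might place in score, here is a sketch written as a plain static method so that it runs without Lucene on the classpath. The formula itself is made up for illustration, and the boost and avgFieldLength parameters are stand-ins for values that a real subclass would presumably read from the BasicStats argument.

```java
// Sketch only: plain Java, no Lucene types. The formula is illustrative,
// not one of Lucene's built-in similarities.
class ScoreFormulaSketch {
    // boost and avgFieldLength are hypothetical stand-ins for corpus-level
    // statistics; freq and docLen mirror the parameters of score(...).
    static double score(double boost, double avgFieldLength, double freq, double docLen) {
        // Normalize the term frequency by the document's relative length,
        // so that longer-than-average documents are penalized.
        double relativeLength = docLen / avgFieldLength;
        double normalizedFreq = freq / (freq + relativeLength);
        return boost * normalizedFreq;
    }
}
```

The result grows with freq, shrinks as docLen increases, and scales linearly with boost, which are the properties one would typically expect from a term-scoring formula.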
- explain
  
  protected void explain(List<Explanation> subExpls, BasicStats stats, double freq, double docLen)
  
  Subclasses should implement this method to explain the score. The enclosing explanation already contains the score, the name of the class and the doc id, as well as the term frequency and its explanation; subclasses can add additional clauses to subExpls to explain details of their scoring formulae.
  The default implementation does nothing.
  
  Parameters:
  
  subExpls - the list of details of the explanation to extend
  
  stats - the corpus level statistics.
  
  freq - the term frequency.
  
  docLen - the document length.
- explain
  
  protected Explanation explain(BasicStats stats, Explanation freq, double docLen)
  
  Explains the score. The implementation here provides a basic explanation in the format score(name-of-similarity, doc=doc-id, freq=term-frequency), computed from:, and attaches the score (computed via the score(BasicStats, double, double) method) and the explanation for the term frequency. Subclasses content with this format may add additional details in explain(List, BasicStats, double, double).
  
  Parameters:
  
  stats - the corpus level statistics.
  
  freq - the term frequency and its explanation.
  
  docLen - the document length.
  
  Returns:
  
  the explanation.
- toString
  
  public abstract String toString()
  
  Subclasses must override this method to return the name of the Similarity and preferably the values of parameters (if any) as well.
  
  Overrides:
  
  toString in class Object
- computeNorm
  
  public final long computeNorm(FieldInvertState state)
  
  Encodes the document length in the same way as BM25Similarity.
  
  Specified by:
  
  computeNorm in class Similarity
  
  Parameters:
  
  state - current processing state for this field
  
  Returns:
  
  computed norm value
- log2
  
  public static double log2(double x)
  
  Returns the base two logarithm of x.
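A base two logarithm can be derived from the natural logarithm by a change of base. The following is a minimal sketch of that identity in plain Java; the class name is hypothetical, and the sketch is not claimed to match the library's implementation line for line.

```java
// Sketch only: change-of-base identity log2(x) = ln(x) / ln(2).
class Log2Sketch {
    static double log2(double x) {
        return Math.log(x) / Math.log(2.0);
    }
}
```

Because the result goes through floating-point division, callers comparing against exact integers may want to allow a small tolerance.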
