Class Axiomatic

Direct Known Subclasses:
AxiomaticF1EXP, AxiomaticF1LOG, AxiomaticF2EXP, AxiomaticF2LOG, AxiomaticF3EXP, AxiomaticF3LOG

public abstract class Axiomatic extends SimilarityBase
Axiomatic approaches for IR. From Hui Fang and Chengxiang Zhai 2005. An Exploration of Axiomatic Approaches to Information Retrieval. In Proceedings of the 28th annual international ACM SIGIR conference on Research and development in information retrieval (SIGIR '05). ACM, New York, NY, USA, 480-487.

This class is the base of a family of models. All of them build on BM25, Pivoted Document Length Normalization, and the Language model with Dirichlet prior. Some components of the original models (e.g. Term Frequency, Inverse Document Frequency) are modified so that they satisfy a set of axiomatic constraints.

WARNING: This API is experimental and might change in incompatible ways in the next release.
  • Nested Class Summary

    Nested classes/interfaces inherited from class org.apache.lucene.search.similarities.Similarity

    Similarity.SimScorer
  • Field Summary

    Fields
    Modifier and Type
    Field
    Description
    protected final float
    k
    hyperparam for the primitive weighting function
    protected final int
    queryLen
    the query length
    protected final float
    s
    hyperparam for the growth function

    Fields inherited from class org.apache.lucene.search.similarities.SimilarityBase

    discountOverlaps
  • Constructor Summary

    Constructors
    Constructor
    Description
    Axiomatic()
    Default constructor
    Axiomatic(float s)
    Constructor setting only s, leaving k and queryLen at their defaults
    Axiomatic(float s, int queryLen)
    Constructor setting s and queryLen, leaving k at its default
    Axiomatic(float s, int queryLen, float k)
    Constructor setting all Axiomatic hyperparameters
  • Method Summary

    Modifier and Type
    Method
    Description
    protected void
    explain(List<Explanation> subs, BasicStats stats, double freq, double docLen)
    Subclasses should implement this method to explain the score.
    protected Explanation
    explain(BasicStats stats, Explanation freq, double docLen)
    Explains the score.
    protected abstract double
    gamma(BasicStats stats, double freq, double docLen)
    compute the gamma component (only for F3EXP and F3LOG)
    protected abstract double
    idf(BasicStats stats, double freq, double docLen)
    compute the inverse document frequency component
    protected abstract Explanation
    idfExplain(BasicStats stats, double freq, double docLen)
    Explain the score of the inverse document frequency component for a single document
    protected abstract double
    ln(BasicStats stats, double freq, double docLen)
    compute the document length component
    protected abstract Explanation
    lnExplain(BasicStats stats, double freq, double docLen)
    Explain the score of the document length component for a single document
    double
    score(BasicStats stats, double freq, double docLen)
    Scores the document doc.
    protected abstract double
    tf(BasicStats stats, double freq, double docLen)
    compute the term frequency component
    protected abstract Explanation
    tfExplain(BasicStats stats, double freq, double docLen)
    Explain the score of the term frequency component for a single document
    protected abstract double
    tfln(BasicStats stats, double freq, double docLen)
    compute the mixed term frequency and document length component
    protected abstract Explanation
    tflnExplain(BasicStats stats, double freq, double docLen)
    Explain the score of the mixed term frequency and document length component for a single document
    abstract String
    toString()
    Name of the axiomatic method.

    Methods inherited from class org.apache.lucene.search.similarities.SimilarityBase

    computeNorm, fillBasicStats, getDiscountOverlaps, log2, newStats, scorer, setDiscountOverlaps

    Methods inherited from class java.lang.Object

    clone, equals, finalize, getClass, hashCode, notify, notifyAll, wait, wait, wait
  • Field Details

    • s

      protected final float s
      hyperparam for the growth function
    • k

      protected final float k
      hyperparam for the primitive weighting function
    • queryLen

      protected final int queryLen
      the query length
  • Constructor Details

    • Axiomatic

      public Axiomatic(float s, int queryLen, float k)
      Constructor setting all Axiomatic hyperparameters
      Parameters:
      s - hyperparam for the growth function
      queryLen - the query length
      k - hyperparam for the primitive weighting function
    • Axiomatic

      public Axiomatic(float s)
      Constructor setting only s, leaving k and queryLen at their defaults
      Parameters:
      s - hyperparam for the growth function
    • Axiomatic

      public Axiomatic(float s, int queryLen)
      Constructor setting s and queryLen, leaving k at its default
      Parameters:
      s - hyperparam for the growth function
      queryLen - the query length
    • Axiomatic

      public Axiomatic()
      Default constructor
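
The four constructors above read as a delegation chain in which the shorter forms fill in defaults and pass everything to the three-argument form. The sketch below illustrates that pattern with a stand-in class that does not depend on Lucene. The class name AxiomaticParams, the default values (0.25, 1, 0.35), and the range checks are assumptions made for illustration; consult the source of your Lucene version for the actual values.

```java
// Stand-in for the constructor chain; this is not the Lucene class itself.
class AxiomaticParams {
    // Assumed defaults, chosen for illustration only.
    static final float DEFAULT_S = 0.25f;
    static final int DEFAULT_QUERY_LEN = 1;
    static final float DEFAULT_K = 0.35f;

    final float s;       // hyperparam for the growth function
    final int queryLen;  // the query length
    final float k;       // hyperparam for the primitive weighting function

    // Sets all hyperparameters; the other constructors delegate here.
    AxiomaticParams(float s, int queryLen, float k) {
        // Assumed validation: reject NaN, infinities, and values outside [0, 1].
        if (!Float.isFinite(s) || s < 0 || s > 1) {
            throw new IllegalArgumentException("illegal s value: " + s);
        }
        if (!Float.isFinite(k) || k < 0 || k > 1) {
            throw new IllegalArgumentException("illegal k value: " + k);
        }
        if (queryLen < 0) {
            throw new IllegalArgumentException("illegal query length: " + queryLen);
        }
        this.s = s;
        this.queryLen = queryLen;
        this.k = k;
    }

    // Sets s and queryLen, leaving k at its default.
    AxiomaticParams(float s, int queryLen) {
        this(s, queryLen, DEFAULT_K);
    }

    // Sets only s, leaving queryLen and k at their defaults.
    AxiomaticParams(float s) {
        this(s, DEFAULT_QUERY_LEN, DEFAULT_K);
    }

    // Uses defaults for every hyperparameter.
    AxiomaticParams() {
        this(DEFAULT_S, DEFAULT_QUERY_LEN, DEFAULT_K);
    }
}
```

Routing every form through one constructor keeps validation in a single place, so a subclass calling any of the shorter forms gets the same checks.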
  • Method Details

    • score

      public double score(BasicStats stats, double freq, double docLen)
      Description copied from class: SimilarityBase
      Scores the document doc.

      Subclasses must apply their scoring formula in this method.

      Specified by:
      score in class SimilarityBase
      Parameters:
      stats - the corpus level statistics.
      freq - the term frequency.
      docLen - the document length.
      Returns:
      the score.
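
One way to picture how the abstract component methods (tf, ln, tfln, idf, gamma) feed into score is the sketch below. It assumes the score is the product of the tf, ln, tfln, and idf components minus the gamma component, clamped at zero so that a large gamma cannot produce a negative score. AxiomaticScoreSketch and combine are hypothetical names, and the class is independent of Lucene.

```java
// Hypothetical helper showing an assumed composition of the five components.
class AxiomaticScoreSketch {
    /** Combines already-computed component values into a single score. */
    static double combine(double tf, double ln, double tfln, double idf, double gamma) {
        // Assumed formula: multiplicative components minus the gamma term.
        double raw = tf * ln * tfln * idf - gamma;
        // Assumption: clamp at zero, since gamma may exceed the product.
        return Math.max(0.0, raw);
    }
}
```

Under this reading, a subclass that does not need a given component can return a neutral value for it: 1 for a multiplicative component, 0 for gamma.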
    • explain

      protected Explanation explain(BasicStats stats, Explanation freq, double docLen)
      Description copied from class: SimilarityBase
      Explains the score. The implementation here provides a basic explanation in the format "score(name-of-similarity, doc=doc-id, freq=term-frequency), computed from:", and attaches the score (computed via the SimilarityBase.score(BasicStats, double, double) method) and the explanation for the term frequency. Subclasses content with this format may add additional details in SimilarityBase.explain(List, BasicStats, double, double).
      Overrides:
      explain in class SimilarityBase
      Parameters:
      stats - the corpus level statistics.
      freq - the term frequency and its explanation.
      docLen - the document length.
      Returns:
      the explanation.
    • explain

      protected void explain(List<Explanation> subs, BasicStats stats, double freq, double docLen)
      Description copied from class: SimilarityBase
      Subclasses should implement this method to explain the score. The explanation already contains the score, the name of the class and the doc id, as well as the term frequency and its explanation; subclasses can add additional clauses to subs to explain details of their scoring formulae.

      The default implementation does nothing.

      Overrides:
      explain in class SimilarityBase
      Parameters:
      subs - the list of details of the explanation to extend
      stats - the corpus level statistics.
      freq - the term frequency.
      docLen - the document length.
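
The idea of extending the list of details can be sketched without Lucene as follows. Detail is a hypothetical stand-in for an explanation node (a value plus a description), and ExplainSketch.explain is an assumed shape for an override that appends one entry per hyperparameter and per component; the descriptions and the order of entries are illustrative only.

```java
import java.util.List;

// Hypothetical illustration of appending sub-explanations to a list.
class ExplainSketch {
    /** Minimal stand-in for an explanation node. */
    static final class Detail {
        final double value;
        final String description;

        Detail(double value, String description) {
            this.value = value;
            this.description = description;
        }
    }

    /** Appends hyperparameter and component details to subs. */
    static void explain(List<Detail> subs, float s, int queryLen, float k,
                        double tf, double idf) {
        subs.add(new Detail(k, "k, hyperparam for the primitive weighting function"));
        subs.add(new Detail(s, "s, hyperparam for the growth function"));
        subs.add(new Detail(queryLen, "queryLen, the query length"));
        subs.add(new Detail(tf, "tf, the term frequency component"));
        subs.add(new Detail(idf, "idf, the inverse document frequency component"));
    }
}
```

Because the method only appends to the list it receives, the caller keeps control of the top-level explanation and its score.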
    • toString

      public abstract String toString()
      Name of the axiomatic method.
      Specified by:
      toString in class SimilarityBase
    • tf

      protected abstract double tf(BasicStats stats, double freq, double docLen)
      compute the term frequency component
    • ln

      protected abstract double ln(BasicStats stats, double freq, double docLen)
      compute the document length component
    • tfln

      protected abstract double tfln(BasicStats stats, double freq, double docLen)
      compute the mixed term frequency and document length component
    • idf

      protected abstract double idf(BasicStats stats, double freq, double docLen)
      compute the inverse document frequency component
    • gamma

      protected abstract double gamma(BasicStats stats, double freq, double docLen)
      compute the gamma component (only for F3EXP and F3LOG)
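
To make the component methods concrete, the sketch below fills them in along the lines of the F2-EXP formula from the cited Fang and Zhai paper: the mixed component is freq / (freq + s + s * docLen / avgDocLen) and the IDF component is (N / df)^k, with tf and ln neutral and gamma zero. This is an illustration under those assumptions, not the code of AxiomaticF2EXP; F2ExpLikeSketch and its parameter names are hypothetical, and the Lucene implementation may differ in details such as smoothing.

```java
// Hypothetical, Lucene-independent sketch of F2-EXP-style components.
class F2ExpLikeSketch {
    /** Mixed term frequency and document length component. */
    static double tfln(double freq, double docLen, double avgDocLen, double s) {
        return freq / (freq + s + s * docLen / avgDocLen);
    }

    /** Inverse document frequency component: (numDocs / docFreq)^k. */
    static double idf(long numDocs, long docFreq, double k) {
        return Math.pow((double) numDocs / docFreq, k);
    }

    /** Per-term score with tf = ln = 1 and gamma = 0 (assumed neutral values). */
    static double score(double freq, double docLen, double avgDocLen,
                        long numDocs, long docFreq, double s, double k) {
        double tf = 1.0;
        double ln = 1.0;
        double gamma = 0.0;
        double raw = tf * ln * tfln(freq, docLen, avgDocLen, s) * idf(numDocs, docFreq, k) - gamma;
        return Math.max(0.0, raw);
    }
}
```

With freq = 2, docLen equal to the average length, and s = 0.5, the mixed component is 2 / 3, which shows how s dampens the raw term frequency.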
    • tfExplain

      protected abstract Explanation tfExplain(BasicStats stats, double freq, double docLen)
      Explain the score of the term frequency component for a single document
      Parameters:
      stats - the corpus level statistics
      freq - number of occurrences of term in the document
      docLen - the document length
      Returns:
      Explanation of how the tf component was computed
    • lnExplain

      protected abstract Explanation lnExplain(BasicStats stats, double freq, double docLen)
      Explain the score of the document length component for a single document
      Parameters:
      stats - the corpus level statistics
      freq - number of occurrences of term in the document
      docLen - the document length
      Returns:
      Explanation of how the ln component was computed
    • tflnExplain

      protected abstract Explanation tflnExplain(BasicStats stats, double freq, double docLen)
      Explain the score of the mixed term frequency and document length component for a single document
      Parameters:
      stats - the corpus level statistics
      freq - number of occurrences of term in the document
      docLen - the document length
      Returns:
      Explanation of how the tfln component was computed
    • idfExplain

      protected abstract Explanation idfExplain(BasicStats stats, double freq, double docLen)
      Explain the score of the inverse document frequency component for a single document
      Parameters:
      stats - the corpus level statistics
      freq - number of occurrences of term in the document
      docLen - the document length
      Returns:
      Explanation of how the idf component was computed