Class LMSimilarity

  • Direct Known Subclasses:
    LMDirichletSimilarity, LMJelinekMercerSimilarity

    public abstract class LMSimilarity
    extends SimilarityBase
    Abstract superclass for language modeling Similarities. The following inner types are introduced:
    • LMSimilarity.LMStats, which defines a new statistic, the probability that the collection language model generates the current term;
    • LMSimilarity.CollectionModel, which is a strategy interface for objects that compute the collection language model p(w|C);
    • LMSimilarity.DefaultCollectionModel, the default implementation of that interface, which computes the term probability as the number of occurrences of the term in the collection divided by the total number of tokens.
    WARNING: This API is experimental and might change in incompatible ways in the next release.
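
    The strategy arrangement described above can be sketched in plain Java without any Lucene dependency. Every name here (CollectionModelSketch, CollectionModel, RatioCollectionModel, LMSketch, computeProbability and its two long parameters) is a hypothetical stand-in chosen for illustration, not the library's actual signatures. The sketch computes the plain ratio stated in the description; the real DefaultCollectionModel may apply additional smoothing.

    ```java
    public class CollectionModelSketch {

        /** Hypothetical stand-in for a strategy that computes p(w|C). */
        interface CollectionModel {
            double computeProbability(long totalTermFreq, long numberOfFieldTokens);
        }

        /** Plain ratio: occurrences of the term divided by the total number of tokens. */
        static final class RatioCollectionModel implements CollectionModel {
            @Override
            public double computeProbability(long totalTermFreq, long numberOfFieldTokens) {
                if (numberOfFieldTokens <= 0) {
                    throw new IllegalArgumentException("numberOfFieldTokens must be positive");
                }
                return (double) totalTermFreq / numberOfFieldTokens;
            }
        }

        /** Mirrors the two-constructor shape: explicit model, or a default one. */
        abstract static class LMSketch {
            final CollectionModel collectionModel;

            LMSketch(CollectionModel collectionModel) {
                this.collectionModel = collectionModel;
            }

            LMSketch() {
                this(new RatioCollectionModel());
            }
        }

        public static void main(String[] args) {
            // Anonymous subclass, since the sketch base class is abstract.
            LMSketch sim = new LMSketch() {};
            // A term occurring 5 times in a collection of 1000 tokens.
            System.out.println(sim.collectionModel.computeProbability(5, 1000));
        }
    }
    ```

    Keeping the collection model behind an interface lets a subclass swap in a different estimate of p(w|C) without touching the scoring code.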
    • Constructor Detail

      • LMSimilarity

        public LMSimilarity(LMSimilarity.CollectionModel collectionModel)
        Creates a new instance with the specified collection language model.
      • LMSimilarity

        public LMSimilarity()
        Creates a new instance with the default collection language model.
    • Method Detail

      • explain

        protected void explain(List<Explanation> subExpls,
                               BasicStats stats,
                               double freq,
                               double docLen)
        Description copied from class: SimilarityBase
        Subclasses should implement this method to explain the score. The explanation being built already contains the score, the name of the class and the doc id, as well as the term frequency and its explanation; subclasses can append further clauses to subExpls to explain the details of their scoring formulae.

        The default implementation does nothing.

        Overrides:
        explain in class SimilarityBase
        Parameters:
        subExpls - the list of details of the explanation to extend.
        stats - the corpus level statistics.
        freq - the term frequency.
        docLen - the document length.
      • getName

        public abstract String getName()
        Returns the name of the LM method. The values of the parameters should be included as well.

        Used in toString().
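
        As a rough illustration of the getName() contract, the sketch below shows a subclass that includes its parameter value in the name, and a base class that reuses the name in toString(). It is self-contained and does not use Lucene; LMNameSketch, LMSketch, DirichletSketch, the "LM " prefix and the "Dirichlet(%f)" format are assumptions made for the example, not the library's confirmed output.

        ```java
        import java.util.Locale;

        public class LMNameSketch {

            /** Hypothetical base class: toString() is built from getName(). */
            abstract static class LMSketch {
                /** Name of the LM method, including parameter values. */
                public abstract String getName();

                @Override
                public String toString() {
                    return "LM " + getName();
                }
            }

            /** Hypothetical subclass with one smoothing parameter, mu. */
            static final class DirichletSketch extends LMSketch {
                private final float mu;

                DirichletSketch(float mu) {
                    this.mu = mu;
                }

                @Override
                public String getName() {
                    // Locale.ROOT keeps the decimal separator stable across locales.
                    return String.format(Locale.ROOT, "Dirichlet(%f)", mu);
                }
            }

            public static void main(String[] args) {
                System.out.println(new DirichletSketch(2000f));
            }
        }
        ```

        Including the parameter values in the name makes two differently configured instances distinguishable in logs and explanations.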