Class IBSimilarityFactory


  • public class IBSimilarityFactory
    extends SimilarityFactory
    Factory for IBSimilarity.

    You must specify an implementation for each of the three components of the Information-Based model. Each one is given as a string parameter:

    1. distribution: Probabilistic distribution used to model term occurrence
      • LL: Log-logistic
      • SPL: Smoothed power-law
    2. lambda: λ_w parameter of the probability distribution
      • DF: N_w/N or average number of documents where w occurs
      • TTF: F_w/N or average number of occurrences of w in the collection
    3. normalization: Term frequency normalization
      • Any supported DFR normalization listed in DFRSimilarityFactory
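
The two lambda options differ only in the numerator of the ratio. The following is a minimal sketch of that arithmetic, not the Lucene or Solr implementation: the class `LambdaSketch`, its method names, and the sample counts are all hypothetical, and the real classes may apply smoothing or guards that are omitted here.

```java
// Illustrative only: not the Lucene/Solr implementation.
final class LambdaSketch {

    /** DF variant: lambda_w = N_w / N, with N_w the number of documents containing w. */
    static double lambdaDF(long docsWithTerm, long totalDocs) {
        requirePositive(totalDocs);
        return (double) docsWithTerm / totalDocs;
    }

    /** TTF variant: lambda_w = F_w / N, with F_w the total occurrences of w in the collection. */
    static double lambdaTTF(long totalTermFreq, long totalDocs) {
        requirePositive(totalDocs);
        return (double) totalTermFreq / totalDocs;
    }

    private static void requirePositive(long totalDocs) {
        if (totalDocs <= 0) {
            throw new IllegalArgumentException("totalDocs must be positive");
        }
    }

    public static void main(String[] args) {
        // Hypothetical collection: 1,000 documents; w occurs 250 times across 100 of them.
        System.out.println(lambdaDF(100, 1000));   // prints 0.1
        System.out.println(lambdaTTF(250, 1000));  // prints 0.25
    }
}
```

Because a term can occur several times in one document, the TTF value is never smaller than the DF value for the same term.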

    Optional settings:

    • discountOverlaps (bool): Value passed to SimilarityBase.setDiscountOverlaps(boolean)
    WARNING: This API is experimental and might change in incompatible ways in the next release.
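
The parameter rules described above (three required strings, one optional boolean) can be summarized as a small check. This is a sketch under stated assumptions rather than the factory's own code: `IBParamsSketch` and `validate` are hypothetical names, the argument map stands in for however the parameters are actually supplied, and `H2` in the usage example is assumed to be one of the normalization names listed in DFRSimilarityFactory.

```java
import java.util.Map;
import java.util.Set;

// Hypothetical helper, not part of Solr: checks an argument map against the documented rules.
final class IBParamsSketch {
    private static final Set<String> DISTRIBUTIONS = Set.of("LL", "SPL");
    private static final Set<String> LAMBDAS = Set.of("DF", "TTF");

    /** Returns normally when the required parameters are present and recognized. */
    static void validate(Map<String, String> params) {
        requireOneOf(params, "distribution", DISTRIBUTIONS);
        requireOneOf(params, "lambda", LAMBDAS);

        // Normalization names are defined by DFRSimilarityFactory; only presence is checked here.
        String normalization = params.get("normalization");
        if (normalization == null || normalization.trim().isEmpty()) {
            throw new IllegalArgumentException("missing required parameter: normalization");
        }

        // discountOverlaps is optional; when given, it should be a boolean literal.
        String discount = params.get("discountOverlaps");
        if (discount != null && !discount.equals("true") && !discount.equals("false")) {
            throw new IllegalArgumentException("discountOverlaps must be true or false");
        }
    }

    private static void requireOneOf(Map<String, String> params, String key, Set<String> allowed) {
        String value = params.get(key);
        if (value == null) {
            throw new IllegalArgumentException("missing required parameter: " + key);
        }
        if (!allowed.contains(value)) {
            throw new IllegalArgumentException(key + " must be one of " + allowed + ", got: " + value);
        }
    }

    public static void main(String[] args) {
        // Assumed example values; "H2" is taken to be a DFR normalization name.
        validate(Map.of("distribution", "SPL", "lambda", "TTF", "normalization", "H2"));
        System.out.println("parameters accepted");
    }
}
```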