Class NGramDistance

  • All Implemented Interfaces:
    StringDistance

    public class NGramDistance
    extends Object
    implements StringDistance
    N-Gram version of edit distance based on paper by Grzegorz Kondrak, "N-gram similarity and distance". Proceedings of the Twelfth International Conference on String Processing and Information Retrieval (SPIRE 2005), pp. 115-126, Buenos Aires, Argentina, November 2005. http://www.cs.ualberta.ca/~kondrak/papers/spire05.pdf

    This implementation uses the position-based optimization to compute partial matches of n-gram substrings. It adds a null-character prefix of size n-1 so that the first character is contained in the same number of n-grams as a middle character. Null-character prefix matches are discounted so that strings with no matching characters return a distance of 0.
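
The effect of the null-character prefix can be illustrated with a small standalone sketch. It is not the code of this class: the class name NGramPaddingSketch and the helpers paddedNGrams and coverage are hypothetical names introduced here only to show why a prefix of n-1 characters puts the first character into as many n-grams as a middle character.

```java
import java.util.ArrayList;
import java.util.List;

// Illustrative sketch only; these names are assumptions, not part of NGramDistance.
class NGramPaddingSketch {

    // Returns every n-gram of s after prefixing s with n-1 null characters.
    static List<String> paddedNGrams(String s, int n) {
        StringBuilder padded = new StringBuilder();
        for (int i = 0; i < n - 1; i++) {
            padded.append('\0');
        }
        padded.append(s);

        List<String> grams = new ArrayList<>();
        for (int start = 0; start + n <= padded.length(); start++) {
            grams.add(padded.substring(start, start + n));
        }
        return grams;
    }

    // Counts how many padded n-grams contain the character at position
    // index of the original (unpadded) string.
    static int coverage(String s, int n, int index) {
        int paddedIndex = index + n - 1;      // position after the prefix is added
        int paddedLength = s.length() + n - 1;
        int count = 0;
        for (int start = 0; start + n <= paddedLength; start++) {
            if (start <= paddedIndex && paddedIndex < start + n) {
                count++;
            }
        }
        return count;
    }

    public static void main(String[] args) {
        String word = "lucene";
        int n = 3;
        System.out.println(paddedNGrams(word, n).size()); // 6 trigrams
        System.out.println(coverage(word, n, 0));         // 3: first character
        System.out.println(coverage(word, n, 2));         // 3: a middle character
    }
}
```

Without the prefix, the first character of "lucene" would appear in only one trigram ("luc"), so a mismatch at the start of a string would carry less weight than one in the middle.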

    • Constructor Detail

      • NGramDistance

        public NGramDistance(int size)
        Creates an N-Gram distance measure using n-grams of the specified size.
        Parameters:
        size - The size of the n-gram to be used to compute the string distance.
      • NGramDistance

        public NGramDistance()
        Creates an N-Gram distance measure using n-grams of size 2.
    • Method Detail

      • getDistance

        public float getDistance(String source,
                                 String target)
        Description copied from interface: StringDistance
        Returns a float between 0 and 1 based on how similar the specified strings are to one another. A value of 1 means the specified strings are identical, and 0 means the strings are maximally different.
        Specified by:
        getDistance in interface StringDistance
        Parameters:
        source - The first string.
        target - The second string.
        Returns:
        a float between 0 and 1 based on how similar the specified strings are to one another.
      • hashCode

        public int hashCode()
        Overrides:
        hashCode in class Object