Class LuceneLevenshteinDistance

  • All Implemented Interfaces:
    StringDistance

    public final class LuceneLevenshteinDistance
    extends Object
    implements StringDistance
    Damerau-Levenshtein (optimal string alignment) distance, implemented consistently with Lucene's FuzzyTermsEnum when the transpositions option is enabled.

    Notes:

    • This metric treats full Unicode code points as characters.
    • This metric scales raw edit distances into a floating-point score based on the length of the shorter of the two terms.
    • Transpositions of two adjacent codepoints are treated as primitive edits.
    • Edits are applied in parallel: for example, "ab" and "bca" have distance 3.
    NOTE: this class is not particularly efficient. It is intended only for merging results from multiple DirectSpellCheckers.
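
    The notes above can be illustrated with a minimal, self-contained sketch of an optimal string alignment distance computed over code points. This is not the library's source: the class name OsaSketch and the methods rawDistance and scaledScore are hypothetical, and the handling of empty strings and the clamping to 0 are assumptions made here so that the result stays in the documented 0 to 1 range.

```java
// Hypothetical illustration only; not the Lucene implementation.
final class OsaSketch {
    private OsaSketch() {}

    /** Raw optimal-string-alignment edit distance, counted in Unicode code points. */
    static int rawDistance(String target, String other) {
        // Compare code points, not UTF-16 chars, so a surrogate pair counts once.
        int[] a = target.codePoints().toArray();
        int[] b = other.codePoints().toArray();
        int n = a.length;
        int m = b.length;

        // d[i][j] = distance between the first i code points of a and the first j of b.
        int[][] d = new int[n + 1][m + 1];
        for (int i = 0; i <= n; i++) d[i][0] = i;
        for (int j = 0; j <= m; j++) d[0][j] = j;

        for (int i = 1; i <= n; i++) {
            for (int j = 1; j <= m; j++) {
                int cost = (a[i - 1] == b[j - 1]) ? 0 : 1;
                int best = Math.min(
                        Math.min(d[i - 1][j] + 1,      // deletion
                                 d[i][j - 1] + 1),     // insertion
                        d[i - 1][j - 1] + cost);       // substitution or match
                // Transposition of two adjacent code points is one primitive edit.
                if (i > 1 && j > 1 && a[i - 1] == b[j - 2] && a[i - 2] == b[j - 1]) {
                    best = Math.min(best, d[i - 2][j - 2] + 1);
                }
                d[i][j] = best;
            }
        }
        return d[n][m];
    }

    /** Scales the raw distance by the length of the shorter term (in code points). */
    static float scaledScore(String target, String other) {
        int shorter = Math.min(target.codePointCount(0, target.length()),
                               other.codePointCount(0, other.length()));
        if (shorter == 0) {
            // Assumption of this sketch: two empty strings are identical,
            // an empty string against a non-empty one is maximally different.
            return target.equals(other) ? 1.0f : 0.0f;
        }
        float score = 1.0f - (float) rawDistance(target, other) / shorter;
        // Assumption of this sketch: clamp so the score never drops below 0.
        return Math.max(0.0f, score);
    }

    public static void main(String[] args) {
        System.out.println(rawDistance("ab", "bca")); // prints 3
        System.out.println(rawDistance("ab", "ba"));  // prints 1
        System.out.println(scaledScore("ab", "ba"));  // prints 0.5
    }
}
```

    In this sketch "ab" and "bca" come out at distance 3 rather than 2 because, once a transposition has been applied, no further edit may touch the transposed pair. The two-step path "ab" to "ba" to "bca" is therefore not available.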
    • Constructor Detail

      • LuceneLevenshteinDistance

        public LuceneLevenshteinDistance()
        Creates a new comparator, mimicking the behavior of Lucene's internal edit distance.
    • Method Detail

      • getDistance

        public float getDistance​(String target,
                                 String other)
        Description copied from interface: StringDistance
        Returns a float between 0 and 1 based on how similar the specified strings are to one another. A value of 1 means the specified strings are identical, and 0 means the strings are maximally different.
        Specified by:
        getDistance in interface StringDistance
        Parameters:
        target - The first string.
        other - The second string.
        Returns:
        a float between 0 and 1 based on how similar the specified strings are to one another.
      • hashCode

        public int hashCode()
        Overrides:
        hashCode in class Object