Package org.apache.lucene.search.spell
Class NGramDistance
java.lang.Object
org.apache.lucene.search.spell.NGramDistance
- All Implemented Interfaces:
StringDistance
N-Gram version of edit distance based on the paper by Grzegorz Kondrak, "N-gram similarity and
distance". Proceedings of the Twelfth International Conference on String Processing and
Information Retrieval (SPIRE 2005), pp. 115-126, Buenos Aires, Argentina, November 2005.
http://www.cs.ualberta.ca/~kondrak/papers/spire05.pdf
This implementation uses the position-based optimization to compute partial matches of n-gram substrings. It adds a null-character prefix of size n-1 so that the first character is contained in the same number of n-grams as a middle character. Null-character prefix matches are discounted so that strings with no matching characters return a distance of 0, the lowest score, meaning maximally different.
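The effect of the prefix can be illustrated with a minimal sketch. The class and helper names below are hypothetical and not part of Lucene, and an underscore stands in for the null character so that the output is readable. It only lists n-grams; it does not compute the distance itself.

```java
import java.util.ArrayList;
import java.util.List;

public class NGramPaddingSketch {
    // Hypothetical helper: all contiguous substrings of length n in s.
    public static List<String> nGrams(String s, int n) {
        List<String> grams = new ArrayList<>();
        for (int i = 0; i + n <= s.length(); i++) {
            grams.add(s.substring(i, i + n));
        }
        return grams;
    }

    // Hypothetical helper: prepend n-1 padding characters, then list the n-grams.
    public static List<String> paddedNGrams(String s, int n, char pad) {
        StringBuilder padded = new StringBuilder();
        for (int i = 0; i < n - 1; i++) {
            padded.append(pad);
        }
        padded.append(s);
        return nGrams(padded.toString(), n);
    }

    public static void main(String[] args) {
        // Without the prefix, 'a' occurs in one bigram and 'b' in two.
        System.out.println(nGrams("abc", 2));            // prints [ab, bc]
        // With the prefix, 'a' occurs in two bigrams, like the middle character 'b'.
        System.out.println(paddedNGrams("abc", 2, '_')); // prints [_a, ab, bc]
    }
}
```

With n = 2 the prefix is a single character, so the first character of the string gains one extra n-gram and is weighted like a middle character.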
-
Constructor Summary
Constructor - Description
NGramDistance() - Creates an N-Gram distance measure using n-grams of size 2.
NGramDistance(int size) - Creates an N-Gram distance measure using n-grams of the specified size.
Method Summary
-
Constructor Details
NGramDistance
public NGramDistance(int size)
Creates an N-Gram distance measure using n-grams of the specified size.
- Parameters:
size - The size of the n-gram to be used to compute the string distance.
NGramDistance
public NGramDistance()
Creates an N-Gram distance measure using n-grams of size 2.
Method Details
-
getDistance
Description copied from interface: StringDistance
Returns a float between 0 and 1 based on how similar the specified strings are to one another. A value of 1 means the specified strings are identical, and 0 means the strings are maximally different.
- Specified by:
getDistance in interface StringDistance
- Parameters:
source - The first string.
target - The second string.
- Returns:
a float between 0 and 1 based on how similar the specified strings are to one another.
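Despite the method name, the returned value behaves like a similarity score: higher means more alike. The sketch below shows one common way an edit distance can be mapped onto that 0 to 1 range. It uses plain Levenshtein distance, not the n-gram algorithm of this class, and the class and method names are hypothetical.

```java
public class SimilarityScoreSketch {
    // Plain Levenshtein edit distance, computed with two rolling rows.
    public static int editDistance(String a, String b) {
        int[] prev = new int[b.length() + 1];
        int[] curr = new int[b.length() + 1];
        for (int j = 0; j <= b.length(); j++) {
            prev[j] = j;
        }
        for (int i = 1; i <= a.length(); i++) {
            curr[0] = i;
            for (int j = 1; j <= b.length(); j++) {
                int cost = a.charAt(i - 1) == b.charAt(j - 1) ? 0 : 1;
                curr[j] = Math.min(Math.min(curr[j - 1] + 1, prev[j] + 1), prev[j - 1] + cost);
            }
            int[] tmp = prev;
            prev = curr;
            curr = tmp;
        }
        return prev[b.length()];
    }

    // Maps the distance to [0, 1]: 1 for identical strings, 0 when every
    // position of the longer string has to be edited.
    public static float similarity(String a, String b) {
        int longest = Math.max(a.length(), b.length());
        if (longest == 0) {
            return 1.0f; // assumption: two empty strings count as identical
        }
        return 1.0f - (float) editDistance(a, b) / longest;
    }

    public static void main(String[] args) {
        System.out.println(similarity("lucene", "lucene")); // prints 1.0
        System.out.println(similarity("abc", "xyz"));       // prints 0.0
    }
}
```

Dividing by the length of the longer string keeps the score within the documented range regardless of input length.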
-
hashCode
public int hashCode()
equals
-
toString
-