Class SloppyPhraseMatcher


  • public final class SloppyPhraseMatcher
    extends PhraseMatcher
    Finds all slop-valid position combinations (matches) encountered while traversing the PhrasePositions.
    The sloppy frequency contribution of a match depends on the distance:
    - highest freq for distance=0 (exact match).
    - freq gets lower as distance gets higher.
    Example: for query "a b"~2, a document "x a b a y" can be matched twice: once for "a b" (distance=0), and once for "b a" (distance=2).
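    As a rough illustration of how frequency falls off with distance, the sketch below assumes a weight of 1/(distance+1). That formula is an assumption made for illustration; the description above only states that the contribution is highest at distance=0 and decreases as distance grows. The class name SloppyWeightSketch and the method sloppyWeight are hypothetical.

```java
public class SloppyWeightSketch {

    // Assumed weight function: 1.0 for an exact match (distance 0),
    // strictly smaller for every larger distance.
    public static double sloppyWeight(int distance) {
        return 1.0 / (distance + 1);
    }

    public static void main(String[] args) {
        // The two matches of "a b"~2 in "x a b a y":
        // "a b" at distance 0 and "b a" at distance 2.
        double freq = sloppyWeight(0) + sloppyWeight(2);
        System.out.println("sloppy freq = " + freq);
    }
}
```

    Under this assumption the exact match contributes three times as much as the reversed match, so the document's total sloppy frequency is dominated by the exact occurrence.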
    Not all valid combinations are necessarily encountered, because for efficiency the PhrasePositions with the least position is always the one advanced. This makes it possible to rely on a PriorityQueue and move forward faster. As a result, for example, the document "a b c b a" would score differently for the queries "a b c"~4 and "c b a"~4, although the two queries are really equivalent. Similarly, for the document "a b c b a f g", the query "c b"~2 would get the same score as "g f"~2, although "c b"~2 could be matched twice. We may want to fix this in the future (currently not, for performance reasons).
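    The traversal described above can be sketched in a simplified form. The code below is not this class's implementation: it is a minimal illustration that assumes at least two distinct query terms (no repeated terms), a weight of 1/(distance+1), and a hypothetical TermPositions helper standing in for PhrasePositions. Each term's document position is shifted by its index in the query, so an exact match puts every term at the same shifted position, and only the entry with the least shifted position is ever advanced.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.PriorityQueue;

public class SloppyFreqSketch {

    /** Hypothetical stand-in for PhrasePositions: one query term's positions in one document. */
    private static final class TermPositions {
        final int offset;   // index of the term within the query phrase
        final int[] docPos; // ascending positions of the term in the document
        int idx = 0;        // cursor into docPos
        int position;       // current document position minus offset

        TermPositions(int offset, int[] docPos) {
            this.offset = offset;
            this.docPos = docPos;
            this.position = docPos[0] - offset;
        }

        boolean next() {
            if (++idx >= docPos.length) return false;
            position = docPos[idx] - offset;
            return true;
        }
    }

    // Assumed weight function; the source only says it decreases with distance.
    public static double sloppyWeight(int distance) {
        return 1.0 / (distance + 1);
    }

    public static double phraseFreq(String[] query, String[] doc, int slop) {
        if (query.length < 2) throw new IllegalArgumentException("sketch needs at least two terms");
        // Least shifted position first; ties broken by the term's index in the query.
        PriorityQueue<TermPositions> pq = new PriorityQueue<>(
                Comparator.<TermPositions>comparingInt(t -> t.position)
                        .thenComparingInt(t -> t.offset));
        int end = Integer.MIN_VALUE; // greatest shifted position seen so far
        for (int i = 0; i < query.length; i++) {
            List<Integer> found = new ArrayList<>();
            for (int p = 0; p < doc.length; p++) {
                if (doc[p].equals(query[i])) found.add(p);
            }
            if (found.isEmpty()) return 0.0; // a missing term means no match
            TermPositions tp = new TermPositions(
                    i, found.stream().mapToInt(Integer::intValue).toArray());
            end = Math.max(end, tp.position);
            pq.add(tp);
        }

        double freq = 0.0;
        TermPositions pp = pq.poll();          // always work on the least entry
        int matchLength = end - pp.position;   // spread of the current combination
        int next = pq.peek().position;
        while (pp.next()) {
            end = Math.max(end, pp.position);
            if (pp.position > next) {
                // pp is no longer the least: record the current match, then switch.
                if (matchLength <= slop) freq += sloppyWeight(matchLength);
                pq.add(pp);
                pp = pq.poll();
                next = pq.peek().position;
                matchLength = end - pp.position;
            } else {
                // Still the least: keep the tightest spread seen for this combination.
                matchLength = Math.min(matchLength, end - pp.position);
            }
        }
        if (matchLength <= slop) freq += sloppyWeight(matchLength);
        return freq;
    }
}
```

    Under these assumptions, phraseFreq(new String[]{"a", "b"}, "x a b a y".split(" "), 2) counts one match at distance 0 and one at distance 2, and the queries "a b c"~4 and "c b a"~4 yield different values for the document "a b c b a", because advancing only the least entry visits different combinations depending on term order. The sketch is not guaranteed to reproduce every scoring behavior of the actual class.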
    NOTE: This API is for internal purposes only and might change in incompatible ways in the next release.