Class LevenshteinAutomata

java.lang.Object
org.apache.lucene.util.automaton.LevenshteinAutomata

public class LevenshteinAutomata extends Object
Class to construct DFAs that match a word within some edit distance.

Implements the algorithm described in: Schulz and Mihov: Fast String Correction with Levenshtein Automata

WARNING: This API is experimental and might change in incompatible ways in the next release.
  • Field Summary

    Fields
    Modifier and Type
    Field
    Description
    static final int
    Maximum edit distance this class can generate an automaton for.
  • Constructor Summary

    Constructors
    Constructor
    Description
    LevenshteinAutomata(int[] word, int alphaMax, boolean withTranspositions)
    Expert: specify a custom maximum possible symbol (alphaMax); default is Character.MAX_CODE_POINT.
    LevenshteinAutomata(String input, boolean withTranspositions)
    Create a new LevenshteinAutomata for some input String.
  • Method Summary

    Modifier and Type
    Method
    Description
    toAutomaton(int n)
    Compute a DFA that accepts all strings within an edit distance of n.
    toAutomaton(int n, String prefix)
    Compute a DFA that accepts all strings within an edit distance of n, matching the specified exact prefix.

    Methods inherited from class java.lang.Object

    clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
  • Field Details

    • MAXIMUM_SUPPORTED_DISTANCE

      public static final int MAXIMUM_SUPPORTED_DISTANCE
      Maximum edit distance this class can generate an automaton for.
      See Also:
      NOTE: This API is for internal purposes only and might change in incompatible ways in the next release.
  • Constructor Details

    • LevenshteinAutomata

      public LevenshteinAutomata(String input, boolean withTranspositions)
      Create a new LevenshteinAutomata for some input String. Optionally count transpositions as a primitive edit.
    • LevenshteinAutomata

      public LevenshteinAutomata(int[] word, int alphaMax, boolean withTranspositions)
      Expert: specify a custom maximum possible symbol (alphaMax); default is Character.MAX_CODE_POINT.
  • Method Details

    • toAutomaton

      public Automaton toAutomaton(int n)
      Compute a DFA that accepts all strings within an edit distance of n.

      All automata have the following properties:

      • They are deterministic (DFA).
      • There are no transitions to dead states.
      • They are not minimal (some transitions could be combined).
    • toAutomaton

      public Automaton toAutomaton(int n, String prefix)
      Compute a DFA that accepts all strings within an edit distance of n, matching the specified exact prefix.

      All automata have the following properties:

      • They are deterministic (DFA).
      • There are no transitions to dead states.
      • They are not minimal (some transitions could be combined).