Class Operations
- java.lang.Object
-
- org.apache.lucene.util.automaton.Operations
-
public final class Operations extends Object
Automata operations.- WARNING: This API is experimental and might change in incompatible ways in the next release.
-
-
Field Summary
Fields Modifier and Type Field Description static int
DEFAULT_DETERMINIZE_WORK_LIMIT
Default maximum effort thatdeterminize(org.apache.lucene.util.automaton.Automaton, int)
should spend before giving up and throwingTooComplexToDeterminizeException
.static int
MAX_RECURSION_LEVEL
Maximum level of recursion allowed in recursive operations.
-
Method Summary
All Methods Static Methods Concrete Methods Modifier and Type Method Description static Automaton
complement(Automaton a, int determinizeWorkLimit)
Returns a (deterministic) automaton that accepts the complement of the language of the given automaton.static Automaton
concatenate(List<Automaton> l)
Returns an automaton that accepts the concatenation of the languages of the given automata.static Automaton
concatenate(Automaton a1, Automaton a2)
Returns an automaton that accepts the concatenation of the languages of the given automata.static Automaton
determinize(Automaton a, int workLimit)
Determinizes the given automaton.static String
getCommonPrefix(Automaton a)
Returns the longest string that is a prefix of all accepted strings and visits each state at most once.static BytesRef
getCommonPrefixBytesRef(Automaton a)
Returns the longest BytesRef that is a prefix of all accepted strings and visits each state at most once.static BytesRef
getCommonSuffixBytesRef(Automaton a)
Returns the longest BytesRef that is a suffix of all accepted strings.static IntsRef
getSingleton(Automaton a)
If this automaton accepts a single input, return it.static boolean
hasDeadStates(Automaton a)
Returns true if this automaton has any states that cannot be reached from the initial state or cannot reach an accept state.static boolean
hasDeadStatesFromInitial(Automaton a)
Returns true if there are dead states reachable from an initial state.static boolean
hasDeadStatesToAccept(Automaton a)
Returns true if there are dead states that reach an accept state.static Automaton
intersection(Automaton a1, Automaton a2)
Returns an automaton that accepts the intersection of the languages of the given automata.static boolean
isEmpty(Automaton a)
Returns true if the given automaton accepts no strings.static boolean
isFinite(Automaton a)
Returns true if the language of this automaton is finite.static boolean
isTotal(Automaton a)
Returns true if the given automaton accepts all strings.static boolean
isTotal(Automaton a, int minAlphabet, int maxAlphabet)
Returns true if the given automaton accepts all strings for the specified min/max range of the alphabet.static Automaton
minus(Automaton a1, Automaton a2, int determinizeWorkLimit)
Returns a (deterministic) automaton that accepts the intersection of the language ofa1
and the complement of the language ofa2
.static Automaton
optional(Automaton a)
Returns an automaton that accepts the union of the empty string and the language of the given automaton.static Automaton
removeDeadStates(Automaton a)
Removes transitions to dead states (a state is "dead" if it is not reachable from the initial state or no accept state is reachable from it.)static Automaton
repeat(Automaton a)
Returns an automaton that accepts the Kleene star (zero or more concatenated repetitions) of the language of the given automaton.static Automaton
repeat(Automaton a, int count)
Returns an automaton that acceptsmin
or more concatenated repetitions of the language of the given automaton.static Automaton
repeat(Automaton a, int min, int max)
Returns an automaton that accepts betweenmin
andmax
(including both) concatenated repetitions of the language of the given automaton.static Automaton
reverse(Automaton a)
Returns an automaton accepting the reverse language.static Automaton
reverse(Automaton a, Set<Integer> initialStates)
Reverses the automaton, returning the new initial states.static boolean
run(Automaton a, String s)
Returns true if the given string is accepted by the automaton.static boolean
run(Automaton a, IntsRef s)
Returns true if the given string (expressed as unicode codepoints) is accepted by the automaton.static boolean
sameLanguage(Automaton a1, Automaton a2)
Returns true if these two automata accept exactly the same language.static boolean
subsetOf(Automaton a1, Automaton a2)
Returns true if the language ofa1
is a subset of the language ofa2
.static int[]
topoSortStates(Automaton a)
Returns the topological sort of all states reachable from the initial state.static Automaton
union(Collection<Automaton> l)
Returns an automaton that accepts the union of the languages of the given automata.static Automaton
union(Automaton a1, Automaton a2)
Returns an automaton that accepts the union of the languages of the given automata.
-
-
-
Field Detail
-
DEFAULT_DETERMINIZE_WORK_LIMIT
public static final int DEFAULT_DETERMINIZE_WORK_LIMIT
Default maximum effort thatdeterminize(org.apache.lucene.util.automaton.Automaton, int)
should spend before giving up and throwingTooComplexToDeterminizeException
.- See Also:
- Constant Field Values
-
MAX_RECURSION_LEVEL
public static final int MAX_RECURSION_LEVEL
Maximum level of recursion allowed in recursive operations.- See Also:
- Constant Field Values
-
-
Method Detail
-
concatenate
public static Automaton concatenate(Automaton a1, Automaton a2)
Returns an automaton that accepts the concatenation of the languages of the given automata.Complexity: linear in total number of states.
-
concatenate
public static Automaton concatenate(List<Automaton> l)
Returns an automaton that accepts the concatenation of the languages of the given automata.Complexity: linear in total number of states.
-
optional
public static Automaton optional(Automaton a)
Returns an automaton that accepts the union of the empty string and the language of the given automaton. This may create a dead state.Complexity: linear in number of states.
-
repeat
public static Automaton repeat(Automaton a)
Returns an automaton that accepts the Kleene star (zero or more concatenated repetitions) of the language of the given automaton. Never modifies the input automaton language.Complexity: linear in number of states.
-
repeat
public static Automaton repeat(Automaton a, int count)
Returns an automaton that acceptsmin
or more concatenated repetitions of the language of the given automaton.Complexity: linear in number of states and in
min
.
-
repeat
public static Automaton repeat(Automaton a, int min, int max)
Returns an automaton that accepts betweenmin
andmax
(including both) concatenated repetitions of the language of the given automaton.Complexity: linear in number of states and in
min
andmax
.
-
complement
public static Automaton complement(Automaton a, int determinizeWorkLimit)
Returns a (deterministic) automaton that accepts the complement of the language of the given automaton.Complexity: linear in number of states if already deterministic and exponential otherwise.
- Parameters:
determinizeWorkLimit
- maximum effort to spend determinizing the automaton. Set higher to allow more complex queries and lower to prevent memory exhaustion.DEFAULT_DETERMINIZE_WORK_LIMIT
is a good starting default.
-
minus
public static Automaton minus(Automaton a1, Automaton a2, int determinizeWorkLimit)
Returns a (deterministic) automaton that accepts the intersection of the language ofa1
and the complement of the language ofa2
. As a side-effect, the automata may be determinized, if not already deterministic.Complexity: quadratic in number of states if a2 already deterministic and exponential in number of a2's states otherwise.
- Parameters:
a1
- the initial automatona2
- the automaton to subtractdeterminizeWorkLimit
- maximum effort to spend determinizing the automaton. Set higher to allow more complex queries and lower to prevent memory exhaustion.DEFAULT_DETERMINIZE_WORK_LIMIT
is a good starting default.
-
intersection
public static Automaton intersection(Automaton a1, Automaton a2)
Returns an automaton that accepts the intersection of the languages of the given automata. Never modifies the input automata languages.Complexity: quadratic in number of states.
-
sameLanguage
public static boolean sameLanguage(Automaton a1, Automaton a2)
Returns true if these two automata accept exactly the same language. This is a costly computation! Both automata must be determinized and have no dead states!
-
hasDeadStates
public static boolean hasDeadStates(Automaton a)
Returns true if this automaton has any states that cannot be reached from the initial state or cannot reach an accept state. Cost is O(numTransitions+numStates).
-
hasDeadStatesFromInitial
public static boolean hasDeadStatesFromInitial(Automaton a)
Returns true if there are dead states reachable from an initial state.
-
hasDeadStatesToAccept
public static boolean hasDeadStatesToAccept(Automaton a)
Returns true if there are dead states that reach an accept state.
-
subsetOf
public static boolean subsetOf(Automaton a1, Automaton a2)
Returns true if the language ofa1
is a subset of the language ofa2
. Both automata must be determinized and must have no dead states.Complexity: quadratic in number of states.
-
union
public static Automaton union(Automaton a1, Automaton a2)
Returns an automaton that accepts the union of the languages of the given automata.Complexity: linear in number of states.
-
union
public static Automaton union(Collection<Automaton> l)
Returns an automaton that accepts the union of the languages of the given automata.Complexity: linear in number of states.
-
determinize
public static Automaton determinize(Automaton a, int workLimit)
Determinizes the given automaton.Worst case complexity: exponential in number of states.
- Parameters:
workLimit
- Maximum amount of "work" that the powerset construction will spend before throwingTooComplexToDeterminizeException
. Higher numbers allow this operation to consume more memory and CPU but allow more complex automatons. UseDEFAULT_DETERMINIZE_WORK_LIMIT
as a decent default if you don't otherwise know what to specify.- Throws:
TooComplexToDeterminizeException
- if determinizing requires more thanworkLimit
"effort"
-
isEmpty
public static boolean isEmpty(Automaton a)
Returns true if the given automaton accepts no strings.
-
isTotal
public static boolean isTotal(Automaton a)
Returns true if the given automaton accepts all strings. The automaton must be minimized.
-
isTotal
public static boolean isTotal(Automaton a, int minAlphabet, int maxAlphabet)
Returns true if the given automaton accepts all strings for the specified min/max range of the alphabet. The automaton must be minimized.
-
run
public static boolean run(Automaton a, String s)
Returns true if the given string is accepted by the automaton. The input must be deterministic.Complexity: linear in the length of the string.
Note: for full performance, use the
RunAutomaton
class.
-
run
public static boolean run(Automaton a, IntsRef s)
Returns true if the given string (expressed as unicode codepoints) is accepted by the automaton. The input must be deterministic.Complexity: linear in the length of the string.
Note: for full performance, use the
RunAutomaton
class.
-
removeDeadStates
public static Automaton removeDeadStates(Automaton a)
Removes transitions to dead states (a state is "dead" if it is not reachable from the initial state or no accept state is reachable from it.)
-
isFinite
public static boolean isFinite(Automaton a)
Returns true if the language of this automaton is finite. The automaton must not have any dead states.
-
getCommonPrefix
public static String getCommonPrefix(Automaton a)
Returns the longest string that is a prefix of all accepted strings and visits each state at most once. The automaton must not have dead states. If this automaton has already been converted to UTF-8 (e.g. usingUTF32ToUTF8
) then you should usegetCommonPrefixBytesRef(org.apache.lucene.util.automaton.Automaton)
instead.- Returns:
- common prefix, which can be an empty (length 0) String (never null)
- Throws:
IllegalArgumentException
- if the automaton has dead states reachable from the initial state.
-
getCommonPrefixBytesRef
public static BytesRef getCommonPrefixBytesRef(Automaton a)
Returns the longest BytesRef that is a prefix of all accepted strings and visits each state at most once.- Returns:
- common prefix, which can be an empty (length 0) BytesRef (never null), and might possibly include a UTF-8 fragment of a full Unicode character
-
getSingleton
public static IntsRef getSingleton(Automaton a)
If this automaton accepts a single input, return it. Else, return null. The automaton must be deterministic.
-
getCommonSuffixBytesRef
public static BytesRef getCommonSuffixBytesRef(Automaton a)
Returns the longest BytesRef that is a suffix of all accepted strings. Worst case complexity: quadratic with number of states+transitions.- Returns:
- common suffix, which can be an empty (length 0) BytesRef (never null)
-
reverse
public static Automaton reverse(Automaton a)
Returns an automaton accepting the reverse language.
-
reverse
public static Automaton reverse(Automaton a, Set<Integer> initialStates)
Reverses the automaton, returning the new initial states.
-
topoSortStates
public static int[] topoSortStates(Automaton a)
Returns the topological sort of all states reachable from the initial state. Behavior is undefined if this automaton has cycles. CPU cost is O(numTransitions), and the implementation is recursive so an automaton matching long strings may exhaust the java stack.
-
-