Package org.apache.lucene.util.automaton
Class Automata
- java.lang.Object
-
- org.apache.lucene.util.automaton.Automata
-
public final class Automata extends Object
Construction of basic automata.
WARNING: This API is experimental and might change in incompatible ways in the next release.
-
-
Field Summary
static int MAX_STRING_UNION_TERM_LENGTH
makeStringUnion(Iterable) limits terms to this maximum length to ensure the stack doesn't overflow while building, since the algorithm currently relies on recursion.
-
Method Summary
static int appendAnyChar(Automaton a, int state)
Accept any single character starting from the specified state, returning the new state.
static int appendChar(Automaton a, int state, int c)
Appends the specified character to the specified state, returning a new state.
static Automaton makeAnyBinary()
Returns a new (deterministic) automaton that accepts all binary terms.
static Automaton makeAnyChar()
Returns a new (deterministic) automaton that accepts any single codepoint.
static Automaton makeAnyString()
Returns a new (deterministic) automaton that accepts all strings.
static Automaton makeBinary(BytesRef term)
Returns a new (deterministic) automaton that accepts the single given binary term.
static Automaton makeBinaryInterval(BytesRef min, boolean minInclusive, BytesRef max, boolean maxInclusive)
Creates a new deterministic, minimal automaton accepting all binary terms in the specified interval.
static Automaton makeBinaryStringUnion(Iterable<BytesRef> utf8Strings)
Returns a new (deterministic and minimal) automaton that accepts the union of the given collection of BytesRefs representing UTF-8 encoded strings.
static Automaton makeBinaryStringUnion(BytesRefIterator utf8Strings)
Returns a new (deterministic and minimal) automaton that accepts the union of the given iterator of BytesRefs representing UTF-8 encoded strings.
static Automaton makeChar(int c)
Returns a new (deterministic) automaton that accepts a single codepoint of the given value.
static Automaton makeCharRange(int min, int max)
Returns a new (deterministic) automaton that accepts a single codepoint whose value is in the given interval (including both end points).
static Automaton makeDecimalInterval(int min, int max, int digits)
Returns a new automaton that accepts strings representing decimal (base 10) non-negative integers in the given interval.
static Automaton makeEmpty()
Returns a new (deterministic) automaton with the empty language.
static Automaton makeEmptyString()
Returns a new (deterministic) automaton that accepts only the empty string.
static Automaton makeNonEmptyBinary()
Returns a new (deterministic) automaton that accepts all binary terms except the empty string.
static Automaton makeString(int[] word, int offset, int length)
Returns a new (deterministic) automaton that accepts the single given string from the specified unicode code points.
static Automaton makeString(String s)
Returns a new (deterministic) automaton that accepts the single given string.
static Automaton makeStringUnion(Iterable<BytesRef> utf8Strings)
Returns a new (deterministic and minimal) automaton that accepts the union of the given collection of BytesRefs representing UTF-8 encoded strings.
static Automaton makeStringUnion(BytesRefIterator utf8Strings)
Returns a new (deterministic and minimal) automaton that accepts the union of the given iterator of BytesRefs representing UTF-8 encoded strings.
-
-
-
Field Detail
-
MAX_STRING_UNION_TERM_LENGTH
public static final int MAX_STRING_UNION_TERM_LENGTH
makeStringUnion(Iterable) limits terms to this maximum length to ensure the stack doesn't overflow while building, since the algorithm currently relies on recursion.
See Also: Constant Field Values
-
-
Method Detail
-
makeEmpty
public static Automaton makeEmpty()
Returns a new (deterministic) automaton with the empty language.
-
makeEmptyString
public static Automaton makeEmptyString()
Returns a new (deterministic) automaton that accepts only the empty string.
-
makeAnyString
public static Automaton makeAnyString()
Returns a new (deterministic) automaton that accepts all strings.
-
makeAnyBinary
public static Automaton makeAnyBinary()
Returns a new (deterministic) automaton that accepts all binary terms.
-
makeNonEmptyBinary
public static Automaton makeNonEmptyBinary()
Returns a new (deterministic) automaton that accepts all binary terms except the empty string.
-
makeAnyChar
public static Automaton makeAnyChar()
Returns a new (deterministic) automaton that accepts any single codepoint.
-
appendAnyChar
public static int appendAnyChar(Automaton a, int state)
Accepts any single character starting from the specified state, returning the new state.
-
makeChar
public static Automaton makeChar(int c)
Returns a new (deterministic) automaton that accepts a single codepoint of the given value.
-
appendChar
public static int appendChar(Automaton a, int state, int c)
Appends the specified character to the specified state, returning a new state.
-
makeCharRange
public static Automaton makeCharRange(int min, int max)
Returns a new (deterministic) automaton that accepts a single codepoint whose value is in the given interval (including both end points).
-
makeBinaryInterval
public static Automaton makeBinaryInterval(BytesRef min, boolean minInclusive, BytesRef max, boolean maxInclusive)
Creates a new deterministic, minimal automaton accepting all binary terms in the specified interval. Note that unlike makeDecimalInterval(int, int, int), the returned automaton is infinite (it accepts infinitely many terms), because terms behave like floating point numbers leading with a decimal point. However, in the special case where min == max, and both are inclusive, the automaton will be finite and accept exactly one term.
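The remark about infinitely many terms can be illustrated without Lucene. The sketch below is a hypothetical plain-Java helper (BinaryIntervalSketch and inInterval are not part of the Lucene API) and assumes that binary terms are ordered byte by byte as unsigned values, with a shorter prefix sorting before its extensions. Under that assumption, every term that extends "a" still falls between "a" and "b", so the interval has no longest member.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

/** Hypothetical helper, not part of Lucene: interval membership by unsigned byte order. */
final class BinaryIntervalSketch {

  /** Returns true if term lies in the interval, comparing bytes as unsigned values. */
  static boolean inInterval(byte[] term, byte[] min, boolean minInclusive,
                            byte[] max, boolean maxInclusive) {
    int lo = Arrays.compareUnsigned(term, min);
    int hi = Arrays.compareUnsigned(term, max);
    boolean aboveMin = minInclusive ? lo >= 0 : lo > 0;
    boolean belowMax = maxInclusive ? hi <= 0 : hi < 0;
    return aboveMin && belowMax;
  }

  static byte[] utf8(String s) {
    return s.getBytes(StandardCharsets.UTF_8);
  }

  public static void main(String[] args) {
    byte[] min = utf8("a");
    byte[] max = utf8("b");
    // Each longer run of 'a' still sorts after "a" and before "b".
    StringBuilder term = new StringBuilder("a");
    for (int i = 0; i < 5; i++) {
      boolean inside = inInterval(utf8(term.toString()), min, true, max, false);
      System.out.println(term + " -> " + inside);
      term.append('a');
    }
  }
}
```

When min and max are equal and both inclusive, the same predicate accepts only that one term, which matches the finite special case described above.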
-
makeDecimalInterval
public static Automaton makeDecimalInterval(int min, int max, int digits) throws IllegalArgumentException
Returns a new automaton that accepts strings representing decimal (base 10) non-negative integers in the given interval.
Parameters:
min - minimal value of interval
max - maximal value of interval (both end points are included in the interval)
digits - if > 0, use fixed number of digits (strings must be prefixed by 0's to obtain the right length); otherwise, the number of digits is not fixed (any number of leading 0s is accepted)
Throws:
IllegalArgumentException - if min > max or if numbers in the interval cannot be expressed with the given fixed number of digits
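As a reading aid for the digits parameter, the following plain-Java predicate mirrors the acceptance rule as documented here. It is a sketch under assumptions, not Lucene's implementation: the class DecimalIntervalSketch and its accepts method are hypothetical, and rejecting the empty string and negative bounds is an assumption this page does not confirm.

```java
/** Hypothetical helper, not part of Lucene: mirrors the documented acceptance rule. */
final class DecimalIntervalSketch {

  static boolean accepts(String s, int min, int max, int digits) {
    // Assumption: bounds are non-negative, since the language is non-negative integers.
    if (min < 0 || min > max) {
      throw new IllegalArgumentException("require 0 <= min <= max");
    }
    if (digits > 0 && Integer.toString(max).length() > digits) {
      throw new IllegalArgumentException("max does not fit in " + digits + " digits");
    }
    if (s.isEmpty()) {
      return false; // assumption: the empty string is not a number
    }
    for (int i = 0; i < s.length(); i++) {
      char c = s.charAt(i);
      if (c < '0' || c > '9') {
        return false;
      }
    }
    // Fixed width: the string must be zero-padded to exactly 'digits' characters.
    if (digits > 0 && s.length() != digits) {
      return false;
    }
    // Skip leading zeros but keep at least one digit.
    int start = 0;
    while (start < s.length() - 1 && s.charAt(start) == '0') {
      start++;
    }
    String significant = s.substring(start);
    if (significant.length() > 10) {
      return false; // larger than any int value
    }
    long value = Long.parseLong(significant);
    return value >= min && value <= max;
  }

  public static void main(String[] args) {
    System.out.println(accepts("007", 0, 99, 3));  // fixed width of 3
    System.out.println(accepts("7", 0, 99, 3));    // too short for fixed width
    System.out.println(accepts("0007", 0, 99, 0)); // variable width, leading zeros
  }
}
```

With digits = 3 the value 7 must be written as "007", while with digits <= 0 the strings "7", "07" and "0007" all denote the same accepted value.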
-
makeString
public static Automaton makeString(String s)
Returns a new (deterministic) automaton that accepts the single given string.
-
makeBinary
public static Automaton makeBinary(BytesRef term)
Returns a new (deterministic) automaton that accepts the single given binary term.
-
makeString
public static Automaton makeString(int[] word, int offset, int length)
Returns a new (deterministic) automaton that accepts the single given string from the specified unicode code points.
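Because this overload takes Unicode code points rather than Java chars, a caller typically converts a String first. The sketch below uses only the JDK; CodePointsSketch and toCodePoints are hypothetical names, and the reading that offset and length select a slice of the code point array is an assumption based on the signature.

```java
/** Hypothetical helper, not part of Lucene: converts a String to Unicode code points. */
final class CodePointsSketch {

  static int[] toCodePoints(String s) {
    // String.codePoints() merges each surrogate pair into one code point.
    return s.codePoints().toArray();
  }

  public static void main(String[] args) {
    // 'a', U+1F600 (stored as a surrogate pair), 'b'
    String s = "a\uD83D\uDE00b";
    int[] word = toCodePoints(s);
    System.out.println("chars: " + s.length());          // prints "chars: 4"
    System.out.println("code points: " + word.length);   // prints "code points: 3"
    // An array like 'word' could then be passed as (word, 0, word.length).
  }
}
```

The char count and the code point count differ whenever the string contains characters outside the Basic Multilingual Plane, so String indices should not be reused as offset and length here.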
-
makeStringUnion
public static Automaton makeStringUnion(Iterable<BytesRef> utf8Strings)
Returns a new (deterministic and minimal) automaton that accepts the union of the given collection of BytesRefs representing UTF-8 encoded strings.
Parameters:
utf8Strings - The input strings, UTF-8 encoded. The collection must be in sorted order.
Returns:
An Automaton accepting all input strings. The resulting automaton is codepoint based (full unicode codepoints on transitions).
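The sorted-order requirement is easy to get wrong when the terms start out as Java Strings. The sketch below assumes that "sorted order" means the unsigned byte-wise order of the UTF-8 encoding (the natural order of BytesRef); this page does not spell that out. SortedUtf8Sketch and sortedUtf8 are hypothetical names, and only JDK calls are used.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Hypothetical helper, not part of Lucene: encodes terms and sorts them by UTF-8 bytes. */
final class SortedUtf8Sketch {

  static List<byte[]> sortedUtf8(List<String> terms) {
    List<byte[]> encoded = new ArrayList<>();
    for (String term : terms) {
      encoded.add(term.getBytes(StandardCharsets.UTF_8));
    }
    // Compare bytes as unsigned values; signed comparison would misplace bytes >= 0x80.
    encoded.sort((a, b) -> Arrays.compareUnsigned(a, b));
    return encoded;
  }

  public static void main(String[] args) {
    // U+1F600 sorts before U+FF5E by String.compareTo (UTF-16 order),
    // but after it in UTF-8 byte order.
    List<String> terms = List.of("\uD83D\uDE00", "\uFF5E", "abc");
    for (byte[] bytes : sortedUtf8(terms)) {
      System.out.println(new String(bytes, StandardCharsets.UTF_8));
    }
  }
}
```

Each resulting byte[] could then be wrapped in a BytesRef before being handed to makeStringUnion. Sorting the Strings with String.compareTo first is not equivalent, because UTF-16 order and UTF-8 byte order disagree for supplementary characters.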
-
makeBinaryStringUnion
public static Automaton makeBinaryStringUnion(Iterable<BytesRef> utf8Strings)
Returns a new (deterministic and minimal) automaton that accepts the union of the given collection of BytesRefs representing UTF-8 encoded strings. The resulting automaton will be built in a binary representation.
Parameters:
utf8Strings - The input strings, UTF-8 encoded. The collection must be in sorted order.
Returns:
An Automaton accepting all input strings. The resulting automaton is binary based (UTF-8 encoded byte transition labels).
-
makeStringUnion
public static Automaton makeStringUnion(BytesRefIterator utf8Strings) throws IOException
Returns a new (deterministic and minimal) automaton that accepts the union of the given iterator of BytesRefs representing UTF-8 encoded strings.
Parameters:
utf8Strings - The input strings, UTF-8 encoded. The iterator must be in sorted order.
Returns:
An Automaton accepting all input strings. The resulting automaton is codepoint based (full unicode codepoints on transitions).
Throws:
IOException
-
makeBinaryStringUnion
public static Automaton makeBinaryStringUnion(BytesRefIterator utf8Strings) throws IOException
Returns a new (deterministic and minimal) automaton that accepts the union of the given iterator of BytesRefs representing UTF-8 encoded strings. The resulting automaton will be built in a binary representation.
Parameters:
utf8Strings - The input strings, UTF-8 encoded. The iterator must be in sorted order.
Returns:
An Automaton accepting all input strings. The resulting automaton is binary based (UTF-8 encoded byte transition labels).
Throws:
IOException
-
-