Class Intervals
- java.lang.Object
-
- org.apache.lucene.queries.intervals.Intervals
-
public final class Intervals extends Object
Factory functions for creating interval sources.
These sources implement minimum-interval algorithms taken from the paper Efficient Optimally Lazy Algorithms for Minimal-Interval Semantics.
Note: by default, sources that are sensitive to internal gaps (e.g. PHRASE and MAXGAPS) will rewrite their sub-sources so that disjunctions of different lengths are pulled up to the top of the interval tree. For example, PHRASE(or(PHRASE("a", "b", "c"), "b"), "c") will automatically rewrite itself to OR(PHRASE("a", "b", "c", "c"), PHRASE("b", "c")) to ensure that documents containing "b c" are matched. This can lead to less efficient queries, as more terms need to be loaded (for example, the "c" iterator above is loaded twice), so if you care more about speed than about accuracy you can use the or(boolean, IntervalsSource...) factory method to prevent rewriting.
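The rewrite described in the note can be pictured as distributing a phrase over the alternatives of its nested disjunctions. The following is a minimal sketch in plain Java that does not use Lucene; the class name PullUpSketch and the method expand are hypothetical and exist only for illustration. It models each phrase slot as a list of alternative term sequences and expands them into flat phrases, which is an assumption about the shape of the rewrite, not the library's actual implementation.

```java
import java.util.ArrayList;
import java.util.List;

public class PullUpSketch {

    /**
     * Each slot of the phrase is a list of alternatives, and each alternative is a
     * sequence of terms. The result holds one flat term sequence per combination.
     */
    public static List<List<String>> expand(List<List<List<String>>> slots) {
        List<List<String>> result = new ArrayList<>();
        result.add(new ArrayList<>()); // start with a single empty phrase
        for (List<List<String>> alternatives : slots) {
            List<List<String>> next = new ArrayList<>();
            for (List<String> prefix : result) {
                for (List<String> alternative : alternatives) {
                    List<String> phrase = new ArrayList<>(prefix);
                    phrase.addAll(alternative);
                    next.add(phrase);
                }
            }
            result = next;
        }
        return result;
    }

    public static void main(String[] args) {
        // Models PHRASE(or(PHRASE("a", "b", "c"), "b"), "c")
        List<List<List<String>>> slots = List.of(
            List.of(List.of("a", "b", "c"), List.of("b")),
            List.of(List.of("c")));
        // prints [[a, b, c, c], [b, c]]
        System.out.println(expand(slots));
    }
}
```

The two resulting sequences correspond to the two phrases in the rewritten OR, and the term "c" appears in both, which is why its iterator is loaded twice.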
-
-
Field Summary
static int DEFAULT_MAX_EXPANSIONS
The default number of expansions in: multiterm(CompiledAutomaton, String)
-
Method Summary
static IntervalsSource after(IntervalsSource source, IntervalsSource reference)
Returns intervals from the source that appear after intervals from the reference
static IntervalsSource analyzedText(String text, Analyzer analyzer, String field, int maxGaps, boolean ordered)
Returns intervals that correspond to tokens from a TokenStream returned for text by applying the provided Analyzer as if text was the content of the given field.
static IntervalsSource analyzedText(TokenStream tokenStream, int maxGaps, boolean ordered)
Returns intervals that correspond to tokens from the provided TokenStream.
static IntervalsSource atLeast(int minShouldMatch, IntervalsSource... sources)
Return intervals that span combinations of intervals from minShouldMatch of the sources
static IntervalsSource before(IntervalsSource source, IntervalsSource reference)
Returns intervals from the source that appear before intervals from the reference
static IntervalsSource containedBy(IntervalsSource small, IntervalsSource big)
Create a contained-by IntervalsSource
static IntervalsSource containing(IntervalsSource big, IntervalsSource small)
Create a containing IntervalsSource
static IntervalsSource extend(IntervalsSource source, int before, int after)
Create an IntervalsSource that wraps another source, extending its intervals by a number of positions before and after.
static IntervalsSource fixField(String field, IntervalsSource source)
Create an IntervalsSource that always returns intervals from a specific field
static IntervalsSource fuzzyTerm(String term, int maxEdits)
A fuzzy term IntervalsSource matches the disjunction of intervals of terms that are within the specified maxEdits from the provided term.
static IntervalsSource fuzzyTerm(String term, int maxEdits, int prefixLength, boolean transpositions, int maxExpansions)
A fuzzy term IntervalsSource matches the disjunction of intervals of terms that are within the specified maxEdits from the provided term.
static IntervalsSource maxgaps(int gaps, IntervalsSource subSource)
Create an IntervalsSource that filters a sub-source by its gaps
static IntervalsSource maxwidth(int width, IntervalsSource subSource)
Create an IntervalsSource that filters a sub-source by the width of its intervals
static IntervalsSource multiterm(CompiledAutomaton ca, int maxExpansions, String pattern)
Expert: Return an IntervalsSource over the disjunction of all terms that are accepted by the given automaton
static IntervalsSource multiterm(CompiledAutomaton ca, String pattern)
Expert: Return an IntervalsSource over the disjunction of all terms that are accepted by the given automaton
static IntervalsSource nonOverlapping(IntervalsSource minuend, IntervalsSource subtrahend)
Create a non-overlapping IntervalsSource
static IntervalsSource notContainedBy(IntervalsSource small, IntervalsSource big)
Create a not-contained-by IntervalsSource
static IntervalsSource notContaining(IntervalsSource minuend, IntervalsSource subtrahend)
Create a not-containing IntervalsSource
static IntervalsSource notWithin(IntervalsSource minuend, int positions, IntervalsSource subtrahend)
Create a not-within IntervalsSource
static IntervalsSource or(boolean rewrite, List<IntervalsSource> subSources)
Return an IntervalsSource over the disjunction of a set of sub-sources
static IntervalsSource or(boolean rewrite, IntervalsSource... subSources)
Return an IntervalsSource over the disjunction of a set of sub-sources
static IntervalsSource or(List<IntervalsSource> subSources)
Return an IntervalsSource over the disjunction of a set of sub-sources
static IntervalsSource or(IntervalsSource... subSources)
Return an IntervalsSource over the disjunction of a set of sub-sources
static IntervalsSource ordered(IntervalsSource... subSources)
Create an ordered IntervalsSource
static IntervalsSource overlapping(IntervalsSource source, IntervalsSource reference)
Returns intervals from a source that overlap with intervals from another source
static IntervalsSource phrase(String... terms)
Return an IntervalsSource exposing intervals for a phrase consisting of a list of terms
static IntervalsSource phrase(IntervalsSource... subSources)
Return an IntervalsSource exposing intervals for a phrase consisting of a list of interval sources
static IntervalsSource prefix(BytesRef prefix)
Return an IntervalsSource over the disjunction of all terms that begin with a prefix
static IntervalsSource prefix(BytesRef prefix, int maxExpansions)
Expert: Return an IntervalsSource over the disjunction of all terms that begin with a prefix
static IntervalsSource term(String term)
Return an IntervalsSource exposing intervals for a term
static IntervalsSource term(String term, Predicate<BytesRef> payloadFilter)
Return an IntervalsSource exposing intervals for a term, filtered by the value of the term's payload at each position
static IntervalsSource term(BytesRef term)
Return an IntervalsSource exposing intervals for a term
static IntervalsSource term(BytesRef term, Predicate<BytesRef> payloadFilter)
Return an IntervalsSource exposing intervals for a term, filtered by the value of the term's payload at each position
static IntervalsSource unordered(IntervalsSource... subSources)
Create an unordered IntervalsSource.
static IntervalsSource unorderedNoOverlaps(IntervalsSource a, IntervalsSource b)
Create an unordered IntervalsSource allowing no overlaps between subsources
static IntervalsSource wildcard(BytesRef wildcard)
Return an IntervalsSource over the disjunction of all terms that match a wildcard glob
static IntervalsSource wildcard(BytesRef wildcard, int maxExpansions)
Expert: Return an IntervalsSource over the disjunction of all terms that match a wildcard glob
static IntervalsSource within(IntervalsSource source, int positions, IntervalsSource reference)
Returns intervals of the source that appear within a set number of positions of intervals from the reference
-
-
-
Field Detail
-
DEFAULT_MAX_EXPANSIONS
public static final int DEFAULT_MAX_EXPANSIONS
The default number of expansions in multiterm(CompiledAutomaton, String), prefix(BytesRef), and wildcard(BytesRef).
- See Also:
Constant Field Values
-
-
Method Detail
-
term
public static IntervalsSource term(BytesRef term)
Return an IntervalsSource exposing intervals for a term
-
term
public static IntervalsSource term(String term)
Return an IntervalsSource exposing intervals for a term
-
term
public static IntervalsSource term(String term, Predicate<BytesRef> payloadFilter)
Return an IntervalsSource exposing intervals for a term, filtered by the value of the term's payload at each position
-
term
public static IntervalsSource term(BytesRef term, Predicate<BytesRef> payloadFilter)
Return an IntervalsSource exposing intervals for a term, filtered by the value of the term's payload at each position
-
phrase
public static IntervalsSource phrase(String... terms)
Return an IntervalsSource exposing intervals for a phrase consisting of a list of terms
-
phrase
public static IntervalsSource phrase(IntervalsSource... subSources)
Return an IntervalsSource exposing intervals for a phrase consisting of a list of interval sources
-
or
public static IntervalsSource or(IntervalsSource... subSources)
Return an IntervalsSource over the disjunction of a set of sub-sources.
Automatically rewrites if wrapped by an interval source that is sensitive to internal gaps.
-
or
public static IntervalsSource or(boolean rewrite, IntervalsSource... subSources)
Return an IntervalsSource over the disjunction of a set of sub-sources
- Parameters:
rewrite - if false, do not rewrite intervals that are sensitive to internal gaps; this may run more efficiently, but can miss valid hits due to minimization
subSources - the sources to combine
-
or
public static IntervalsSource or(List<IntervalsSource> subSources)
Return an IntervalsSource over the disjunction of a set of sub-sources
-
or
public static IntervalsSource or(boolean rewrite, List<IntervalsSource> subSources)
Return an IntervalsSource over the disjunction of a set of sub-sources
- Parameters:
rewrite - if false, do not rewrite intervals that are sensitive to internal gaps; this may run more efficiently, but can miss valid hits due to minimization
subSources - the sources to combine
-
prefix
public static IntervalsSource prefix(BytesRef prefix)
Return an IntervalsSource over the disjunction of all terms that begin with a prefix
- Throws:
IllegalStateException - if the prefix expands to more than DEFAULT_MAX_EXPANSIONS terms
-
prefix
public static IntervalsSource prefix(BytesRef prefix, int maxExpansions)
Expert: Return an IntervalsSource over the disjunction of all terms that begin with a prefix.
WARNING: Setting maxExpansions to higher than the default value of DEFAULT_MAX_EXPANSIONS can be both slow and memory-intensive.
- Parameters:
prefix - the prefix to expand
maxExpansions - the maximum number of terms to expand to
- Throws:
IllegalStateException - if the prefix expands to more than maxExpansions terms
-
wildcard
public static IntervalsSource wildcard(BytesRef wildcard)
Return an IntervalsSource over the disjunction of all terms that match a wildcard glob
- Throws:
IllegalStateException - if the wildcard glob expands to more than DEFAULT_MAX_EXPANSIONS terms
- See Also:
WildcardQuery for glob format
-
wildcard
public static IntervalsSource wildcard(BytesRef wildcard, int maxExpansions)
Expert: Return an IntervalsSource over the disjunction of all terms that match a wildcard glob.
WARNING: Setting maxExpansions to higher than the default value of DEFAULT_MAX_EXPANSIONS can be both slow and memory-intensive.
- Parameters:
wildcard - the glob to expand
maxExpansions - the maximum number of terms to expand to
- Throws:
IllegalStateException - if the wildcard glob expands to more than maxExpansions terms
- See Also:
WildcardQuery for glob format
-
fuzzyTerm
public static IntervalsSource fuzzyTerm(String term, int maxEdits)
A fuzzy term IntervalsSource matches the disjunction of intervals of terms that are within the specified maxEdits from the provided term.
- Parameters:
term - the term to search for
maxEdits - must be >= 0 and <= LevenshteinAutomata.MAXIMUM_SUPPORTED_DISTANCE; use FuzzyQuery.defaultMaxEdits for the default, if needed.
- See Also:
fuzzyTerm(String, int, int, boolean, int)
-
fuzzyTerm
public static IntervalsSource fuzzyTerm(String term, int maxEdits, int prefixLength, boolean transpositions, int maxExpansions)
A fuzzy term IntervalsSource matches the disjunction of intervals of terms that are within the specified maxEdits from the provided term.
The implementation is delegated to a multiterm(CompiledAutomaton, int, String) interval source, with an automaton sourced from FuzzyQuery.
- Parameters:
term - the term to search for
maxEdits - must be >= 0 and <= LevenshteinAutomata.MAXIMUM_SUPPORTED_DISTANCE; use FuzzyQuery.defaultMaxEdits for the default, if needed.
prefixLength - length of common (non-fuzzy) prefix
maxExpansions - the maximum number of terms to match. Setting maxExpansions to higher than the default value of DEFAULT_MAX_EXPANSIONS can be both slow and memory-intensive
transpositions - true if transpositions should be treated as a primitive edit operation. If this is false, comparisons will implement the classic Levenshtein algorithm.
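To make the effect of the transpositions flag concrete, here is a minimal plain-Java sketch of the two distance measures. It is not how the library evaluates fuzzy terms (the description above says matching is delegated to an automaton); the class EditDistanceSketch and its distance method are hypothetical names. The sketch assumes that "transposition as a primitive edit" means swapping two adjacent characters costs one edit, and it compares Java chars rather than Unicode code points.

```java
public class EditDistanceSketch {

    /**
     * Dynamic-programming edit distance. With transpositions enabled, swapping two
     * adjacent characters counts as a single edit (optimal string alignment);
     * otherwise this is the classic Levenshtein distance.
     */
    public static int distance(String a, String b, boolean transpositions) {
        int[][] d = new int[a.length() + 1][b.length() + 1];
        for (int i = 0; i <= a.length(); i++) {
            d[i][0] = i; // delete all of a's first i characters
        }
        for (int j = 0; j <= b.length(); j++) {
            d[0][j] = j; // insert all of b's first j characters
        }
        for (int i = 1; i <= a.length(); i++) {
            for (int j = 1; j <= b.length(); j++) {
                int cost = a.charAt(i - 1) == b.charAt(j - 1) ? 0 : 1;
                int best = Math.min(
                    Math.min(d[i - 1][j] + 1, d[i][j - 1] + 1),
                    d[i - 1][j - 1] + cost);
                if (transpositions && i > 1 && j > 1
                        && a.charAt(i - 1) == b.charAt(j - 2)
                        && a.charAt(i - 2) == b.charAt(j - 1)) {
                    best = Math.min(best, d[i - 2][j - 2] + 1);
                }
                d[i][j] = best;
            }
        }
        return d[a.length()][b.length()];
    }

    public static void main(String[] args) {
        System.out.println(distance("lucene", "luecne", true));  // prints 1
        System.out.println(distance("lucene", "luecne", false)); // prints 2
    }
}
```

Under these assumptions, a term such as "luecne" would fall within maxEdits = 1 of "lucene" only when transpositions are enabled.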
-
multiterm
public static IntervalsSource multiterm(CompiledAutomaton ca, String pattern)
Expert: Return an IntervalsSource over the disjunction of all terms that are accepted by the given automaton
- Parameters:
ca - an automaton accepting matching terms
pattern - string representation of the given automaton, mostly used in exception messages
- Throws:
IllegalStateException - if the automaton accepts more than DEFAULT_MAX_EXPANSIONS terms
-
multiterm
public static IntervalsSource multiterm(CompiledAutomaton ca, int maxExpansions, String pattern)
Expert: Return an IntervalsSource over the disjunction of all terms that are accepted by the given automaton.
WARNING: Setting maxExpansions to higher than the default value of DEFAULT_MAX_EXPANSIONS can be both slow and memory-intensive.
- Parameters:
ca - an automaton accepting matching terms
maxExpansions - the maximum number of terms to expand to
pattern - string representation of the given automaton, mostly used in exception messages
- Throws:
IllegalStateException - if the automaton accepts more than maxExpansions terms
-
maxwidth
public static IntervalsSource maxwidth(int width, IntervalsSource subSource)
Create an IntervalsSource that filters a sub-source by the width of its intervals
- Parameters:
width - the maximum width of intervals in the sub-source to filter
subSource - the sub-source to filter
-
maxgaps
public static IntervalsSource maxgaps(int gaps, IntervalsSource subSource)
Create an IntervalsSource that filters a sub-source by its gaps
- Parameters:
gaps - the maximum number of gaps in the sub-source to filter
subSource - the sub-source to filter
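The difference between filtering by width and filtering by gaps can be shown with a small plain-Java sketch. It rests on two assumptions that the text above does not spell out: that the width of an interval is the number of positions it spans (end - start + 1), and that its gaps are the positions inside it not covered by any sub-interval. The class GapFilterSketch and its methods are hypothetical and do not use Lucene.

```java
public class GapFilterSketch {

    /** Number of positions spanned by the interval [start, end], inclusive. */
    public static int width(int start, int end) {
        return end - start + 1;
    }

    /**
     * Number of uncovered positions between consecutive sub-intervals.
     * Each sub-interval is {start, end}; they must be sorted and non-overlapping.
     */
    public static int gaps(int[][] subIntervals) {
        int gaps = 0;
        for (int i = 1; i < subIntervals.length; i++) {
            gaps += subIntervals[i][0] - subIntervals[i - 1][1] - 1;
        }
        return gaps;
    }

    public static void main(String[] args) {
        // Three single-term sub-intervals at positions 2, 3 and 6
        int[][] subs = {{2, 2}, {3, 3}, {6, 6}};
        System.out.println(width(2, 6)); // prints 5
        System.out.println(gaps(subs));  // prints 2
    }
}
```

Under these assumptions, the interval above would pass a width limit of 5 or a gap limit of 2, and would be rejected by any smaller limit.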
-
extend
public static IntervalsSource extend(IntervalsSource source, int before, int after)
Create an IntervalsSource that wraps another source, extending its intervals by a number of positions before and after.
This can be useful for adding defined gaps in a block query; for example, to find 'a b [2 arbitrary terms] c', you can call:
Intervals.phrase(Intervals.term("a"), Intervals.extend(Intervals.term("b"), 0, 2), Intervals.term("c"));
Note that calling IntervalIterator.gaps() on iterators returned by this source delegates directly to the wrapped iterator, and does not include the extensions.
- Parameters:
source - the source to extend
before - how many positions to extend before the delegated interval
after - how many positions to extend after the delegated interval
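The 'a b [2 arbitrary terms] c' example can be traced by hand with a small plain-Java model. This sketch does not call Lucene; ExtendSketch, extend and isBlock are hypothetical names, intervals are modeled as {start, end} position pairs, and clamping the extended start at position 0 is an assumption.

```java
public class ExtendSketch {

    /** Returns {start, end} widened by the given number of positions, clamped at position 0. */
    public static int[] extend(int[] interval, int before, int after) {
        return new int[] {Math.max(0, interval[0] - before), interval[1] + after};
    }

    /** True if each interval starts exactly one position after the previous one ends. */
    public static boolean isBlock(int[]... intervals) {
        for (int i = 1; i < intervals.length; i++) {
            if (intervals[i][0] != intervals[i - 1][1] + 1) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        // Positions in the text "a b x y c": a=0, b=1, x=2, y=3, c=4
        int[] a = {0, 0};
        int[] b = {1, 1};
        int[] c = {4, 4};
        System.out.println(isBlock(a, b, c));               // prints false
        System.out.println(isBlock(a, extend(b, 0, 2), c)); // prints true
    }
}
```

Extending "b" by two positions after turns its interval [1, 1] into [1, 3], so "c" at position 4 becomes adjacent and the three sources form a contiguous block.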
-
ordered
public static IntervalsSource ordered(IntervalsSource... subSources)
Create an ordered IntervalsSource.
Returns intervals in which the subsources all appear in the given order
- Parameters:
subSources - an ordered set of IntervalsSource objects
-
unordered
public static IntervalsSource unordered(IntervalsSource... subSources)
Create an unordered IntervalsSource. Note that if multiple eligible intervals end at the same position, only the narrowest one will be returned. For example, if asking for unordered(term("apple"), term("banana")) on a field containing "apple wolf apple orange banana", only "apple orange banana" will be returned.
Returns intervals in which all the subsources appear. The subsources may overlap.
- Parameters:
subSources - an unordered set of IntervalsSources
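The "apple wolf apple orange banana" example can be reproduced with a brute-force plain-Java sketch. It is not the lazy algorithm the library uses and it does not call Lucene; MinimalUnorderedSketch and minimalSpans are hypothetical names. The sketch enumerates every span covering one position from each term and keeps only spans that do not contain a narrower span.

```java
import java.util.ArrayList;
import java.util.List;

public class MinimalUnorderedSketch {

    /** Returns minimal {start, end} spans that cover one position from each of the two lists. */
    public static List<int[]> minimalSpans(int[] first, int[] second) {
        List<int[]> candidates = new ArrayList<>();
        for (int p : first) {
            for (int q : second) {
                candidates.add(new int[] {Math.min(p, q), Math.max(p, q)});
            }
        }
        List<int[]> minimal = new ArrayList<>();
        for (int[] candidate : candidates) {
            boolean containsNarrower = false;
            for (int[] other : candidates) {
                boolean inside = other[0] >= candidate[0] && other[1] <= candidate[1];
                boolean narrower = (other[1] - other[0]) < (candidate[1] - candidate[0]);
                if (inside && narrower) {
                    containsNarrower = true;
                    break;
                }
            }
            if (!containsNarrower) {
                minimal.add(candidate);
            }
        }
        return minimal;
    }

    public static void main(String[] args) {
        // "apple wolf apple orange banana": apple at positions 0 and 2, banana at 4
        List<int[]> spans = minimalSpans(new int[] {0, 2}, new int[] {4});
        for (int[] span : spans) {
            System.out.println(span[0] + "-" + span[1]); // prints 2-4
        }
    }
}
```

The wider candidate 0-4 ("apple wolf apple orange banana") is discarded because it contains the narrower span 2-4, which ends at the same position.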
-
unorderedNoOverlaps
public static IntervalsSource unorderedNoOverlaps(IntervalsSource a, IntervalsSource b)
Create an unordered IntervalsSource allowing no overlaps between subsources.
Returns intervals in which both the subsources appear and do not overlap.
-
fixField
public static IntervalsSource fixField(String field, IntervalsSource source)
Create an IntervalsSource that always returns intervals from a specific field.
This is useful for comparing intervals across multiple fields, for example fields that have been analyzed differently, allowing you to search for stemmed terms near unstemmed terms, etc.
-
nonOverlapping
public static IntervalsSource nonOverlapping(IntervalsSource minuend, IntervalsSource subtrahend)
Create a non-overlapping IntervalsSource.
Returns intervals of the minuend that do not overlap with intervals from the subtrahend
- Parameters:
minuend - the IntervalsSource to filter
subtrahend - the IntervalsSource to filter by
-
overlapping
public static IntervalsSource overlapping(IntervalsSource source, IntervalsSource reference)
Returns intervals from a source that overlap with intervals from another source
- Parameters:
source - the source to filter
reference - the source to filter by
-
notWithin
public static IntervalsSource notWithin(IntervalsSource minuend, int positions, IntervalsSource subtrahend)
Create a not-within IntervalsSource.
Returns intervals of the minuend that do not appear within a set number of positions of intervals from the subtrahend query
- Parameters:
minuend - the IntervalsSource to filter
positions - the minimum distance that intervals from the minuend may occur from intervals of the subtrahend
subtrahend - the IntervalsSource to filter by
-
within
public static IntervalsSource within(IntervalsSource source, int positions, IntervalsSource reference)
Returns intervals of the source that appear within a set number of positions of intervals from the reference
- Parameters:
source - the IntervalsSource to filter
positions - the maximum distance that intervals of the source may occur from intervals of the reference
reference - the IntervalsSource to filter by
-
notContaining
public static IntervalsSource notContaining(IntervalsSource minuend, IntervalsSource subtrahend)
Create a not-containing IntervalsSource.
Returns intervals from the minuend that do not contain intervals of the subtrahend
- Parameters:
minuend - the IntervalsSource to filter
subtrahend - the IntervalsSource to filter by
-
containing
public static IntervalsSource containing(IntervalsSource big, IntervalsSource small)
Create a containing IntervalsSource.
Returns intervals from the big source that contain one or more intervals from the small source
- Parameters:
big - the IntervalsSource to filter
small - the IntervalsSource to filter by
-
notContainedBy
public static IntervalsSource notContainedBy(IntervalsSource small, IntervalsSource big)
Create a not-contained-by IntervalsSource.
Returns intervals from the small IntervalsSource that do not appear within intervals from the big IntervalsSource.
- Parameters:
small - the IntervalsSource to filter
big - the IntervalsSource to filter by
-
containedBy
public static IntervalsSource containedBy(IntervalsSource small, IntervalsSource big)
Create a contained-by IntervalsSource.
Returns intervals from the small query that appear within intervals of the big query
- Parameters:
small - the IntervalsSource to filter
big - the IntervalsSource to filter by
-
atLeast
public static IntervalsSource atLeast(int minShouldMatch, IntervalsSource... sources)
Return intervals that span combinations of intervals from minShouldMatch of the sources
-
before
public static IntervalsSource before(IntervalsSource source, IntervalsSource reference)
Returns intervals from the source that appear before intervals from the reference
-
after
public static IntervalsSource after(IntervalsSource source, IntervalsSource reference)
Returns intervals from the source that appear after intervals from the reference
-
analyzedText
public static IntervalsSource analyzedText(String text, Analyzer analyzer, String field, int maxGaps, boolean ordered) throws IOException
Returns intervals that correspond to tokens from a TokenStream returned for text by applying the provided Analyzer as if text was the content of the given field. The intervals can be ordered or unordered and can have optional gaps inside.
- Parameters:
text - The text to analyze.
analyzer - The Analyzer to use to acquire a TokenStream which is then converted into intervals.
field - The field text should be parsed as.
maxGaps - Maximum number of allowed gaps between sub-intervals resulting from tokens.
ordered - Whether sub-intervals should enforce token ordering or not.
- Returns:
An IntervalsSource that matches tokens acquired from analysis of text. Possibly an empty interval source, never null.
- Throws:
IOException - If an I/O exception occurs.
-
analyzedText
public static IntervalsSource analyzedText(TokenStream tokenStream, int maxGaps, boolean ordered) throws IOException
Returns intervals that correspond to tokens from the provided TokenStream. This is a low-level counterpart to analyzedText(String, Analyzer, String, int, boolean). The intervals can be ordered or unordered and can have optional gaps inside.
- Parameters:
tokenStream - The token stream to produce intervals for. The token stream may be fully or partially consumed after returning from this method.
maxGaps - Maximum number of allowed gaps between sub-intervals resulting from tokens.
ordered - Whether sub-intervals should enforce token ordering or not.
- Returns:
An IntervalsSource that matches tokens acquired from the provided token stream. Possibly an empty interval source, never null.
- Throws:
IOException - If an I/O exception occurs.
-
-