Package org.apache.lucene.queryparser.flexible.standard.nodes.intervalfn
What are interval functions?
Interval functions are a powerful tool to express search needs in terms of one or more contiguous fragments of text and their relationship to one another. Interval functions are implemented by an IntervalQuery but many ready-to-use factory methods are provided in the Intervals class.
When Lucene indexes documents (or rather: document fields) the input text is typically split into tokens. The details of how this tokenization is performed depend on how the field's Analyzer is set up. In the end, each token typically has an associated position in the token stream. For example, the following sentence:
The quick brown fox jumps over the lazy dog
could be transformed into the following token stream. Positions marked "-" are blank: they belong to stop words, which are typically not indexed at all but still occupy a position.
The(-) quick(2) brown(3) fox(4) jumps(5) over(6) the(-) lazy(8) dog(9)
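The position assignment described above can be sketched in plain Java. This is a simplified stand-in for an Analyzer, not Lucene code: the class name, the stop-word list and the 1-based numbering are all assumptions made for illustration, as is the rule that a removed stop word still advances the position counter.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.LinkedHashMap;
import java.util.Locale;
import java.util.Map;
import java.util.Set;

/** Simplified stand-in for an Analyzer (hypothetical, not Lucene code). */
final class TokenPositionsSketch {
    // Hypothetical stop-word list, chosen only for this example.
    static final Set<String> STOP_WORDS = new HashSet<>(Arrays.asList("the", "a", "an"));

    /** Maps each indexed (non-stop) token to its 1-based position. */
    static Map<String, Integer> positions(String text) {
        Map<String, Integer> result = new LinkedHashMap<>();
        int position = 0;
        for (String raw : text.trim().split("\\s+")) {
            if (raw.isEmpty()) {
                continue; // empty input produces one empty chunk; skip it
            }
            position++; // every token advances the counter, stop words included
            String token = raw.toLowerCase(Locale.ROOT);
            if (!STOP_WORDS.contains(token)) {
                // A repeated token would keep only its last position in this toy model.
                result.put(token, position);
            }
        }
        return result;
    }
}
```

Calling TokenPositionsSketch.positions on the example sentence yields an entry for every word except the two occurrences of "the", with a hole in the numbering where the second one was removed.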
Remembering that intervals are contiguous spans between two positions in a document, consider the following example interval function query: fn:ordered(brown dog). This query selects any span of text between terms brown and dog. In our example, this corresponds to the fragment running from brown through dog in the sentence below.
The quick brown fox jumps over the lazy dog
This type of interval function can be called an interval selector. The second class of interval functions works by combining or filtering other intervals depending on certain criteria.
The matching interval in the above example can be of any length: if the word brown occurs at the beginning of the document and the word dog at the very end of the document, the interval would be very long (it would cover the entire document!). Let's say we want to restrict the matches to only those intervals with at most 3 positions between the search terms: fn:maxgaps(3 fn:ordered(brown dog)).
There are five tokens in between search terms (so five "gaps" between the matching interval's positions) and the above query no longer matches our example document at all.
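The gap arithmetic in this paragraph can be sketched in a few lines of Java. This is a toy model rather than Lucene's implementation: the class and method names are hypothetical, and it assumes brown sits at position 3 and dog at position 9, that is, that the unindexed stop word between over and lazy still occupies a position.

```java
/** Toy model of a gap filter over a two-term ordered interval (hypothetical, not Lucene code). */
final class MaxGapsSketch {
    /** Number of positions strictly between two term positions; requires first < second. */
    static int gapsBetween(int first, int second) {
        if (first >= second) {
            throw new IllegalArgumentException("expected first < second");
        }
        return second - first - 1;
    }

    /** True if the interval spanning the two terms has at most maxGaps gaps. */
    static boolean acceptedByMaxGaps(int maxGaps, int first, int second) {
        return gapsBetween(first, second) <= maxGaps;
    }
}
```

Under these assumptions MaxGapsSketch.gapsBetween(3, 9) is 5, so a limit of 3 rejects the interval while a limit of 5 would accept it.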
Interval filtering functions allow expressing a variety of conditions that other Lucene queries cannot. For example, consider this interval query that searches for the words lazy or quick, but only if they are within one position of either of the words dog or fox:
fn:within(fn:or(lazy quick) 1 fn:or(dog fox))
The result of this query is shown below (only the word lazy matches the query; quick is 2 positions away from fox).
The quick brown fox jumps over the lazy dog
The remaining part of this document provides more information on the available functions and their expected behavior.
Classification of interval functions
The following groups of interval functions are available in the StandardQueryParser.
Terms | Alternatives | Length | Context | Ordering | Containment |
---|---|---|---|---|---|
term literals, fn:wildcard | fn:or, fn:atLeast | fn:maxgaps, fn:maxwidth | fn:before, fn:after, fn:extend, fn:within, fn:notWithin | fn:ordered, fn:unordered, fn:phrase, fn:unorderedNoOverlaps | fn:containedBy, fn:notContainedBy, fn:containing, fn:notContaining, fn:overlapping, fn:nonOverlapping |
All examples in the description of interval functions (below) assume a document with the following content:
The quick brown fox jumps over the lazy dog
term literals
Quoted or unquoted character sequences are converted into (analyzed) text intervals. While a single term typically results in a single-term interval, a quoted multi-term phrase will produce an interval matching the corresponding sequence of tokens. Note this is different from the fn:phrase function, which takes a sequence of sub-intervals.
- Examples
-
fn:or(quick "fox")
The quick brown fox jumps over the lazy dog
fn:or("quick fox")
(The document would not match: no phrase quick fox exists.)
The quick brown fox jumps over the lazy dog
fn:phrase(quick brown fox)
The quick brown fox jumps over the lazy dog
fn:wildcard
Matches the disjunction of all terms that match a wildcard glob.
Important! The expanded wildcard must not match more than 128 terms. This is an internal limitation that prevents blowing up memory on, for example, prefix expansions that would cover huge numbers of alternatives.
- Arguments
-
fn:wildcard(glob)
glob
- term glob to expand (based on the contents of the index).
- Examples
-
fn:wildcard(jump*)
The quick brown fox jumps over the lazy dog
fn:wildcard(br*n)
The quick brown fox jumps over the lazy dog
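The expansion and its 128-term cap can be illustrated with a small, self-contained sketch. Everything here is an assumption made for illustration: the class name, the in-memory term list standing in for the index, the glob syntax ('*' for any run of characters and '?' for a single character, as in Lucene's WildcardQuery) and the exception type raised when the limit is exceeded. It is not the parser's actual code.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Pattern;

/** Illustrative glob expansion with an upper bound (hypothetical, not Lucene code). */
final class WildcardExpansionSketch {
    static final int MAX_EXPANSIONS = 128; // limit quoted in the text above

    /** Translates a glob with '*' and '?' into a regular expression. */
    static Pattern globToPattern(String glob) {
        StringBuilder regex = new StringBuilder();
        for (char c : glob.toCharArray()) {
            if (c == '*') {
                regex.append(".*");
            } else if (c == '?') {
                regex.append('.');
            } else {
                regex.append(Pattern.quote(String.valueOf(c)));
            }
        }
        return Pattern.compile(regex.toString());
    }

    /** Returns the indexed terms matching the glob; fails once more than MAX_EXPANSIONS match. */
    static List<String> expand(String glob, List<String> indexedTerms) {
        Pattern pattern = globToPattern(glob);
        List<String> matches = new ArrayList<>();
        for (String term : indexedTerms) {
            if (pattern.matcher(term).matches()) { // matches() requires the whole term to match
                if (matches.size() == MAX_EXPANSIONS) {
                    throw new IllegalStateException(
                        "glob '" + glob + "' expands to more than " + MAX_EXPANSIONS + " terms");
                }
                matches.add(term);
            }
        }
        return matches;
    }
}
```

With the indexed terms of the example document, the glob jump* expands to jumps and br*n expands to brown. Failing early, rather than silently truncating the list, keeps the result of the query predictable.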
fn:or
Matches the disjunction of nested intervals.
- Arguments
-
fn:or(sources...)
sources
- sub-intervals (terms or other functions)
- Examples
-
fn:or(dog fox)
The quick brown fox jumps over the lazy dog
fn:atLeast
Matches documents that contain at least the provided number of source intervals.
- Arguments
-
fn:atLeast(min sources...)
min
- an integer specifying minimum number of sub-interval arguments that must match.
sources
- sub-intervals (terms or other functions)
- Examples
-
fn:atLeast(2 quick fox "furry dog")
The quick brown fox jumps over the lazy dog
fn:atLeast(2 fn:unordered(furry dog) fn:unordered(brown dog) lazy quick)
(This query results in multiple overlapping intervals.)
The quick brown fox jumps over the lazy dog
The quick brown fox jumps over the lazy dog
The quick brown fox jumps over the lazy dog
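The document-level condition behind fn:atLeast can be sketched as a simple count. This is a deliberately reduced model with hypothetical names: it only decides whether a document qualifies and does not compute the (possibly overlapping) intervals that the real function returns. Each source is represented by the list of intervals it matched, stored as int[]{start, end}.

```java
import java.util.List;

/** Document-level model of an "at least N sources match" check (hypothetical, not Lucene code). */
final class AtLeastSketch {
    /** sources.get(i) holds the intervals matched by the i-th source; an empty list means no match. */
    static boolean atLeast(int min, List<List<int[]>> sources) {
        int matching = 0;
        for (List<int[]> intervals : sources) {
            if (!intervals.isEmpty()) {
                matching++;
            }
        }
        return matching >= min;
    }
}
```

For fn:atLeast(2 quick fox "furry dog"), the first two sources each match once and the phrase does not match at all, so a minimum of 2 is satisfied and a minimum of 3 would not be.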
fn:maxgaps
Accepts the source interval if it has at most the given number of position gaps.
- Arguments
-
fn:maxgaps(gaps source)
gaps
- an integer specifying maximum number of source's position gaps.
source
- source sub-interval.
- Examples
-
fn:maxgaps(0 fn:ordered(fn:or(quick lazy) fn:or(fox dog)))
The quick brown fox jumps over the lazy dog
fn:maxgaps(1 fn:ordered(fn:or(quick lazy) fn:or(fox dog)))
The quick brown fox jumps over the lazy dog
fn:maxwidth
Accepts the source interval if it has at most the given width (position span).
- Arguments
-
fn:maxwidth(max source)
max
- an integer specifying maximum width of source's position span.
source
- source sub-interval.
- Examples
-
fn:maxwidth(2 fn:ordered(fn:or(quick lazy) fn:or(fox dog)))
The quick brown fox jumps over the lazy dog
fn:maxwidth(3 fn:ordered(fn:or(quick lazy) fn:or(fox dog)))
The quick brown fox jumps over the lazy dog
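The difference between the two limits in these examples comes down to how width is measured. A minimal sketch, assuming width counts every position from the start to the end of the interval inclusive, and using hypothetical names and int[]{start, end} pairs for the candidate intervals:

```java
import java.util.ArrayList;
import java.util.List;

/** Toy width filter over candidate intervals (hypothetical, not Lucene code). */
final class MaxWidthSketch {
    /** Width of an interval, counting both its start and end positions. */
    static int width(int[] interval) {
        return interval[1] - interval[0] + 1;
    }

    /** Keeps only the candidates whose width does not exceed max. */
    static List<int[]> maxWidth(int max, List<int[]> candidates) {
        List<int[]> kept = new ArrayList<>();
        for (int[] candidate : candidates) {
            if (width(candidate) <= max) {
                kept.add(candidate);
            }
        }
        return kept;
    }
}
```

Taking quick brown fox as positions 2 to 4 and lazy dog as two adjacent positions, a limit of 2 keeps only lazy dog, while a limit of 3 keeps both candidates.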
fn:phrase
Matches an ordered, gapless sequence of source intervals.
- Arguments
-
fn:phrase(sources...)
sources
- sub-intervals (terms or other functions)
- Examples
-
fn:phrase(quick brown fox)
The quick brown fox jumps over the lazy dog
fn:phrase(fn:ordered(quick fox) jumps)
The quick brown fox jumps over the lazy dog
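The "ordered, gapless" condition can be expressed as an adjacency check between consecutive sub-intervals. The sketch below uses hypothetical names and int[]{start, end} pairs, and takes already-matched sub-intervals as input instead of searching for them, so it illustrates the condition only.

```java
/** Adjacency check for a sequence of sub-intervals (hypothetical, not Lucene code). */
final class PhraseSketch {
    /**
     * Returns the covering interval if every sub-interval starts immediately
     * after the previous one ends, or null if there is a gap, an overlap,
     * or no sub-interval at all.
     */
    static int[] phrase(int[]... subIntervals) {
        if (subIntervals.length == 0) {
            return null;
        }
        for (int i = 1; i < subIntervals.length; i++) {
            if (subIntervals[i][0] != subIntervals[i - 1][1] + 1) {
                return null;
            }
        }
        return new int[] {subIntervals[0][0], subIntervals[subIntervals.length - 1][1]};
    }
}
```

In the second example above, fn:ordered(quick fox) covers positions 2 to 4 and jumps follows at position 5, so the sequence is gapless even though the first sub-interval contains an internal gap of its own.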
fn:ordered
Matches an ordered span containing all source intervals, possibly with gaps in between their respective source interval positions. Source intervals must not overlap.
- Arguments
-
fn:ordered(sources...)
sources
- sub-intervals (terms or other functions)
- Examples
-
fn:ordered(quick jumps dog)
The quick brown fox jumps over the lazy dog
fn:ordered(quick fn:or(fox dog))
(Note only the shorter match out of the two alternatives is included in the result; the algorithm is not required to return or highlight all matching interval alternatives.)
The quick brown fox jumps over the lazy dog
fn:ordered(quick jumps fn:or(fox dog))
The quick brown fox jumps over the lazy dog
fn:ordered(fn:phrase(brown fox) fn:phrase(fox jumps))
(Sources overlap, no matches.)
The quick brown fox jumps over the lazy dog
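Why only the shorter alternative is reported for fn:ordered(quick fn:or(fox dog)) can be seen in a brute-force sketch that keeps minimal intervals only. The names are hypothetical, the sources are reduced to single-term position lists (so they can never overlap), and the quadratic search is chosen for clarity, not because Lucene works this way.

```java
import java.util.ArrayList;
import java.util.List;

/** Brute-force minimal ordered intervals for two single-term sources (hypothetical, not Lucene code). */
final class OrderedSketch {
    /** Returns intervals {a, b} with a from first, b from second, a < b, that contain no shorter candidate. */
    static List<int[]> ordered(int[] first, int[] second) {
        List<int[]> candidates = new ArrayList<>();
        for (int a : first) {
            for (int b : second) {
                if (a < b) {
                    candidates.add(new int[] {a, b});
                }
            }
        }
        List<int[]> minimal = new ArrayList<>();
        for (int[] c : candidates) {
            boolean containsShorter = false;
            for (int[] other : candidates) {
                boolean inside = c[0] <= other[0] && other[1] <= c[1];
                boolean shorter = (other[1] - other[0]) < (c[1] - c[0]);
                if (inside && shorter) {
                    containsShorter = true;
                    break;
                }
            }
            if (!containsShorter) {
                minimal.add(c);
            }
        }
        return minimal;
    }
}
```

With quick at position 2 and the alternatives fox and dog at positions 4 and 9 (assumed positions), the candidate from 2 to 9 contains the shorter candidate from 2 to 4 and is therefore dropped.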
fn:unordered
Matches an unordered span containing all source intervals, possibly with gaps in between their respective source interval positions. Source intervals may overlap.
- Arguments
-
fn:unordered(sources...)
sources
- sub-intervals (terms or other functions)
- Examples
-
fn:unordered(dog jumps quick)
The quick brown fox jumps over the lazy dog
fn:unordered(fn:or(fox dog) quick)
(Note only the shorter match out of the two alternatives is included in the result; the algorithm is not required to return or highlight all matching interval alternatives.)
The quick brown fox jumps over the lazy dog
fn:unordered(fn:phrase(brown fox) fn:phrase(fox jumps))
The quick brown fox jumps over the lazy dog
fn:unorderedNoOverlaps
Matches an unordered span containing two source intervals, possibly with gaps in between their respective source interval positions. Source intervals must not overlap.
Note that, unlike fn:unordered, this function takes a fixed number of arguments (two).
- Arguments
-
fn:unorderedNoOverlaps(source1 source2)
source1
- sub-interval (term or other function)
source2
- sub-interval (term or other function)
- Examples
-
fn:unorderedNoOverlaps(fn:phrase(fox jumps) brown)
The quick brown fox jumps over the lazy dog
fn:unorderedNoOverlaps(fn:phrase(brown fox) fn:phrase(fox jumps))
(Sources overlap, no matches.)
The quick brown fox jumps over the lazy dog
fn:before
Matches intervals from the source that appear before intervals from the reference.
Reference intervals will not be part of the match (this is a filtering function).
- Arguments
-
fn:before(source reference)
source
- source sub-interval (term or other function)
reference
- reference sub-interval (term or other function)
- Examples
-
fn:before(fn:or(brown lazy) fox)
The quick brown fox jumps over the lazy dog
fn:before(fn:or(brown lazy) fn:or(dog fox))
The quick brown fox jumps over the lazy dog
fn:after
Matches intervals from the source that appear after intervals from the reference.
Reference intervals will not be part of the match (this is a filtering function).
- Arguments
-
fn:after(source reference)
source
- source sub-interval (term or other function)
reference
- reference sub-interval (term or other function)
- Examples
-
fn:after(fn:or(brown lazy) fox)
The quick brown fox jumps over the lazy dog
fn:after(fn:or(brown lazy) fn:or(dog fox))
The quick brown fox jumps over the lazy dog
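The filtering behavior of fn:before and fn:after can be modeled as a pair of predicates on a single source interval. This sketch rests on an assumption about the semantics: a source interval is kept by fn:before when it ends before at least one reference interval starts, and by fn:after when it starts after at least one reference interval ends. Names and the int[]{start, end} representation are hypothetical.

```java
/** Predicate model of before/after filtering (hypothetical, not Lucene code). */
final class BeforeAfterSketch {
    /** True if the source ends before the start of at least one reference interval. */
    static boolean before(int[] source, int[][] references) {
        for (int[] reference : references) {
            if (source[1] < reference[0]) {
                return true;
            }
        }
        return false;
    }

    /** True if the source starts after the end of at least one reference interval. */
    static boolean after(int[] source, int[][] references) {
        for (int[] reference : references) {
            if (source[0] > reference[1]) {
                return true;
            }
        }
        return false;
    }
}
```

Only the source interval is ever returned; the reference intervals serve as a test and never widen the match.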
fn:extend
Matches an interval around another source, extending its span by a number of positions before and after.
This is an advanced function that allows extending the left and right "context" of another interval.
- Arguments
-
fn:extend(source before after)
source
- source sub-interval (term or other function)
before
- an integer number of positions to extend to the left of the source
after
- an integer number of positions to extend to the right of the source
- Examples
-
fn:extend(fox 1 2)
The quick brown fox jumps over the lazy dog
fn:extend(fn:or(dog fox) 2 0)
The quick brown fox jumps over the lazy dog
fn:within
Matches intervals of the source that appear within the provided number of positions from the intervals of the reference.
- Arguments
-
fn:within(source positions reference)
source
- source sub-interval (term or other function)
positions
- an integer specifying the maximum number of positions between source and reference
reference
- reference sub-interval (term or other function)
- Examples
-
fn:within(fn:or(fox dog) 1 fn:or(quick lazy))
The quick brown fox jumps over the lazy dog
fn:within(fn:or(fox dog) 2 fn:or(quick lazy))
The quick brown fox jumps over the lazy dog
fn:notWithin
Matches intervals of the source that do not appear within the provided number of positions from the intervals of the reference.
- Arguments
-
fn:notWithin(source positions reference)
source
- source sub-interval (term or other function)
positions
- an integer specifying the maximum number of positions between source and reference
reference
- reference sub-interval (term or other function)
- Examples
-
fn:notWithin(fn:or(fox dog) 1 fn:or(quick lazy))
The quick brown fox jumps over the lazy dog
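One way to reason about fn:within and fn:notWithin is to combine two other functions from this page: extend the reference by the given number of positions on both sides, then test whether the source is contained in the result. The sketch below models the functions that way; treating them as this composition is an assumption made for illustration, and all names and positions are hypothetical.

```java
/** Models within/notWithin as containment in an extended reference (hypothetical, not Lucene code). */
final class WithinSketch {
    /** Widens an interval by the given number of positions, never going below position 0. */
    static int[] extend(int[] interval, int before, int after) {
        return new int[] {Math.max(0, interval[0] - before), interval[1] + after};
    }

    /** True if inner lies entirely inside outer. */
    static boolean containedBy(int[] inner, int[] outer) {
        return outer[0] <= inner[0] && inner[1] <= outer[1];
    }

    /** True if the source falls inside at least one reference extended by positions on both sides. */
    static boolean within(int[] source, int positions, int[][] references) {
        for (int[] reference : references) {
            if (containedBy(source, extend(reference, positions, positions))) {
                return true;
            }
        }
        return false;
    }

    /** Negation of within for the same arguments. */
    static boolean notWithin(int[] source, int positions, int[][] references) {
        return !within(source, positions, references);
    }
}
```

With quick at 2, fox at 4, lazy at 8 and dog at 9 (assumed positions), dog is within one position of lazy, whereas fox needs a distance of two to reach quick.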
fn:containedBy
Matches intervals of the source that are contained by intervals of the reference.
- Arguments
-
fn:containedBy(source reference)
source
- source sub-interval (term or other function)
reference
- reference sub-interval (term or other function)
- Examples
-
fn:containedBy(fn:or(fox dog) fn:ordered(quick lazy))
The quick brown fox jumps over the lazy dog
fn:containedBy(fn:or(fox dog) fn:extend(lazy 3 3))
The quick brown fox jumps over the lazy dog
fn:notContainedBy
Matches intervals of the source that are not contained by intervals of the reference.
- Arguments
-
fn:notContainedBy(source reference)
source
- source sub-interval (term or other function)
reference
- reference sub-interval (term or other function)
- Examples
-
fn:notContainedBy(fn:or(fox dog) fn:ordered(quick lazy))
The quick brown fox jumps over the lazy dog
fn:notContainedBy(fn:or(fox dog) fn:extend(lazy 3 3))
The quick brown fox jumps over the lazy dog
fn:containing
Matches intervals of the source that contain at least one interval of the reference.
- Arguments
-
fn:containing(source reference)
source
- source sub-interval (term or other function)
reference
- reference sub-interval (term or other function)
- Examples
-
fn:containing(fn:extend(fn:or(lazy brown) 1 1) fn:or(fox dog))
The quick brown fox jumps over the lazy dog
fn:containing(fn:atLeast(2 quick fox dog) jumps)
The quick brown fox jumps over the lazy dog
fn:notContaining
Matches intervals of the source that do not contain any intervals of the reference.
- Arguments
-
fn:notContaining(source reference)
source
- source sub-interval (term or other function)
reference
- reference sub-interval (term or other function)
- Examples
-
fn:notContaining(fn:extend(fn:or(fox dog) 1 0) fn:or(brown yellow))
The quick brown fox jumps over the lazy dog
fn:notContaining(fn:ordered(fn:or(the The) fn:or(fox dog)) brown)
The quick brown fox jumps over the lazy dog
fn:overlapping
Matches intervals of the source that overlap with at least one interval of the reference.
- Arguments
-
fn:overlapping(source reference)
source
- source sub-interval (term or other function)
reference
- reference sub-interval (term or other function)
- Examples
-
fn:overlapping(fn:phrase(brown fox) fn:phrase(fox jumps))
The quick brown fox jumps over the lazy dog
fn:overlapping(fn:or(fox dog) fn:extend(lazy 2 2))
The quick brown fox jumps over the lazy dog
fn:nonOverlapping
Matches intervals of the source that do not overlap with any intervals of the reference.
- Arguments
-
fn:nonOverlapping(source reference)
source
- source sub-interval (term or other function)
reference
- reference sub-interval (term or other function)
- Examples
-
fn:nonOverlapping(fn:phrase(brown fox) fn:phrase(lazy dog))
The quick brown fox jumps over the lazy dog
fn:nonOverlapping(fn:or(fox dog) fn:extend(lazy 2 2))
The quick brown fox jumps over the lazy dog
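The overlap test that fn:overlapping and fn:nonOverlapping rely on can be written as a single comparison on closed position ranges. A minimal sketch, with hypothetical names and int[]{start, end} pairs, assuming two intervals overlap as soon as they share at least one position:

```java
/** Overlap predicates on closed position ranges (hypothetical, not Lucene code). */
final class OverlapSketch {
    /** True if the two intervals share at least one position. */
    static boolean overlaps(int[] a, int[] b) {
        return a[0] <= b[1] && b[0] <= a[1];
    }

    /** True if the source overlaps at least one reference interval. */
    static boolean overlapping(int[] source, int[][] references) {
        for (int[] reference : references) {
            if (overlaps(source, reference)) {
                return true;
            }
        }
        return false;
    }

    /** True if the source overlaps none of the reference intervals. */
    static boolean nonOverlapping(int[] source, int[][] references) {
        return !overlapping(source, references);
    }
}
```

The phrases brown fox (positions 3 to 4) and fox jumps (positions 4 to 5) share position 4 and therefore overlap, while brown fox and lazy dog share nothing.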