Package org.apache.lucene.queryparser.flexible.standard.nodes.intervalfn


This package contains classes that implement interval function support for the standard syntax parser.

What are interval functions?

Interval functions are a powerful tool to express search needs in terms of one or more contiguous fragments of text and their relationship to one another. Interval functions are implemented by an IntervalQuery but many ready-to-use factory methods are provided in the Intervals class.

When Lucene indexes documents (or rather: document fields) the input text is typically split into tokens. The details of how this tokenization is performed depend on how the field's Analyzer is set up. In the end, each token typically has an associated position in the token stream. For example, the following sentence:

The quick brown fox jumps over the lazy dog

could be transformed into the following token stream. Note that some token positions are "blank": The and the carry no position number because they are stop words, which are typically not indexed at all, although each still occupies a position in the stream.

The quick₂ brown₃ fox₄ jumps₅ over₆ the lazy₈ dog₉
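
The position numbering can be illustrated with a minimal sketch. It assumes whitespace tokenization, lowercasing, 1-based positions, and a hypothetical one-word stop list in which a skipped stop word still consumes its position. The class and method names are made up for illustration; this is not Lucene's Analyzer API.

```java
class TokenPositions {
    // Hypothetical stop-word list for this sketch; a real Analyzer is configurable.
    static boolean isStopWord(String token) {
        return token.equals("the");
    }

    // Returns the 1-based position of term, or -1 if it is absent or a stop word.
    static int positionOf(String text, String term) {
        String[] tokens = text.toLowerCase(java.util.Locale.ROOT).split("\\s+");
        for (int i = 0; i < tokens.length; i++) {
            // A stop word is not indexed, but it still consumes position i + 1.
            if (!isStopWord(tokens[i]) && tokens[i].equals(term)) {
                return i + 1;
            }
        }
        return -1;
    }

    public static void main(String[] args) {
        String text = "The quick brown fox jumps over the lazy dog";
        for (String term : new String[] {"quick", "fox", "dog", "the"}) {
            System.out.println(term + " -> " + positionOf(text, term));
        }
    }
}
```

Under these assumptions the stop word has no position of its own, while the terms after it keep the numbering they would have had if it were indexed.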

Remembering that intervals are contiguous spans between two positions in a document, consider the following example interval function query: fn:ordered(brown dog). This query selects any span of text that starts with the term brown and ends with the term dog. In our example, this corresponds to the fragment brown fox jumps over the lazy dog in the sentence below.

The quick brown fox jumps over the lazy dog

This type of interval function can be called an interval selector. The second class of interval functions works by combining or filtering other intervals depending on certain criteria.

The matching interval in the above example can be of any length — if the word brown occurs at the beginning of the document and the word dog at the very end of the document, the interval would be very long (it would cover the entire document!). Let's say we want to restrict the matches to only those intervals with at most 3 positions between the search terms: fn:maxgaps(3 fn:ordered(brown dog)).

There are five tokens in between search terms (so five "gaps" between the matching interval's positions) and the above query no longer matches our example document at all.
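
The gap arithmetic above can be checked with a small sketch. It assumes 1-based positions in which the skipped stop word still occupies a position, so brown sits at position 3 and dog at position 9. The helper names are hypothetical and this is not how Lucene evaluates intervals internally.

```java
class GapCheck {
    // Number of positions strictly between two single-term intervals.
    static int gapsBetween(int firstPos, int secondPos) {
        return secondPos - firstPos - 1;
    }

    // Mirrors the idea of fn:maxgaps(maxGaps fn:ordered(a b)) for two single terms.
    static boolean orderedWithinGaps(int firstPos, int secondPos, int maxGaps) {
        return firstPos < secondPos && gapsBetween(firstPos, secondPos) <= maxGaps;
    }

    public static void main(String[] args) {
        int brown = 3;
        int dog = 9;
        System.out.println("gaps = " + gapsBetween(brown, dog));
        System.out.println("maxgaps(3) accepts: " + orderedWithinGaps(brown, dog, 3));
    }
}
```

With five gaps between the two terms, a limit of 3 rejects the interval, while a limit of 5 or more would accept it.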

Interval filtering functions can express a variety of conditions that other Lucene queries cannot. For example, consider this interval query that searches for the words lazy or quick, but only if they are within one position of either dog or fox:

fn:within(fn:or(lazy quick) 1 fn:or(dog fox))

The result of this query is shown below (only the word lazy matches the query; quick is 2 positions away from fox).

The quick brown fox jumps over the lazy dog

The remaining part of this document provides more information on the available functions and their expected behavior.

Classification of interval functions

The following groups of interval functions are available in the StandardQueryParser.

Interval functions grouped by similar functionality.

Group         Functions
Terms         term literals, fn:wildcard
Alternatives  fn:or, fn:atLeast
Length        fn:maxgaps, fn:maxwidth
Context       fn:before, fn:after, fn:extend, fn:within, fn:notWithin
Ordering      fn:ordered, fn:unordered, fn:phrase, fn:unorderedNoOverlaps
Containment   fn:containedBy, fn:notContainedBy, fn:containing, fn:notContaining, fn:overlapping, fn:nonOverlapping

All examples in the description of interval functions (below) assume a document with the following content:

The quick brown fox jumps over the lazy dog

term literals

Quoted or unquoted character sequences are converted into (analyzed) text intervals. While a single term typically results in a single-term interval, a quoted multi-term phrase will produce an interval matching the corresponding sequence of tokens. Note this is different from the fn:phrase function which takes a sequence of sub-intervals.

Examples
  • fn:or(quick "fox")

    The quick brown fox jumps over the lazy dog

  • fn:or("quick fox") (The document would not match: no phrase quick fox exists.)

    The quick brown fox jumps over the lazy dog

  • fn:phrase(quick brown fox)

    The quick brown fox jumps over the lazy dog

fn:wildcard

Matches the disjunction of all terms that match a wildcard glob.

Important! The expanded wildcard must not match more than 128 terms. This is an internal limitation that prevents blowing up memory on, for example, prefix expansions that would cover huge numbers of alternatives.

Arguments

fn:wildcard(glob)

glob
term glob to expand (based on the contents of the index).
Examples
  • fn:wildcard(jump*)

    The quick brown fox jumps over the lazy dog

  • fn:wildcard(br*n)

    The quick brown fox jumps over the lazy dog
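
To illustrate glob expansion with an upper bound, here is a minimal sketch. It assumes the glob supports `*` (any run of characters) and `?` (a single character) and that expansion runs over a plain array of indexed terms. The class, the method names and the exception type are hypothetical; Lucene's own expansion works against the term dictionary and is not shown here.

```java
class WildcardSketch {
    // Simple glob match supporting '*' (any run of characters) and '?' (one character).
    static boolean globMatches(String glob, String term) {
        return match(glob, 0, term, 0);
    }

    private static boolean match(String g, int gi, String t, int ti) {
        if (gi == g.length()) {
            return ti == t.length();
        }
        char c = g.charAt(gi);
        if (c == '*') {
            // '*' may consume zero or more characters of the term.
            for (int k = ti; k <= t.length(); k++) {
                if (match(g, gi + 1, t, k)) {
                    return true;
                }
            }
            return false;
        }
        if (ti < t.length() && (c == '?' || c == t.charAt(ti))) {
            return match(g, gi + 1, t, ti + 1);
        }
        return false;
    }

    // Counts the terms the glob expands to; fails once the count exceeds the limit.
    static int expand(String glob, String[] indexedTerms, int maxExpansions) {
        int count = 0;
        for (String term : indexedTerms) {
            if (globMatches(glob, term)) {
                count++;
                if (count > maxExpansions) {
                    throw new IllegalStateException("Glob expands to more than " + maxExpansions + " terms");
                }
            }
        }
        return count;
    }

    public static void main(String[] args) {
        String[] terms = {"quick", "brown", "fox", "jumps", "over", "lazy", "dog"};
        System.out.println(expand("jump*", terms, 128));
        System.out.println(expand("br*n", terms, 128));
    }
}
```

Failing fast on the limit, rather than silently truncating the expansion, keeps the result of the query predictable.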

fn:or

Matches the disjunction of nested intervals.

Arguments

fn:or(sources...)

sources
sub-intervals (terms or other functions)
Examples
  • fn:or(dog fox)

    The quick brown fox jumps over the lazy dog

fn:atLeast

Matches documents that contain at least the provided number of source intervals.

Arguments

fn:atLeast(min sources...)

min
an integer specifying the minimum number of sub-interval arguments that must match.
sources
sub-intervals (terms or other functions)
Examples
  • fn:atLeast(2 quick fox "furry dog")

    The quick brown fox jumps over the lazy dog

  • fn:atLeast(2 fn:unordered(furry dog) fn:unordered(brown dog) lazy quick) (This query results in multiple overlapping intervals.)

    The quick brown fox jumps over the lazy dog
    The quick brown fox jumps over the lazy dog
    The quick brown fox jumps over the lazy dog
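
The following sketch shows the counting idea behind fn:atLeast in a simplified setting. It assumes every source is a single term with at most one occurrence (given as a 1-based position, or -1 when the source does not match) and it reports only the smallest span covering the required number of sources, whereas the real function can report several overlapping intervals. All names are hypothetical.

```java
class AtLeastSketch {
    // Each argument after min is the position of a single-term source, or -1 if
    // that source has no match. Returns {start, end} of the smallest span that
    // covers at least min matching sources, or null if fewer than min match.
    static int[] smallestSpan(int min, int... positions) {
        int[] matched = java.util.Arrays.stream(positions).filter(p -> p >= 0).sorted().toArray();
        if (min < 1 || matched.length < min) {
            return null;
        }
        int[] best = null;
        // Slide a window of min consecutive matches over the sorted positions.
        for (int i = 0; i + min <= matched.length; i++) {
            int start = matched[i];
            int end = matched[i + min - 1];
            if (best == null || end - start < best[1] - best[0]) {
                best = new int[] {start, end};
            }
        }
        return best;
    }

    public static void main(String[] args) {
        // quick = 2, fox = 4, "furry dog" does not match.
        System.out.println(java.util.Arrays.toString(smallestSpan(2, 2, 4, -1)));
    }
}
```

For the first example above, two of the three sources match, so the span from quick to fox satisfies a minimum of 2; a minimum of 3 would yield no match.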

fn:maxgaps

Accepts the source interval if it has at most the given number of position gaps.

Arguments

fn:maxgaps(gaps source)

gaps
an integer specifying the maximum number of position gaps in the source.
source
source sub-interval.
Examples
  • fn:maxgaps(0 fn:ordered(fn:or(quick lazy) fn:or(fox dog)))

    The quick brown fox jumps over the lazy dog

  • fn:maxgaps(1 fn:ordered(fn:or(quick lazy) fn:or(fox dog)))

    The quick brown fox jumps over the lazy dog

fn:maxwidth

Accepts source interval if it has at most the given width (position span).

Arguments

fn:maxwidth(max source)

max
an integer specifying the maximum width of the source's position span.
source
source sub-interval.
Examples
  • fn:maxwidth(2 fn:ordered(fn:or(quick lazy) fn:or(fox dog)))

    The quick brown fox jumps over the lazy dog

  • fn:maxwidth(3 fn:ordered(fn:or(quick lazy) fn:or(fox dog)))

    The quick brown fox jumps over the lazy dog
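
A minimal sketch of the width check, assuming an interval is described by 1-based start and end positions and that its width counts both endpoints. In the example document the two candidate intervals would be quick..fox (positions 2 to 4) and lazy..dog (positions 8 to 9), assuming the skipped stop word keeps its position. The names are hypothetical.

```java
class WidthSketch {
    // Width of an interval counts every position from start to end inclusive.
    static int width(int start, int end) {
        return end - start + 1;
    }

    // Mirrors the idea of fn:maxwidth(max source) for one candidate interval.
    static boolean maxWidth(int max, int start, int end) {
        return width(start, end) <= max;
    }

    public static void main(String[] args) {
        System.out.println("quick..fox, max 2: " + maxWidth(2, 2, 4));
        System.out.println("lazy..dog,  max 2: " + maxWidth(2, 8, 9));
        System.out.println("quick..fox, max 3: " + maxWidth(3, 2, 4));
    }
}
```

With a maximum of 2 only the adjacent pair lazy dog is narrow enough; raising the maximum to 3 also admits quick brown fox.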

fn:phrase

Matches an ordered, gapless sequence of source intervals.

Arguments

fn:phrase(sources...)

sources
sub-intervals (terms or other functions)
Examples
  • fn:phrase(quick brown fox)

    The quick brown fox jumps over the lazy dog

  • fn:phrase(fn:ordered(quick fox) jumps)

    The quick brown fox jumps over the lazy dog
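
The "ordered, gapless" condition can be sketched as an adjacency check. The sketch assumes each source has already been resolved to one candidate interval, written as a {start, end} pair of 1-based positions; the names are hypothetical and the real function also has to search over all occurrences.

```java
class PhraseSketch {
    // Returns the combined {start, end} interval, or null if any source does
    // not start immediately after the previous source ends.
    static int[] phrase(int[]... sources) {
        if (sources.length == 0) {
            return null;
        }
        for (int i = 1; i < sources.length; i++) {
            if (sources[i][0] != sources[i - 1][1] + 1) {
                return null;
            }
        }
        return new int[] {sources[0][0], sources[sources.length - 1][1]};
    }

    public static void main(String[] args) {
        // fn:phrase(fn:ordered(quick fox) jumps): [2,4] followed by [5,5].
        int[] result = phrase(new int[] {2, 4}, new int[] {5, 5});
        System.out.println(java.util.Arrays.toString(result));
    }
}
```

Because sources are intervals rather than terms, the nested fn:ordered(quick fox) interval counts as one unit, and jumps only has to follow its end position directly.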

fn:ordered

Matches an ordered span containing all source intervals, possibly with gaps in between their respective source interval positions. Source intervals must not overlap.

Arguments

fn:ordered(sources...)

sources
sub-intervals (terms or other functions)
Examples
  • fn:ordered(quick jumps dog)

    The quick brown fox jumps over the lazy dog

  • fn:ordered(quick fn:or(fox dog)) (Note only the shorter match out of the two alternatives is included in the result; the algorithm is not required to return or highlight all matching interval alternatives).

    The quick brown fox jumps over the lazy dog

  • fn:ordered(quick jumps fn:or(fox dog))

    The quick brown fox jumps over the lazy dog

  • fn:ordered(fn:phrase(brown fox) fn:phrase(fox jumps)) (Sources overlap, no matches.)

    The quick brown fox jumps over the lazy dog
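
The ordering and no-overlap rules can be sketched as a single comparison between neighbouring sources. As a simplification, each source is assumed to be already resolved to one candidate {start, end} interval of 1-based positions; the names are hypothetical.

```java
class OrderedSketch {
    // Returns the covering {start, end} interval if every source starts strictly
    // after the previous one ends; returns null if sources are out of order or overlap.
    static int[] ordered(int[]... sources) {
        if (sources.length == 0) {
            return null;
        }
        for (int i = 1; i < sources.length; i++) {
            if (sources[i][0] <= sources[i - 1][1]) {
                return null;
            }
        }
        return new int[] {sources[0][0], sources[sources.length - 1][1]};
    }

    public static void main(String[] args) {
        // quick = 2, jumps = 5, dog = 9 (assuming the stop word keeps position 7).
        System.out.println(java.util.Arrays.toString(
            ordered(new int[] {2, 2}, new int[] {5, 5}, new int[] {9, 9})));
        // "brown fox" = [3,4] and "fox jumps" = [4,5] share position 4.
        System.out.println(ordered(new int[] {3, 4}, new int[] {4, 5}) == null);
    }
}
```

The strict comparison is what rejects the last example above: both phrases claim the position of fox, so the second source does not start after the first one ends.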

fn:unordered

Matches an unordered span containing all source intervals, possibly with gaps in between their respective source interval positions. Source intervals may overlap.

Arguments

fn:unordered(sources...)

sources
sub-intervals (terms or other functions)
Examples
  • fn:unordered(dog jumps quick)

    The quick brown fox jumps over the lazy dog

  • fn:unordered(fn:or(fox dog) quick) (Note only the shorter match out of the two alternatives is included in the result; the algorithm is not required to return or highlight all matching interval alternatives).

    The quick brown fox jumps over the lazy dog

  • fn:unordered(fn:phrase(brown fox) fn:phrase(fox jumps))

    The quick brown fox jumps over the lazy dog

fn:unorderedNoOverlaps

Matches an unordered span containing two source intervals, possibly with gaps in between their respective source interval positions. Source intervals must not overlap.

Note that, unlike fn:unordered, this function takes a fixed number of arguments (two).

Arguments

fn:unorderedNoOverlaps(source1 source2)

source1
sub-interval (term or other function)
source2
sub-interval (term or other function)
Examples
  • fn:unorderedNoOverlaps(fn:phrase(fox jumps) brown)

    The quick brown fox jumps over the lazy dog

  • fn:unorderedNoOverlaps(fn:phrase(brown fox) fn:phrase(fox jumps)) (Sources overlap, no matches.)

    The quick brown fox jumps over the lazy dog
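
The difference between fn:unordered and fn:unorderedNoOverlaps comes down to one extra overlap test, as the sketch below suggests. It assumes two sources, each already resolved to one {start, end} interval of 1-based positions; the names are hypothetical.

```java
class UnorderedSketch {
    // Two intervals overlap if they share at least one position.
    static boolean overlap(int[] a, int[] b) {
        return a[0] <= b[1] && b[0] <= a[1];
    }

    // Covering span of both sources, in whichever order they occur.
    static int[] unordered(int[] a, int[] b) {
        return new int[] {Math.min(a[0], b[0]), Math.max(a[1], b[1])};
    }

    // Same as unordered, but rejects sources that share a position.
    static int[] unorderedNoOverlaps(int[] a, int[] b) {
        return overlap(a, b) ? null : unordered(a, b);
    }

    public static void main(String[] args) {
        int[] brownFox = {3, 4};
        int[] foxJumps = {4, 5};
        System.out.println(java.util.Arrays.toString(unordered(brownFox, foxJumps)));
        System.out.println(unorderedNoOverlaps(brownFox, foxJumps) == null);
    }
}
```

For the two phrases sharing fox, the plain unordered variant yields the span brown fox jumps, while the no-overlap variant yields nothing.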

fn:before

Matches intervals from the source that appear before intervals from the reference.

Reference intervals will not be part of the match (this is a filtering function).

Arguments

fn:before(source reference)

source
source sub-interval (term or other function)
reference
reference sub-interval (term or other function)
Examples
  • fn:before(fn:or(brown lazy) fox)

    The quick brown fox jumps over the lazy dog

  • fn:before(fn:or(brown lazy) fn:or(dog fox))

    The quick brown fox jumps over the lazy dog

fn:after

Matches intervals from the source that appear after intervals from the reference.

Reference intervals will not be part of the match (this is a filtering function).

Arguments

fn:after(source reference)

source
source sub-interval (term or other function)
reference
reference sub-interval (term or other function)
Examples
  • fn:after(fn:or(brown lazy) fox)

    The quick brown fox jumps over the lazy dog

  • fn:after(fn:or(brown lazy) fn:or(dog fox))

    The quick brown fox jumps over the lazy dog
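
Both fn:before and fn:after filter source intervals by their position relative to a reference. A minimal sketch, assuming intervals are {start, end} pairs of 1-based positions and that one qualifying reference interval is enough; the names are hypothetical.

```java
class BeforeAfterSketch {
    // True if the source ends before at least one reference interval starts.
    static boolean before(int[] source, int[]... references) {
        for (int[] ref : references) {
            if (source[1] < ref[0]) {
                return true;
            }
        }
        return false;
    }

    // True if the source starts after at least one reference interval ends.
    static boolean after(int[] source, int[]... references) {
        for (int[] ref : references) {
            if (source[0] > ref[1]) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        int[] brown = {3, 3}, fox = {4, 4}, lazy = {8, 8}, dog = {9, 9};
        System.out.println("brown before fox: " + before(brown, fox));
        System.out.println("lazy before fox:  " + before(lazy, fox));
        System.out.println("lazy after fox:   " + after(lazy, fox));
        System.out.println("brown after dog or fox: " + after(brown, dog, fox));
    }
}
```

Only the source interval is returned by these filters; the reference merely decides whether the source is kept.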

fn:extend

Matches an interval around another source, extending its span by a number of positions before and after.

This is an advanced function that allows extending the left and right "context" of another interval.

Arguments

fn:extend(source before after)

source
source sub-interval (term or other function)
before
an integer number of positions to extend to the left of the source
after
an integer number of positions to extend to the right of the source
Examples
  • fn:extend(fox 1 2)

    The quick brown fox jumps over the lazy dog

  • fn:extend(fn:or(dog fox) 2 0)

    The quick brown fox jumps over the lazy dog
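
Extending an interval is simple position arithmetic, sketched below. The sketch assumes {start, end} pairs of 1-based positions and clamps the start at the first position; the clamp and the names are assumptions of this example rather than documented behavior.

```java
class ExtendSketch {
    // Widens the source by `before` positions on the left and `after` on the right.
    static int[] extend(int[] source, int before, int after) {
        // Assumption: positions are 1-based, so the start never drops below 1.
        return new int[] {Math.max(1, source[0] - before), source[1] + after};
    }

    public static void main(String[] args) {
        int[] fox = {4, 4};
        int[] dog = {9, 9};
        System.out.println(java.util.Arrays.toString(extend(fox, 1, 2)));
        System.out.println(java.util.Arrays.toString(extend(dog, 2, 0)));
    }
}
```

For fn:extend(fox 1 2) this gives positions 3 to 6, that is brown fox jumps over. The extended interval may also cover positions of tokens that were not indexed, such as stop words.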

fn:within

Matches intervals of the source that appear within the provided number of positions from the intervals of the reference.

Arguments

fn:within(source positions reference)

source
source sub-interval (term or other function)
positions
the maximum number of positions (an integer) allowed between source and reference
reference
reference sub-interval (term or other function)
Examples
  • fn:within(fn:or(fox dog) 1 fn:or(quick lazy))

    The quick brown fox jumps over the lazy dog

  • fn:within(fn:or(fox dog) 2 fn:or(quick lazy))

    The quick brown fox jumps over the lazy dog

fn:notWithin

Matches intervals of the source that do not appear within the provided number of positions from the intervals of the reference.

Arguments

fn:notWithin(source positions reference)

source
source sub-interval (term or other function)
positions
the maximum number of positions (an integer) allowed between source and reference
reference
reference sub-interval (term or other function)
Examples
  • fn:notWithin(fn:or(fox dog) 1 fn:or(quick lazy))

    The quick brown fox jumps over the lazy dog
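
One way to think about fn:within and fn:notWithin is as a containment test against the reference widened by the given number of positions on both sides. The sketch below follows that reading; it assumes {start, end} pairs of 1-based positions with fox = 4, dog = 9, quick = 2 and lazy = 8, and all names are hypothetical.

```java
class WithinSketch {
    // True if the source fits inside some reference interval after that
    // reference has been widened by `positions` on each side.
    static boolean within(int[] source, int positions, int[]... references) {
        for (int[] ref : references) {
            int start = ref[0] - positions;
            int end = ref[1] + positions;
            if (source[0] >= start && source[1] <= end) {
                return true;
            }
        }
        return false;
    }

    // The complement: the source is not near any reference interval.
    static boolean notWithin(int[] source, int positions, int[]... references) {
        return !within(source, positions, references);
    }

    public static void main(String[] args) {
        int[] fox = {4, 4}, dog = {9, 9}, quick = {2, 2}, lazy = {8, 8};
        System.out.println("dog within 1: " + within(dog, 1, quick, lazy));
        System.out.println("fox within 1: " + within(fox, 1, quick, lazy));
        System.out.println("fox within 2: " + within(fox, 2, quick, lazy));
        System.out.println("fox notWithin 1: " + notWithin(fox, 1, quick, lazy));
    }
}
```

With a distance of 1 only dog qualifies (it is next to lazy); fox is two positions from quick and needs a distance of 2, which is why it is the only match of the fn:notWithin example.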

fn:containedBy

Matches intervals of the source that are contained by intervals of the reference.

Arguments

fn:containedBy(source reference)

source
source sub-interval (term or other function)
reference
reference sub-interval (term or other function)
Examples
  • fn:containedBy(fn:or(fox dog) fn:ordered(quick lazy))

    The quick brown fox jumps over the lazy dog

  • fn:containedBy(fn:or(fox dog) fn:extend(lazy 3 3))

    The quick brown fox jumps over the lazy dog

fn:notContainedBy

Matches intervals of the source that are not contained by intervals of the reference.

Arguments

fn:notContainedBy(source reference)

source
source sub-interval (term or other function)
reference
reference sub-interval (term or other function)
Examples
  • fn:notContainedBy(fn:or(fox dog) fn:ordered(quick lazy))

    The quick brown fox jumps over the lazy dog

  • fn:notContainedBy(fn:or(fox dog) fn:extend(lazy 3 3))

    The quick brown fox jumps over the lazy dog
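
Containment reduces to comparing interval endpoints. A minimal sketch, assuming {start, end} pairs of 1-based positions in which fn:ordered(quick lazy) spans 2 to 8 and fn:extend(lazy 3 3) spans 5 to 11 (the stop word before lazy is assumed to keep its position). The names are hypothetical.

```java
class ContainedBySketch {
    // True if every position of the source lies inside the reference.
    static boolean containedBy(int[] source, int[] reference) {
        return reference[0] <= source[0] && source[1] <= reference[1];
    }

    public static void main(String[] args) {
        int[] fox = {4, 4};
        int[] dog = {9, 9};
        int[] quickToLazy = {2, 8};   // fn:ordered(quick lazy)
        int[] aroundLazy = {5, 11};   // fn:extend(lazy 3 3)
        System.out.println("fox in quick..lazy: " + containedBy(fox, quickToLazy));
        System.out.println("dog in quick..lazy: " + containedBy(dog, quickToLazy));
        System.out.println("fox around lazy:    " + containedBy(fox, aroundLazy));
        System.out.println("dog around lazy:    " + containedBy(dog, aroundLazy));
    }
}
```

fn:notContainedBy keeps exactly the source intervals for which this test fails, so the two pairs of examples above select complementary terms.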

fn:containing

Matches intervals of the source that contain at least one interval of the reference.

Arguments

fn:containing(source reference)

source
source sub-interval (term or other function)
reference
reference sub-interval (term or other function)
Examples
  • fn:containing(fn:extend(fn:or(lazy brown) 1 1) fn:or(fox dog))

    The quick brown fox jumps over the lazy dog

  • fn:containing(fn:atLeast(2 quick fox dog) jumps)

    The quick brown fox jumps over the lazy dog

fn:notContaining

Matches intervals of the source that do not contain any intervals of the reference.

Arguments

fn:notContaining(source reference)

source
source sub-interval (term or other function)
reference
reference sub-interval (term or other function)
Examples
  • fn:notContaining(fn:extend(fn:or(fox dog) 1 0) fn:or(brown yellow))

    The quick brown fox jumps over the lazy dog

  • fn:notContaining(fn:ordered(fn:or(the The) fn:or(fox dog)) brown)

    The quick brown fox jumps over the lazy dog
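
fn:containing and fn:notContaining apply the containment test in the opposite direction: the source must (or must not) enclose a reference interval. A minimal sketch under the same assumptions as before ({start, end} pairs of 1-based positions, hypothetical names); a reference term that does not occur in the document, such as yellow, simply contributes no interval.

```java
class ContainingSketch {
    // True if the source encloses at least one of the reference intervals.
    static boolean containing(int[] source, int[]... references) {
        for (int[] ref : references) {
            if (source[0] <= ref[0] && ref[1] <= source[1]) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        int[] brown = {3, 3};
        int[] foxExtended = {3, 4};   // fn:extend(fox 1 0)
        int[] dogExtended = {8, 9};   // fn:extend(dog 1 0)
        // fn:notContaining keeps the sources for which containing(...) is false.
        System.out.println("extended fox contains brown: " + containing(foxExtended, brown));
        System.out.println("extended dog contains brown: " + containing(dogExtended, brown));
    }
}
```

In the first fn:notContaining example, the interval around fox covers brown and is dropped, while the interval around dog is kept.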

fn:overlapping

Matches intervals of the source that overlap with at least one interval of the reference.

Arguments

fn:overlapping(source reference)

source
source sub-interval (term or other function)
reference
reference sub-interval (term or other function)
Examples
  • fn:overlapping(fn:phrase(brown fox) fn:phrase(fox jumps))

    The quick brown fox jumps over the lazy dog

  • fn:overlapping(fn:or(fox dog) fn:extend(lazy 2 2))

    The quick brown fox jumps over the lazy dog

fn:nonOverlapping

Matches intervals of the source that do not overlap with any intervals of the reference.

Arguments

fn:nonOverlapping(source reference)

source
source sub-interval (term or other function)
reference
reference sub-interval (term or other function)
Examples
  • fn:nonOverlapping(fn:phrase(brown fox) fn:phrase(lazy dog))

    The quick brown fox jumps over the lazy dog

  • fn:nonOverlapping(fn:or(fox dog) fn:extend(lazy 2 2))

    The quick brown fox jumps over the lazy dog
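
The overlap test used by fn:overlapping and fn:nonOverlapping can be sketched as follows, assuming {start, end} pairs of 1-based positions and a single reference interval; the names are hypothetical. Here fn:extend(lazy 2 2) is taken to span positions 6 to 10.

```java
class OverlapSketch {
    // Two intervals overlap if they share at least one position.
    static boolean overlapping(int[] source, int[] reference) {
        return source[0] <= reference[1] && reference[0] <= source[1];
    }

    public static void main(String[] args) {
        int[] brownFox = {3, 4};
        int[] foxJumps = {4, 5};
        int[] lazyDog = {8, 9};
        int[] aroundLazy = {6, 10};   // fn:extend(lazy 2 2)
        System.out.println("brown fox / fox jumps: " + overlapping(brownFox, foxJumps));
        System.out.println("brown fox / lazy dog:  " + overlapping(brownFox, lazyDog));
        System.out.println("fox / around lazy:     " + overlapping(new int[] {4, 4}, aroundLazy));
        System.out.println("dog / around lazy:     " + overlapping(new int[] {9, 9}, aroundLazy));
    }
}
```

fn:nonOverlapping keeps the source intervals for which this test is false against every reference interval, so in the last example it keeps fox and drops dog.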