Class Intervals

java.lang.Object
org.apache.lucene.queries.intervals.Intervals

public final class Intervals extends Object
Factory functions for creating interval sources.

These sources implement minimum-interval algorithms taken from the paper "Efficient Optimally Lazy Algorithms for Minimal-Interval Semantics".

Note: by default, sources that are sensitive to internal gaps (e.g. PHRASE and MAXGAPS) rewrite their sub-sources so that disjunctions of different lengths are pulled up to the top of the interval tree. For example, PHRASE(OR(PHRASE("a", "b", "c"), "b"), "c") automatically rewrites itself to OR(PHRASE("a", "b", "c", "c"), PHRASE("b", "c")) to ensure that documents containing "b c" are matched. The rewritten query can be less efficient because more terms need to be loaded (in the example above, the "c" iterator is loaded twice). If speed matters more than accuracy, use the or(boolean, IntervalsSource...) factory method with rewrite set to false to prevent rewriting.
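
The rewrite in the note above amounts to distributing the outer phrase over each alternative of the inner disjunction. The following is a minimal sketch of that expansion on plain term lists. It is a model only and does not use Lucene's classes; the names DisjunctionPullUpSketch and pullUp are hypothetical and chosen for illustration.

```java
import java.util.ArrayList;
import java.util.List;

public class DisjunctionPullUpSketch {
    /**
     * Hypothetical model: a phrase is a list of slots, each slot holds one or
     * more alternatives, and each alternative is a flat list of terms.
     * Returns one flat phrase per combination of alternatives.
     */
    static List<List<String>> pullUp(List<List<List<String>>> slots) {
        List<List<String>> phrases = new ArrayList<>();
        phrases.add(new ArrayList<>()); // start with a single empty phrase
        for (List<List<String>> alternatives : slots) {
            List<List<String>> next = new ArrayList<>();
            for (List<String> prefix : phrases) {
                for (List<String> alternative : alternatives) {
                    List<String> combined = new ArrayList<>(prefix);
                    combined.addAll(alternative);
                    next.add(combined);
                }
            }
            phrases = next;
        }
        return phrases;
    }

    public static void main(String[] args) {
        // Models PHRASE(OR(PHRASE("a", "b", "c"), "b"), "c")
        List<List<List<String>>> slots = List.of(
            List.of(List.of("a", "b", "c"), List.of("b")),
            List.of(List.of("c")));
        System.out.println(pullUp(slots)); // prints [[a, b, c, c], [b, c]]
    }
}
```

Each resulting list corresponds to one flat PHRASE in the rewritten top-level OR. The term "c" appears in both, which is why its iterator is loaded twice.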

  • Method Details

    • term

      public static IntervalsSource term(BytesRef term)
      Return an IntervalsSource exposing intervals for a term
    • term

      public static IntervalsSource term(String term)
      Return an IntervalsSource exposing intervals for a term
    • term

      public static IntervalsSource term(String term, Predicate<BytesRef> payloadFilter)
      Return an IntervalsSource exposing intervals for a term, filtered by the value of the term's payload at each position
    • term

      public static IntervalsSource term(BytesRef term, Predicate<BytesRef> payloadFilter)
      Return an IntervalsSource exposing intervals for a term, filtered by the value of the term's payload at each position
    • phrase

      public static IntervalsSource phrase(String... terms)
      Return an IntervalsSource exposing intervals for a phrase consisting of a list of terms
    • phrase

      public static IntervalsSource phrase(IntervalsSource... subSources)
      Return an IntervalsSource exposing intervals for a phrase consisting of a list of interval sources
    • or

      public static IntervalsSource or(IntervalsSource... subSources)
      Return an IntervalsSource over the disjunction of a set of sub-sources

      Automatically rewrites if wrapped by an interval source that is sensitive to internal gaps

    • or

      public static IntervalsSource or(boolean rewrite, IntervalsSource... subSources)
      Return an IntervalsSource over the disjunction of a set of sub-sources
      Parameters:
      rewrite - if false, do not rewrite intervals that are sensitive to internal gaps; this may run more efficiently, but can miss valid hits due to minimization
      subSources - the sources to combine
    • or

      public static IntervalsSource or(List<IntervalsSource> subSources)
      Return an IntervalsSource over the disjunction of a set of sub-sources
    • or

      public static IntervalsSource or(boolean rewrite, List<IntervalsSource> subSources)
      Return an IntervalsSource over the disjunction of a set of sub-sources
      Parameters:
      rewrite - if false, do not rewrite intervals that are sensitive to internal gaps; this may run more efficiently, but can miss valid hits due to minimization
      subSources - the sources to combine
    • prefix

      public static IntervalsSource prefix(BytesRef prefix)
      Return an IntervalsSource over the disjunction of all terms that begin with a prefix
      Throws:
      IllegalStateException - if the prefix expands to more than DEFAULT_MAX_EXPANSIONS terms
    • prefix

      public static IntervalsSource prefix(BytesRef prefix, int maxExpansions)
      Expert: Return an IntervalsSource over the disjunction of all terms that begin with a prefix

      WARNING: Setting maxExpansions higher than the default value of DEFAULT_MAX_EXPANSIONS can be both slow and memory-intensive

      Parameters:
      prefix - the prefix to expand
      maxExpansions - the maximum number of terms to expand to
      Throws:
      IllegalStateException - if the prefix expands to more than maxExpansions terms
    • wildcard

      public static IntervalsSource wildcard(BytesRef wildcard)
      Return an IntervalsSource over the disjunction of all terms that match a wildcard glob
      Throws:
      IllegalStateException - if the wildcard glob expands to more than DEFAULT_MAX_EXPANSIONS terms
    • wildcard

      public static IntervalsSource wildcard(BytesRef wildcard, int maxExpansions)
      Expert: Return an IntervalsSource over the disjunction of all terms that match a wildcard glob

      WARNING: Setting maxExpansions higher than the default value of DEFAULT_MAX_EXPANSIONS can be both slow and memory-intensive

      Parameters:
      wildcard - the glob to expand
      maxExpansions - the maximum number of terms to expand to
      Throws:
      IllegalStateException - if the wildcard glob expands to more than maxExpansions terms
    • fuzzyTerm

      public static IntervalsSource fuzzyTerm(String term, int maxEdits)
      A fuzzy term IntervalsSource matches the disjunction of intervals of terms that are within maxEdits edits of the provided term.
      Parameters:
      term - the term to search for
      maxEdits - the maximum edit distance; must be >= 0 and <= LevenshteinAutomata.MAXIMUM_SUPPORTED_DISTANCE. Use FuzzyQuery.defaultMaxEdits if you need the default.
    • fuzzyTerm

      public static IntervalsSource fuzzyTerm(String term, int maxEdits, int prefixLength, boolean transpositions, int maxExpansions)
      A fuzzy term IntervalsSource matches the disjunction of intervals of terms that are within maxEdits edits of the provided term.

      The implementation is delegated to a multiterm(CompiledAutomaton, int, String) interval source, with an automaton sourced from FuzzyQuery.

      Parameters:
      term - the term to search for
      maxEdits - the maximum edit distance; must be >= 0 and <= LevenshteinAutomata.MAXIMUM_SUPPORTED_DISTANCE. Use FuzzyQuery.defaultMaxEdits if you need the default.
      prefixLength - the length of the common (non-fuzzy) prefix
      transpositions - true if transpositions should be treated as a primitive edit operation; if false, comparisons implement the classic Levenshtein algorithm
      maxExpansions - the maximum number of terms to match. Setting maxExpansions higher than the default value of DEFAULT_MAX_EXPANSIONS can be both slow and memory-intensive
    • multiterm

      public static IntervalsSource multiterm(CompiledAutomaton ca, String pattern)
      Expert: Return an IntervalsSource over the disjunction of all terms that are accepted by the given automaton
      Parameters:
      ca - an automaton accepting matching terms
      pattern - string representation of the given automaton, mostly used in exception messages
      Throws:
      IllegalStateException - if the automaton accepts more than DEFAULT_MAX_EXPANSIONS terms
    • multiterm

      public static IntervalsSource multiterm(CompiledAutomaton ca, int maxExpansions, String pattern)
      Expert: Return an IntervalsSource over the disjunction of all terms that are accepted by the given automaton

      WARNING: Setting maxExpansions higher than the default value of DEFAULT_MAX_EXPANSIONS can be both slow and memory-intensive

      Parameters:
      ca - an automaton accepting matching terms
      maxExpansions - the maximum number of terms to expand to
      pattern - string representation of the given automaton, mostly used in exception messages
      Throws:
      IllegalStateException - if the automaton accepts more than maxExpansions terms
    • maxwidth

      public static IntervalsSource maxwidth(int width, IntervalsSource subSource)
      Create an IntervalsSource that filters a sub-source by the width of its intervals
      Parameters:
      width - the maximum width of intervals in the sub-source to filter
      subSource - the sub-source to filter
    • maxgaps

      public static IntervalsSource maxgaps(int gaps, IntervalsSource subSource)
      Create an IntervalsSource that filters a sub-source by its gaps
      Parameters:
      gaps - the maximum number of gaps in the sub-source to filter
      subSource - the sub-source to filter
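
      To make the gaps parameter concrete, here is a minimal sketch that counts gaps for an interval built from sorted, non-overlapping sub-intervals. It assumes a gap is a position inside the enclosing interval that no sub-interval covers. The class GapsSketch and its methods are hypothetical and do not use Lucene's classes.

```java
public class GapsSketch {
    /**
     * Counts positions left uncovered between consecutive sub-intervals.
     * Each sub-interval is an inclusive {start, end} pair; the array is
     * assumed to be sorted and non-overlapping.
     */
    static int gaps(int[][] subIntervals) {
        int gaps = 0;
        for (int i = 1; i < subIntervals.length; i++) {
            gaps += subIntervals[i][0] - subIntervals[i - 1][1] - 1;
        }
        return gaps;
    }

    /** Models the filter: keep the interval only if it has at most maxGaps gaps. */
    static boolean passesMaxGaps(int maxGaps, int[][] subIntervals) {
        return gaps(subIntervals) <= maxGaps;
    }

    public static void main(String[] args) {
        // Three terms matched at positions 0, 3 and 4: positions 1 and 2 are gaps.
        int[][] matched = {{0, 0}, {3, 3}, {4, 4}};
        System.out.println(gaps(matched));             // prints 2
        System.out.println(passesMaxGaps(1, matched)); // prints false
        System.out.println(passesMaxGaps(2, matched)); // prints true
    }
}
```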
    • extend

      public static IntervalsSource extend(IntervalsSource source, int before, int after)
      Create an IntervalsSource that wraps another source, extending its intervals by a number of positions before and after.

      This can be useful for adding defined gaps in a block query; for example, to find 'a b [2 arbitrary terms] c', you can call:

         Intervals.phrase(Intervals.term("a"), Intervals.extend(Intervals.term("b"), 0, 2), Intervals.term("c"));
       
      Note that calling IntervalIterator.gaps() on iterators returned by this source delegates directly to the wrapped iterator, and does not include the extensions.
      Parameters:
      source - the source to extend
      before - how many positions to extend before the delegated interval
      after - how many positions to extend after the delegated interval
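
      The 'a b [2 arbitrary terms] c' example can be modelled on a plain token array. The sketch below assumes that a phrase requires each interval to start immediately after the previous one ends, so extending "b" by two positions pushes "c" to exactly three positions after "b". ExtendSketch and findBlock are hypothetical names, and the code does not use Lucene's classes.

```java
import java.util.ArrayList;
import java.util.List;

public class ExtendSketch {
    /** Returns every position i where the tokens read: a, b, any, any, c. */
    static List<Integer> findBlock(String[] tokens) {
        List<Integer> hits = new ArrayList<>();
        for (int i = 0; i + 1 < tokens.length; i++) {
            if (!tokens[i].equals("a") || !tokens[i + 1].equals("b")) {
                continue;
            }
            int bEnd = i + 1;             // the "b" interval is [i + 1, i + 1]
            int extendedEnd = bEnd + 2;   // models extend(term("b"), 0, 2)
            int cPosition = extendedEnd + 1; // "c" must directly follow the extended interval
            if (cPosition < tokens.length && tokens[cPosition].equals("c")) {
                hits.add(i);
            }
        }
        return hits;
    }

    public static void main(String[] args) {
        System.out.println(findBlock("a b x y c".split(" "))); // prints [0]
        System.out.println(findBlock("a b x c".split(" ")));   // prints []
    }
}
```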
    • ordered

      public static IntervalsSource ordered(IntervalsSource... subSources)
      Create an ordered IntervalsSource

      Returns intervals in which the subsources all appear in the given order

      Parameters:
      subSources - an ordered set of IntervalsSource objects
    • unordered

      public static IntervalsSource unordered(IntervalsSource... subSources)
      Create an unordered IntervalsSource. Note that if several eligible intervals end at the same position, only the narrowest one is returned. For example, unordered(term("apple"), term("banana")) on a field containing "apple wolf apple orange banana" returns only the interval "apple orange banana".

      Returns intervals in which all the subsources appear. The subsources may overlap

      Parameters:
      subSources - an unordered set of IntervalsSources
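
      The "apple wolf apple orange banana" case can be reproduced with a brute-force model of minimal intervals: collect the shortest window starting at each position that contains every term, then discard any window that contains another. This is a sketch for illustration only, not the lazy algorithm Lucene uses; MinimalUnorderedSketch and minimalWindows are hypothetical names.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

public class MinimalUnorderedSketch {
    /** Returns the minimal inclusive {start, end} windows that contain every term. */
    static List<int[]> minimalWindows(String[] tokens, String... terms) {
        Set<String> wanted = new HashSet<>(Arrays.asList(terms));
        List<int[]> candidates = new ArrayList<>();
        for (int start = 0; start < tokens.length; start++) {
            Set<String> seen = new HashSet<>();
            for (int end = start; end < tokens.length; end++) {
                if (wanted.contains(tokens[end])) {
                    seen.add(tokens[end]);
                }
                if (seen.size() == wanted.size()) {
                    candidates.add(new int[] {start, end}); // shortest window from this start
                    break;
                }
            }
        }
        // Keep only windows that do not contain another candidate window.
        List<int[]> minimal = new ArrayList<>();
        for (int[] c : candidates) {
            boolean containsOther = false;
            for (int[] o : candidates) {
                if (o != c && o[0] >= c[0] && o[1] <= c[1]) {
                    containsOther = true;
                    break;
                }
            }
            if (!containsOther) {
                minimal.add(c);
            }
        }
        return minimal;
    }

    public static void main(String[] args) {
        String[] tokens = "apple wolf apple orange banana".split(" ");
        for (int[] w : minimalWindows(tokens, "apple", "banana")) {
            System.out.println(Arrays.toString(w)); // prints [2, 4]
        }
    }
}
```

      The window [0, 4] ("apple wolf apple orange banana") also contains both terms, but it is dropped because it contains the narrower window [2, 4].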
    • unorderedNoOverlaps

      public static IntervalsSource unorderedNoOverlaps(IntervalsSource a, IntervalsSource b)
      Create an unordered IntervalsSource allowing no overlaps between subsources

      Returns intervals in which both the subsources appear and do not overlap.

    • fixField

      public static IntervalsSource fixField(String field, IntervalsSource source)
      Create an IntervalsSource that always returns intervals from a specific field

      This is useful for comparing intervals across multiple fields, for example fields that have been analyzed differently, allowing you to search for stemmed terms near unstemmed terms, etc.

    • nonOverlapping

      public static IntervalsSource nonOverlapping(IntervalsSource minuend, IntervalsSource subtrahend)
      Create a non-overlapping IntervalsSource

      Returns intervals of the minuend that do not overlap with intervals from the subtrahend

      Parameters:
      minuend - the IntervalsSource to filter
      subtrahend - the IntervalsSource to filter by
    • overlapping

      public static IntervalsSource overlapping(IntervalsSource source, IntervalsSource reference)
      Returns intervals from a source that overlap with intervals from another source
      Parameters:
      source - the source to filter
      reference - the source to filter by
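
      The nonOverlapping and overlapping filters are complements of each other with respect to a single overlap test. The sketch below models that test on inclusive {start, end} pairs, assuming two intervals overlap when they share at least one position. OverlapSketch and its methods are hypothetical and do not use Lucene's classes.

```java
import java.util.ArrayList;
import java.util.List;

public class OverlapSketch {
    /** True if the two inclusive intervals share at least one position. */
    static boolean overlaps(int[] a, int[] b) {
        return a[0] <= b[1] && b[0] <= a[1];
    }

    /**
     * Keeps source intervals that overlap some reference interval when
     * keepOverlapping is true, or those that overlap none when it is false.
     */
    static List<int[]> filter(List<int[]> source, List<int[]> reference, boolean keepOverlapping) {
        List<int[]> kept = new ArrayList<>();
        for (int[] s : source) {
            boolean hit = false;
            for (int[] r : reference) {
                if (overlaps(s, r)) {
                    hit = true;
                    break;
                }
            }
            if (hit == keepOverlapping) {
                kept.add(s);
            }
        }
        return kept;
    }

    public static void main(String[] args) {
        List<int[]> source = List.of(new int[] {0, 2}, new int[] {5, 6});
        List<int[]> reference = List.of(new int[] {2, 3});
        System.out.println(filter(source, reference, true).size());  // prints 1, the interval [0, 2]
        System.out.println(filter(source, reference, false).size()); // prints 1, the interval [5, 6]
    }
}
```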
    • notWithin

      public static IntervalsSource notWithin(IntervalsSource minuend, int positions, IntervalsSource subtrahend)
      Create a not-within IntervalsSource

      Returns intervals of the minuend that do not appear within a set number of positions of intervals from the subtrahend

      Parameters:
      minuend - the IntervalsSource to filter
      positions - the minimum distance that intervals from the minuend may occur from intervals of the subtrahend
      subtrahend - the IntervalsSource to filter by
    • within

      public static IntervalsSource within(IntervalsSource source, int positions, IntervalsSource reference)
      Returns intervals of the source that appear within a set number of positions of intervals from the reference
      Parameters:
      source - the IntervalsSource to filter
      positions - the maximum distance that intervals of the source may occur from intervals of the reference
      reference - the IntervalsSource to filter by
    • notContaining

      public static IntervalsSource notContaining(IntervalsSource minuend, IntervalsSource subtrahend)
      Create a not-containing IntervalsSource

      Returns intervals from the minuend that do not contain intervals of the subtrahend

      Parameters:
      minuend - the IntervalsSource to filter
      subtrahend - the IntervalsSource to filter by
    • containing

      public static IntervalsSource containing(IntervalsSource big, IntervalsSource small)
      Create a containing IntervalsSource

      Returns intervals from the big source that contain one or more intervals from the small source

      Parameters:
      big - the IntervalsSource to filter
      small - the IntervalsSource to filter by
    • notContainedBy

      public static IntervalsSource notContainedBy(IntervalsSource small, IntervalsSource big)
      Create a not-contained-by IntervalsSource

      Returns intervals from the small IntervalsSource that do not appear within intervals from the big IntervalsSource.

      Parameters:
      small - the IntervalsSource to filter
      big - the IntervalsSource to filter by
    • containedBy

      public static IntervalsSource containedBy(IntervalsSource small, IntervalsSource big)
      Create a contained-by IntervalsSource

      Returns intervals from the small source that appear within intervals of the big source

      Parameters:
      small - the IntervalsSource to filter
      big - the IntervalsSource to filter by
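
      The four containment filters (notContaining, containing, notContainedBy, containedBy) differ only in which side is returned and whether the containment test must succeed or fail. The following sketch models them on inclusive {start, end} pairs. ContainmentSketch and its method bodies are hypothetical illustrations that do not use Lucene's classes.

```java
import java.util.List;
import java.util.stream.Collectors;

public class ContainmentSketch {
    /** True if small lies entirely inside big (both inclusive). */
    static boolean contains(int[] big, int[] small) {
        return big[0] <= small[0] && small[1] <= big[1];
    }

    /** Big intervals that contain at least one small interval. */
    static List<int[]> containing(List<int[]> big, List<int[]> small) {
        return big.stream()
            .filter(b -> small.stream().anyMatch(s -> contains(b, s)))
            .collect(Collectors.toList());
    }

    /** Minuend intervals that contain no subtrahend interval. */
    static List<int[]> notContaining(List<int[]> minuend, List<int[]> subtrahend) {
        return minuend.stream()
            .filter(m -> subtrahend.stream().noneMatch(s -> contains(m, s)))
            .collect(Collectors.toList());
    }

    /** Small intervals that lie inside at least one big interval. */
    static List<int[]> containedBy(List<int[]> small, List<int[]> big) {
        return small.stream()
            .filter(s -> big.stream().anyMatch(b -> contains(b, s)))
            .collect(Collectors.toList());
    }

    /** Small intervals that lie inside no big interval. */
    static List<int[]> notContainedBy(List<int[]> small, List<int[]> big) {
        return small.stream()
            .filter(s -> big.stream().noneMatch(b -> contains(b, s)))
            .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        List<int[]> big = List.of(new int[] {0, 4}, new int[] {6, 8});
        List<int[]> small = List.of(new int[] {1, 2}, new int[] {9, 9});
        System.out.println(containing(big, small).get(0)[0]);     // prints 0, from [0, 4]
        System.out.println(notContaining(big, small).get(0)[0]);  // prints 6, from [6, 8]
        System.out.println(containedBy(small, big).get(0)[0]);    // prints 1, from [1, 2]
        System.out.println(notContainedBy(small, big).get(0)[0]); // prints 9, from [9, 9]
    }
}
```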
    • atLeast

      public static IntervalsSource atLeast(int minShouldMatch, IntervalsSource... sources)
      Return intervals that span combinations of intervals from minShouldMatch of the sources
    • before

      public static IntervalsSource before(IntervalsSource source, IntervalsSource reference)
      Returns intervals from the source that appear before intervals from the reference
    • after

      public static IntervalsSource after(IntervalsSource source, IntervalsSource reference)
      Returns intervals from the source that appear after intervals from the reference
    • analyzedText

      public static IntervalsSource analyzedText(String text, Analyzer analyzer, String field, int maxGaps, boolean ordered) throws IOException
      Returns intervals that correspond to tokens from a TokenStream returned for text by applying the provided Analyzer as if text were the content of the given field. The intervals can be ordered or unordered and can have optional gaps inside.
      Parameters:
      text - The text to analyze.
      analyzer - The Analyzer to use to acquire a TokenStream which is then converted into intervals.
      field - The field text should be parsed as.
      maxGaps - Maximum number of allowed gaps between sub-intervals resulting from tokens.
      ordered - Whether sub-intervals should enforce token ordering or not.
      Returns:
      An IntervalsSource that matches tokens acquired from analysis of text. Possibly an empty interval source, never null.
      Throws:
      IOException - If an I/O exception occurs.
    • analyzedText

      public static IntervalsSource analyzedText(TokenStream tokenStream, int maxGaps, boolean ordered) throws IOException
      Returns intervals that correspond to tokens from the provided TokenStream. This is a low-level counterpart to analyzedText(String, Analyzer, String, int, boolean). The intervals can be ordered or unordered and can have optional gaps inside.
      Parameters:
      tokenStream - The token stream to produce intervals for. The token stream may be fully or partially consumed after returning from this method.
      maxGaps - Maximum number of allowed gaps between sub-intervals resulting from tokens.
      ordered - Whether sub-intervals should enforce token ordering or not.
      Returns:
      An IntervalsSource that matches tokens from the provided token stream. Possibly an empty interval source, never null.
      Throws:
      IOException - If an I/O exception occurs.