Class Intervals


  • public final class Intervals
    extends Object
    Constructor functions for IntervalsSource types

    These sources implement minimum-interval algorithms taken from the paper "Efficient Optimally Lazy Algorithms for Minimal-Interval Semantics".

    By default, sources that are sensitive to internal gaps (e.g. PHRASE and MAXGAPS) rewrite their sub-sources so that disjunctions of different lengths are pulled up to the top of the interval tree. For example, PHRASE(or(PHRASE("a", "b", "c"), "b"), "c") automatically rewrites itself to OR(PHRASE("a", "b", "c", "c"), PHRASE("b", "c")) to ensure that documents containing "b c" are matched. This can make queries less efficient, because more terms need to be loaded (in the example above, the "c" iterator is loaded twice). If you care more about speed than accuracy, use the or(boolean, IntervalsSource...) factory method with rewrite set to false to prevent rewriting.
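    A toy model shows why skipping the rewrite can lose hits. This sketch is not Lucene's lazy iterator code. It eagerly lists matches over a token array. The class MinimizationDemo, its Source interface, and its term, phrase and or helpers are all hypothetical. The toy or drops any interval that contains another one, which imitates minimal-interval semantics. The sketch uses a simpler case than the example above: the short alternative "a" is a prefix of the long alternative "a b". Minimizing the disjunction keeps only "a", so PHRASE(or(PHRASE("a", "b"), "a"), "c") misses the document "a b c". The rewritten OR(PHRASE("a", "b", "c"), PHRASE("a", "c")) still finds it.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Function;

// Toy model, not Lucene's implementation: a Source eagerly lists the
// intervals it matches in a token array.
class MinimizationDemo {
  record Iv(int start, int end) {
    boolean contains(Iv o) { return start <= o.start && o.end <= end; }
  }

  interface Source extends Function<String[], List<Iv>> {}

  static Source term(String t) {
    return tokens -> {
      List<Iv> out = new ArrayList<>();
      for (int i = 0; i < tokens.length; i++) {
        if (tokens[i].equals(t)) out.add(new Iv(i, i));
      }
      return out;
    };
  }

  // Each sub-interval must start right after the previous one ends.
  static Source phrase(Source... subs) {
    return tokens -> {
      List<Iv> acc = subs[0].apply(tokens);
      for (int k = 1; k < subs.length; k++) {
        List<Iv> joined = new ArrayList<>();
        for (Iv x : acc) {
          for (Iv y : subs[k].apply(tokens)) {
            if (y.start() == x.end() + 1) joined.add(new Iv(x.start(), y.end()));
          }
        }
        acc = joined;
      }
      return acc;
    };
  }

  // Minimizing disjunction: an interval that contains another one is dropped.
  static Source or(Source... subs) {
    return tokens -> {
      List<Iv> all = new ArrayList<>();
      for (Source s : subs) all.addAll(s.apply(tokens));
      List<Iv> minimal = new ArrayList<>();
      for (Iv x : all) {
        boolean hasInner = false;
        for (Iv y : all) {
          if (!y.equals(x) && x.contains(y)) {
            hasInner = true;
            break;
          }
        }
        if (!hasInner) minimal.add(x);
      }
      return minimal;
    };
  }

  public static void main(String[] args) {
    String[] doc = {"a", "b", "c"};
    Source a = term("a"), b = term("b"), c = term("c");
    // Not rewritten: the disjunction keeps only "a", which is not followed by "c".
    System.out.println(phrase(or(phrase(a, b), a), c).apply(doc)); // []
    // Rewritten: the disjunction is pulled up above the phrase.
    System.out.println(or(phrase(a, b, c), phrase(a, c)).apply(doc)); // [Iv[start=0, end=2]]
  }
}
```

Real queries would use Intervals.phrase and Intervals.or. The toy only shows why pulling the disjunction above the phrase preserves the longer alternative.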

    • Method Detail

      • or

        public static IntervalsSource or(IntervalsSource... subSources)
        Return an IntervalsSource over the disjunction of a set of sub-sources

        Automatically rewrites if wrapped by an interval source that is sensitive to internal gaps

      • or

        public static IntervalsSource or(boolean rewrite,
                                         IntervalsSource... subSources)
        Return an IntervalsSource over the disjunction of a set of sub-sources
        Parameters:
        rewrite - if false, do not rewrite intervals that are sensitive to internal gaps; this may run more efficiently, but can miss valid hits due to minimization
        subSources - the sources to combine
      • or

        public static IntervalsSource or(boolean rewrite,
                                         List<IntervalsSource> subSources)
        Return an IntervalsSource over the disjunction of a set of sub-sources
        Parameters:
        rewrite - if false, do not rewrite intervals that are sensitive to internal gaps; this may run more efficiently, but can miss valid hits due to minimization
        subSources - the sources to combine
      • prefix

        public static IntervalsSource prefix(BytesRef prefix,
                                             int maxExpansions)
        Expert: Return an IntervalsSource over the disjunction of all terms that begin with a prefix

        WARNING: Setting maxExpansions to higher than the default value of 128 can be both slow and memory-intensive

        Parameters:
        prefix - the prefix to expand
        maxExpansions - the maximum number of terms to expand to
        Throws:
        IllegalStateException - if the prefix expands to more than maxExpansions terms
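        The expansion contract described above can be sketched against an in-memory sorted term set. Every term that starts with the prefix is collected, and expansion fails once more than maxExpansions terms match. This is a toy model, not Lucene's implementation. The class PrefixExpansion and its expandPrefix method are hypothetical, and plain Strings stand in for BytesRef terms.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.TreeSet;

// Toy sketch: expands a prefix against a sorted term dictionary and fails
// once more than maxExpansions terms match.
class PrefixExpansion {
  static List<String> expandPrefix(TreeSet<String> terms, String prefix, int maxExpansions) {
    List<String> out = new ArrayList<>();
    // tailSet(prefix) starts at the first term >= prefix; matching terms are contiguous from there.
    for (String t : terms.tailSet(prefix)) {
      if (!t.startsWith(prefix)) break;
      if (out.size() == maxExpansions) {
        throw new IllegalStateException(
            "Prefix [" + prefix + "] expands to more than " + maxExpansions + " terms");
      }
      out.add(t);
    }
    return out;
  }

  public static void main(String[] args) {
    TreeSet<String> dict = new TreeSet<>(List.of("car", "card", "care", "cart", "cat", "dog"));
    System.out.println(expandPrefix(dict, "car", 128)); // [car, card, care, cart]
    try {
      expandPrefix(dict, "car", 3);
    } catch (IllegalStateException e) {
      System.out.println(e.getMessage());
    }
  }
}
```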
      • wildcard

        public static IntervalsSource wildcard(BytesRef wildcard,
                                               int maxExpansions)
        Expert: Return an IntervalsSource over the disjunction of all terms that match a wildcard glob

        WARNING: Setting maxExpansions to higher than the default value of 128 can be both slow and memory-intensive

        Parameters:
        wildcard - the glob to expand
        maxExpansions - the maximum number of terms to expand to
        Throws:
        IllegalStateException - if the wildcard glob expands to more than maxExpansions terms
        See Also:
        WildcardQuery for glob format
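        As a rough illustration, the sketch below translates a glob with `*` (any run of characters) and `?` (exactly one character) into a java.util.regex pattern. It then expands the glob over an in-memory term list, failing past maxExpansions. This is a toy model. WildcardExpansion and its methods are hypothetical, escape handling is omitted, and it does not claim to match WildcardQuery's full syntax.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Pattern;

// Toy sketch: '*' matches any run of characters and '?' exactly one character.
// Escaping with '\' is not handled here.
class WildcardExpansion {
  static Pattern globToRegex(String glob) {
    StringBuilder re = new StringBuilder();
    for (char ch : glob.toCharArray()) {
      if (ch == '*') {
        re.append(".*");
      } else if (ch == '?') {
        re.append('.');
      } else {
        re.append(Pattern.quote(String.valueOf(ch))); // literal character
      }
    }
    return Pattern.compile(re.toString(), Pattern.DOTALL);
  }

  static List<String> expandWildcard(List<String> terms, String glob, int maxExpansions) {
    Pattern p = globToRegex(glob);
    List<String> out = new ArrayList<>();
    for (String t : terms) {
      if (!p.matcher(t).matches()) continue;
      if (out.size() == maxExpansions) {
        throw new IllegalStateException(
            "Wildcard [" + glob + "] expands to more than " + maxExpansions + " terms");
      }
      out.add(t);
    }
    return out;
  }

  public static void main(String[] args) {
    List<String> terms = List.of("cat", "coat", "cot", "cut", "dog");
    System.out.println(expandWildcard(terms, "c?t", 128)); // [cat, cot, cut]
    System.out.println(expandWildcard(terms, "c*t", 128)); // [cat, coat, cot, cut]
  }
}
```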
      • multiterm

        public static IntervalsSource multiterm(CompiledAutomaton ca,
                                                String pattern)
        Expert: Return an IntervalsSource over the disjunction of all terms that are accepted by the given automaton
        Parameters:
        ca - an automaton accepting matching terms
        pattern - string representation of the given automaton, mostly used in exception messages
        Throws:
        IllegalStateException - if the automaton accepts more than 128 terms
      • multiterm

        public static IntervalsSource multiterm(CompiledAutomaton ca,
                                                int maxExpansions,
                                                String pattern)
        Expert: Return an IntervalsSource over the disjunction of all terms that are accepted by the given automaton

        WARNING: Setting maxExpansions to higher than the default value of 128 can be both slow and memory-intensive

        Parameters:
        ca - an automaton accepting matching terms
        maxExpansions - the maximum number of terms to expand to
        pattern - string representation of the given automaton, mostly used in exception messages
        Throws:
        IllegalStateException - if the automaton accepts more than maxExpansions terms
      • maxwidth

        public static IntervalsSource maxwidth(int width,
                                               IntervalsSource subSource)
        Create an IntervalsSource that filters a sub-source by the width of its intervals
        Parameters:
        width - the maximum width of intervals to keep; wider intervals from the sub-source are filtered out
        subSource - the sub-source to filter
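        The filter can be sketched over an in-memory list of intervals under two assumptions. First, an interval's width counts every position it covers (end - start + 1). Second, intervals exactly as wide as the limit are kept. MaxWidthFilter and its Iv record are hypothetical; Lucene works on lazy iterators instead.

```java
import java.util.List;

// Toy model: an interval covers positions start..end inclusive, so its width is end - start + 1.
class MaxWidthFilter {
  record Iv(int start, int end) {
    int width() { return end - start + 1; }
  }

  // Keep only intervals no wider than the limit.
  static List<Iv> maxwidth(int width, List<Iv> subSource) {
    return subSource.stream().filter(iv -> iv.width() <= width).toList();
  }

  public static void main(String[] args) {
    List<Iv> sub = List.of(new Iv(2, 3), new Iv(5, 9), new Iv(7, 7));
    System.out.println(maxwidth(3, sub)); // [Iv[start=2, end=3], Iv[start=7, end=7]]
  }
}
```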
      • maxgaps

        public static IntervalsSource maxgaps(int gaps,
                                              IntervalsSource subSource)
        Create an IntervalsSource that filters a sub-source by its gaps
        Parameters:
        gaps - the maximum number of gaps an interval may contain; intervals from the sub-source with more gaps are filtered out
        subSource - the sub-source to filter
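        In a simplified model, a match is the sorted positions of its single-term parts. Its gaps are the uncovered positions between consecutive parts: for terms at positions 0, 2 and 5, that gives 1 + 2 = 3 gaps. The sketch below applies that definition as an assumption. MaxGapsFilter is a hypothetical helper, not Lucene code, and real sub-sources can have multi-position sub-intervals that this model ignores.

```java
import java.util.Arrays;
import java.util.List;

// Toy model: a match is the sorted positions of its single-term sub-intervals,
// and its gaps are the uncovered positions between consecutive terms.
class MaxGapsFilter {
  static int gaps(int[] positions) {
    int total = 0;
    for (int i = 1; i < positions.length; i++) {
      total += positions[i] - positions[i - 1] - 1;
    }
    return total;
  }

  // Keep only matches with at most maxGaps uncovered positions.
  static List<int[]> maxgaps(int maxGaps, List<int[]> matches) {
    return matches.stream().filter(m -> gaps(m) <= maxGaps).toList();
  }

  public static void main(String[] args) {
    List<int[]> matches = List.of(new int[] {0, 1, 2}, new int[] {0, 2, 5}, new int[] {3, 5});
    for (int[] m : maxgaps(1, matches)) {
      System.out.println(Arrays.toString(m) + " gaps=" + gaps(m));
    }
    // prints "[0, 1, 2] gaps=0" and "[3, 5] gaps=1"
  }
}
```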
      • extend

        public static IntervalsSource extend(IntervalsSource source,
                                             int before,
                                             int after)
        Create an IntervalsSource that wraps another source, extending its intervals by a number of positions before and after.

        This can be useful for adding defined gaps in a block query; for example, to find 'a b [2 arbitrary terms] c', you can call:

           Intervals.phrase(Intervals.term("a"), Intervals.extend(Intervals.term("b"), 0, 2), Intervals.term("c"));
         
        Note that calling IntervalIterator.gaps() on iterators returned by this source delegates directly to the wrapped iterator, and does not include the extensions.
        Parameters:
        source - the source to extend
        before - how many positions to extend before the delegated interval
        after - how many positions to extend after the delegated interval
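        The 'a b [2 arbitrary terms] c' example can be traced in a toy model where each source is an in-memory list of intervals over a tokenized document. ExtendDemo and its helpers are hypothetical, and clamping the extended start at position 0 is an assumption of this sketch. The phrase helper simply requires each interval to start right after the previous one ends.

```java
import java.util.ArrayList;
import java.util.List;

class ExtendDemo {
  record Iv(int start, int end) {}

  static List<Iv> term(String[] tokens, String t) {
    List<Iv> out = new ArrayList<>();
    for (int i = 0; i < tokens.length; i++) {
      if (tokens[i].equals(t)) out.add(new Iv(i, i));
    }
    return out;
  }

  // Widen each interval by `before` and `after` positions (start clamped at 0).
  static List<Iv> extend(List<Iv> source, int before, int after) {
    List<Iv> out = new ArrayList<>();
    for (Iv iv : source) {
      out.add(new Iv(Math.max(0, iv.start() - before), iv.end() + after));
    }
    return out;
  }

  // Each interval must begin immediately after the previous one ends.
  static List<Iv> phrase(List<List<Iv>> subs) {
    List<Iv> acc = subs.get(0);
    for (int k = 1; k < subs.size(); k++) {
      List<Iv> joined = new ArrayList<>();
      for (Iv x : acc) {
        for (Iv y : subs.get(k)) {
          if (y.start() == x.end() + 1) joined.add(new Iv(x.start(), y.end()));
        }
      }
      acc = joined;
    }
    return acc;
  }

  // 'a b [2 arbitrary terms] c'
  static List<Iv> query(String text) {
    String[] tokens = text.split(" ");
    return phrase(List.of(term(tokens, "a"), extend(term(tokens, "b"), 0, 2), term(tokens, "c")));
  }

  public static void main(String[] args) {
    System.out.println(query("a b x y c")); // [Iv[start=0, end=4]]
    System.out.println(query("a b x c"));   // []
  }
}
```

The extended interval for "b" always covers two extra positions, so exactly two terms must separate "b" from "c". Both "a b x c" and "a b x y z c" fail to match.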
      • fixField

        public static IntervalsSource fixField(String field,
                                               IntervalsSource source)
        Create an IntervalsSource that always returns intervals from a specific field

        This is useful for comparing intervals across multiple fields, for example fields that have been analyzed differently, allowing you to search for stemmed terms near unstemmed terms, etc.

      • overlapping

        public static IntervalsSource overlapping(IntervalsSource source,
                                                  IntervalsSource reference)
        Returns intervals from a source that overlap with intervals from another source
        Parameters:
        source - the source to filter
        reference - the source to filter by
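        Treating intervals as inclusive position ranges, two intervals overlap when each starts no later than the other ends. A toy sketch of the filter over in-memory lists follows. OverlappingFilter and its Iv record are hypothetical, not Lucene classes.

```java
import java.util.List;

class OverlappingFilter {
  record Iv(int start, int end) {
    // Inclusive ranges overlap when neither ends before the other starts.
    boolean overlaps(Iv o) { return start <= o.end && o.start <= end; }
  }

  // Keep source intervals that overlap at least one reference interval.
  static List<Iv> overlapping(List<Iv> source, List<Iv> reference) {
    return source.stream()
        .filter(s -> reference.stream().anyMatch(s::overlaps))
        .toList();
  }

  public static void main(String[] args) {
    List<Iv> source = List.of(new Iv(0, 1), new Iv(4, 6), new Iv(9, 9));
    List<Iv> reference = List.of(new Iv(5, 8));
    System.out.println(overlapping(source, reference)); // [Iv[start=4, end=6]]
  }
}
```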
      • notWithin

        public static IntervalsSource notWithin(IntervalsSource minuend,
                                                int positions,
                                                IntervalsSource subtrahend)
        Create a not-within IntervalsSource

        Returns intervals of the minuend that do not appear within a set number of positions of intervals from the subtrahend query

        Parameters:
        minuend - the IntervalsSource to filter
        positions - the minimum distance that intervals from the minuend may occur from intervals of the subtrahend
        subtrahend - the IntervalsSource to filter by
      • within

        public static IntervalsSource within(IntervalsSource source,
                                             int positions,
                                             IntervalsSource reference)
        Returns intervals of the source that appear within a set number of positions of intervals from the reference
        Parameters:
        source - the IntervalsSource to filter
        positions - the maximum distance that intervals of the source may occur from intervals of the reference
        reference - the IntervalsSource to filter by
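        One way to picture within and notWithin is to widen each reference (or subtrahend) interval by `positions` on each side. The toy sketch below uses an assumed reading, not Lucene's implementation. Under it, within keeps source intervals that fit entirely inside a widened reference interval. notWithin keeps minuend intervals that overlap no widened subtrahend interval. WithinDemo and its helpers are hypothetical.

```java
import java.util.List;

class WithinDemo {
  record Iv(int start, int end) {
    boolean overlaps(Iv o) { return start <= o.end && o.start <= end; }
    boolean contains(Iv o) { return start <= o.start && o.end <= end; }
    // Widen by n positions on each side, clamping the start at 0 (an assumption).
    Iv widen(int n) { return new Iv(Math.max(0, start - n), end + n); }
  }

  // Keep source intervals lying entirely inside some widened reference interval.
  static List<Iv> within(List<Iv> source, int positions, List<Iv> reference) {
    return source.stream()
        .filter(s -> reference.stream().anyMatch(r -> r.widen(positions).contains(s)))
        .toList();
  }

  // Keep minuend intervals that touch no widened subtrahend interval.
  static List<Iv> notWithin(List<Iv> minuend, int positions, List<Iv> subtrahend) {
    return minuend.stream()
        .filter(m -> subtrahend.stream().noneMatch(r -> r.widen(positions).overlaps(m)))
        .toList();
  }

  public static void main(String[] args) {
    List<Iv> source = List.of(new Iv(2, 2), new Iv(3, 4), new Iv(7, 8), new Iv(9, 9));
    List<Iv> reference = List.of(new Iv(5, 5));
    System.out.println(within(source, 2, reference));    // [Iv[start=3, end=4]]
    System.out.println(notWithin(source, 2, reference)); // [Iv[start=2, end=2], Iv[start=9, end=9]]
  }
}
```

Under this reading the two filters are not complements. The interval [7, 8] straddles the edge of the widened window [3, 7], so neither filter keeps it.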
      • atLeast

        public static IntervalsSource atLeast(int minShouldMatch,
                                              IntervalsSource... sources)
        Return intervals that span combinations of intervals from minShouldMatch of the sources
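        For single-term sources, the idea can be sketched by brute force in three steps:
        1. Try every span between two match positions.
        2. Keep the spans that contain matches from at least minShouldMatch distinct sources.
        3. Discard any span that contains another qualifying span, in the spirit of the minimal-interval semantics this class is built on.
        AtLeastDemo is a hypothetical toy, not Lucene's lazy algorithm, and it ignores multi-position sources.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.TreeSet;

class AtLeastDemo {
  record Iv(int start, int end) {
    boolean contains(Iv o) { return start <= o.start && o.end <= end; }
  }

  // sources[s] lists the positions where single-term source s occurs.
  static List<Iv> atLeast(int minShouldMatch, int[][] sources) {
    TreeSet<Integer> positions = new TreeSet<>();
    for (int[] src : sources) for (int p : src) positions.add(p);

    // Every span [lo, hi] between match positions covering enough distinct sources.
    List<Iv> candidates = new ArrayList<>();
    for (int lo : positions) {
      for (int hi : positions.tailSet(lo)) {
        int matched = 0;
        for (int[] src : sources) {
          for (int p : src) {
            if (lo <= p && p <= hi) {
              matched++;
              break;
            }
          }
        }
        if (matched >= minShouldMatch) candidates.add(new Iv(lo, hi));
      }
    }

    // Minimal-interval semantics: drop any candidate that contains another candidate.
    List<Iv> minimal = new ArrayList<>();
    for (Iv c : candidates) {
      boolean hasInner = false;
      for (Iv d : candidates) {
        if (!d.equals(c) && c.contains(d)) {
          hasInner = true;
          break;
        }
      }
      if (!hasInner) minimal.add(c);
    }
    return minimal;
  }

  public static void main(String[] args) {
    // Sources "a", "b", "c" occur at positions 0, 2 and 3.
    System.out.println(atLeast(2, new int[][] {{0}, {2}, {3}}));
    // [Iv[start=0, end=2], Iv[start=2, end=3]]
  }
}
```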