Class Intervals


  • public final class Intervals
    extends Object
    Factory functions for creating interval sources.

    These sources implement minimum-interval algorithms taken from the paper "Efficient Optimally Lazy Algorithms for Minimal-Interval Semantics".

    Note: by default, sources that are sensitive to internal gaps (e.g. PHRASE and MAXGAPS) will rewrite their sub-sources so that disjunctions of different lengths are pulled up to the top of the interval tree. For example, PHRASE(or(PHRASE("a", "b", "c"), "b"), "c") will automatically rewrite itself to OR(PHRASE("a", "b", "c", "c"), PHRASE("b", "c")) to ensure that documents containing "b c" are matched. This can lead to less efficient queries, as more terms need to be loaded (for example, the "c" iterator above is loaded twice), so if you care more about speed than about accuracy you can use the or(boolean, IntervalsSource...) factory method to prevent rewriting.
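    To make the rewrite concrete, here is a toy Java model that represents a phrase as a list of slots, each holding alternative term sequences, and distributes every disjunction over the phrase. The class PhraseRewriteToy is hypothetical and not part of Lucene. It only mirrors the shape of the rewrite described above, not how Lucene implements it.

```java
import java.util.ArrayList;
import java.util.List;

// Toy model of the disjunction rewrite; not Lucene code. A phrase is a list
// of slots, each slot lists its alternatives, and each alternative is a plain
// sequence of terms.
final class PhraseRewriteToy {

    // Distributes every disjunction over the enclosing phrase, returning the
    // flat phrases that a top-level OR would contain.
    static List<List<String>> expand(List<List<List<String>>> slots) {
        List<List<String>> phrases = new ArrayList<>();
        phrases.add(new ArrayList<>()); // start from a single empty phrase
        for (List<List<String>> slot : slots) {
            List<List<String>> next = new ArrayList<>();
            for (List<String> prefix : phrases) {
                for (List<String> alternative : slot) {
                    List<String> phrase = new ArrayList<>(prefix);
                    phrase.addAll(alternative); // append this alternative's terms
                    next.add(phrase);
                }
            }
            phrases = next;
        }
        return phrases;
    }
}
```

    For the slots [["a", "b", "c"], ["b"]] followed by [["c"]], expand returns [[a, b, c, c], [b, c]], the two phrases of the rewritten OR.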

    • Method Detail

      • or

        public static IntervalsSource or(IntervalsSource... subSources)
        Return an IntervalsSource over the disjunction of a set of sub-sources

        Automatically rewrites if wrapped by an interval source that is sensitive to internal gaps

      • or

        public static IntervalsSource or(boolean rewrite,
                                         IntervalsSource... subSources)
        Return an IntervalsSource over the disjunction of a set of sub-sources
        Parameters:
        rewrite - if false, do not rewrite intervals that are sensitive to internal gaps; this may run more efficiently, but can miss valid hits due to minimization
        subSources - the sources to combine
      • or

        public static IntervalsSource or(boolean rewrite,
                                         List<IntervalsSource> subSources)
        Return an IntervalsSource over the disjunction of a set of sub-sources
        Parameters:
        rewrite - if false, do not rewrite intervals that are sensitive to internal gaps; this may run more efficiently, but can miss valid hits due to minimization
        subSources - the sources to combine
      • prefix

        public static IntervalsSource prefix(BytesRef prefix,
                                             int maxExpansions)
        Expert: Return an IntervalsSource over the disjunction of all terms that begin with a prefix

        WARNING: Setting maxExpansions to higher than the default value of DEFAULT_MAX_EXPANSIONS can be both slow and memory-intensive

        Parameters:
        prefix - the prefix to expand
        maxExpansions - the maximum number of terms to expand to
        Throws:
        IllegalStateException - if the prefix expands to more than maxExpansions terms
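        The expansion and its limit can be pictured with a toy model, assuming a sorted in-memory set stands in for the field's term dictionary (PrefixExpansionToy and expandPrefix are hypothetical names, not Lucene API). It collects the terms that start with the prefix and throws IllegalStateException once more than maxExpansions terms match, mirroring the Throws clause above.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.NavigableSet;
import java.util.TreeSet;

// Toy model; not Lucene code. A sorted set stands in for the term dictionary.
final class PrefixExpansionToy {

    // Collects every term starting with `prefix`, failing once more than
    // maxExpansions terms match.
    static List<String> expandPrefix(NavigableSet<String> terms, String prefix, int maxExpansions) {
        List<String> matches = new ArrayList<>();
        // tailSet(prefix, true) begins at the first term >= prefix; in sorted
        // order all terms sharing the prefix are contiguous from there
        for (String term : terms.tailSet(prefix, true)) {
            if (!term.startsWith(prefix)) {
                break; // past the block of terms with this prefix
            }
            if (matches.size() == maxExpansions) {
                throw new IllegalStateException(
                    "prefix " + prefix + " expands to more than " + maxExpansions + " terms");
            }
            matches.add(term);
        }
        return matches;
    }
}
```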
      • wildcard

        public static IntervalsSource wildcard(BytesRef wildcard,
                                               int maxExpansions)
        Expert: Return an IntervalsSource over the disjunction of all terms that match a wildcard glob

        WARNING: Setting maxExpansions to higher than the default value of DEFAULT_MAX_EXPANSIONS can be both slow and memory-intensive

        Parameters:
        wildcard - the glob to expand
        maxExpansions - the maximum number of terms to expand to
        Throws:
        IllegalStateException - if the wildcard glob expands to more than maxExpansions terms
        See Also:
        WildcardQuery for glob format
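        As a rough sketch of the glob format, assuming only the two wildcards of WildcardQuery ('*' matches any character sequence, including the empty one, and '?' matches a single character), the hypothetical GlobToy class below translates a glob into a java.util.regex.Pattern. It ignores the '\' escape character and is not how Lucene handles wildcards (Lucene compiles them to an automaton).

```java
import java.util.regex.Pattern;

// Toy model; not Lucene code. Converts a glob in which '*' matches any
// sequence of characters and '?' matches exactly one into a regex. The '\'
// escape that WildcardQuery also supports is ignored here.
final class GlobToy {

    static Pattern globToRegex(String glob) {
        StringBuilder regex = new StringBuilder();
        StringBuilder literal = new StringBuilder();
        for (char c : glob.toCharArray()) {
            if (c == '*' || c == '?') {
                // flush pending literal text, quoted so '.', '+', etc. stay literal
                regex.append(Pattern.quote(literal.toString()));
                literal.setLength(0);
                regex.append(c == '*' ? ".*" : ".");
            } else {
                literal.append(c);
            }
        }
        regex.append(Pattern.quote(literal.toString()));
        return Pattern.compile(regex.toString(), Pattern.DOTALL);
    }

    static boolean matches(String glob, String term) {
        return globToRegex(glob).matcher(term).matches();
    }
}
```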
      • fuzzyTerm

        public static IntervalsSource fuzzyTerm(String term,
                                                int maxEdits,
                                                int prefixLength,
                                                boolean transpositions,
                                                int maxExpansions)
        A fuzzy term IntervalsSource matches the disjunction of intervals of terms that are within the specified maxEdits from the provided term.

        The implementation is delegated to a multiterm(CompiledAutomaton, int, String) interval source, with an automaton sourced from FuzzyQuery.

        Parameters:
        term - the term to search for
        maxEdits - must be >= 0 and <= LevenshteinAutomata.MAXIMUM_SUPPORTED_DISTANCE; use FuzzyQuery.defaultMaxEdits for the default if needed.
        prefixLength - length of common (non-fuzzy) prefix
        transpositions - true if transpositions should be treated as a primitive edit operation. If this is false, comparisons will implement the classic Levenshtein algorithm.
        maxExpansions - the maximum number of terms to match. Setting maxExpansions to higher than the default value of DEFAULT_MAX_EXPANSIONS can be both slow and memory-intensive.
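        The roles of maxEdits, prefixLength and transpositions can be illustrated with a plain dynamic-programming edit distance. This is a toy sketch with hypothetical names (FuzzyToy, distance, fuzzyMatch); Lucene instead uses an automaton from FuzzyQuery, and modelling transpositions as the "optimal string alignment" variant of Damerau-Levenshtein is an assumption of this sketch.

```java
// Toy model; not Lucene code. Edit distance with an optional adjacent
// transposition operation, plus an exact (non-fuzzy) prefix check.
final class FuzzyToy {

    static int distance(String a, String b, boolean transpositions) {
        int[][] d = new int[a.length() + 1][b.length() + 1];
        for (int i = 0; i <= a.length(); i++) d[i][0] = i; // i deletions
        for (int j = 0; j <= b.length(); j++) d[0][j] = j; // j insertions
        for (int i = 1; i <= a.length(); i++) {
            for (int j = 1; j <= b.length(); j++) {
                int cost = a.charAt(i - 1) == b.charAt(j - 1) ? 0 : 1;
                d[i][j] = Math.min(Math.min(d[i - 1][j] + 1, d[i][j - 1] + 1),
                                   d[i - 1][j - 1] + cost);
                // swapping two adjacent characters counts as a single edit
                if (transpositions && i > 1 && j > 1
                        && a.charAt(i - 1) == b.charAt(j - 2)
                        && a.charAt(i - 2) == b.charAt(j - 1)) {
                    d[i][j] = Math.min(d[i][j], d[i - 2][j - 2] + 1);
                }
            }
        }
        return d[a.length()][b.length()];
    }

    static boolean fuzzyMatch(String term, String candidate, int maxEdits,
                              int prefixLength, boolean transpositions) {
        int p = Math.min(prefixLength, term.length());
        // the first prefixLength characters are not fuzzy: they must match exactly
        if (!candidate.regionMatches(0, term, 0, p)) {
            return false;
        }
        return distance(term, candidate, transpositions) <= maxEdits;
    }
}
```

        With transpositions enabled, "ab" and "ba" are one edit apart; without them the swap costs two substitutions.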
      • multiterm

        public static IntervalsSource multiterm(CompiledAutomaton ca,
                                                int maxExpansions,
                                                String pattern)
        Expert: Return an IntervalsSource over the disjunction of all terms that are accepted by the given automaton

        WARNING: Setting maxExpansions to higher than the default value of DEFAULT_MAX_EXPANSIONS can be both slow and memory-intensive

        Parameters:
        ca - an automaton accepting matching terms
        maxExpansions - the maximum number of terms to expand to
        pattern - string representation of the given automaton, mostly used in exception messages
        Throws:
        IllegalStateException - if the automaton accepts more than maxExpansions terms
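        To show what "accepted by the given automaton" means together with maxExpansions and pattern, here is a toy model in which a hand-written DFA (a transition map with start state 0) stands in for CompiledAutomaton and a list stands in for the term dictionary. AutomatonToy, Dfa and expand are hypothetical names, and the exception message format is invented for the sketch.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Toy model; not Lucene code.
final class AutomatonToy {

    // A deterministic automaton: state -> (character -> next state).
    record Dfa(Map<Integer, Map<Character, Integer>> transitions, Set<Integer> accepting) {
        boolean accepts(String term) {
            int state = 0; // state 0 is the start state
            for (char c : term.toCharArray()) {
                Integer next = transitions.getOrDefault(state, Map.of()).get(c);
                if (next == null) {
                    return false; // no transition for this character: reject
                }
                state = next;
            }
            return accepting.contains(state);
        }
    }

    // Collects accepted terms; `pattern` is only used in the error message.
    static List<String> expand(Dfa dfa, List<String> terms, int maxExpansions, String pattern) {
        List<String> matches = new ArrayList<>();
        for (String term : terms) {
            if (dfa.accepts(term)) {
                if (matches.size() == maxExpansions) {
                    throw new IllegalStateException(
                        "automaton " + pattern + " accepts more than " + maxExpansions + " terms");
                }
                matches.add(term);
            }
        }
        return matches;
    }
}
```

        With transitions c, then a, then t or r into accepting state 3, only "car" and "cat" are accepted from the terms bat, car, cart, cat, cow.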
      • maxwidth

        public static IntervalsSource maxwidth(int width,
                                               IntervalsSource subSource)
        Create an IntervalsSource that filters a sub-source by the width of its intervals
        Parameters:
        width - the maximum width of intervals in the sub-source to filter
        subSource - the sub-source to filter
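        A toy model of the filter, assuming intervals are inclusive [start, end] token positions and that width counts the positions an interval covers (end - start + 1); MaxWidthToy is a hypothetical name, not Lucene code.

```java
import java.util.List;

// Toy model; not Lucene code.
final class MaxWidthToy {

    record Interval(int start, int end) {
        // number of positions covered, assuming inclusive bounds
        int width() {
            return end - start + 1;
        }
    }

    // Keeps only the intervals no wider than `width`.
    static List<Interval> maxwidth(int width, List<Interval> intervals) {
        return intervals.stream().filter(i -> i.width() <= width).toList();
    }
}
```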
      • maxgaps

        public static IntervalsSource maxgaps(int gaps,
                                              IntervalsSource subSource)
        Create an IntervalsSource that filters a sub-source by its gaps
        Parameters:
        gaps - the maximum number of gaps in the sub-source to filter
        subSource - the sub-source to filter
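        A toy model of the gap count, assuming a match is an ordered list of non-overlapping sub-intervals and that its gaps are the positions between consecutive sub-intervals that none of them covers. MaxGapsToy is hypothetical, not Lucene code.

```java
import java.util.List;

// Toy model; not Lucene code. Each sub-interval is an inclusive {start, end}
// pair, ordered and non-overlapping.
final class MaxGapsToy {

    static int gaps(List<int[]> subIntervals) {
        int gaps = 0;
        for (int i = 1; i < subIntervals.size(); i++) {
            // positions strictly between the previous end and the next start
            gaps += subIntervals.get(i)[0] - subIntervals.get(i - 1)[1] - 1;
        }
        return gaps;
    }

    static boolean passes(int maxGaps, List<int[]> subIntervals) {
        return gaps(subIntervals) <= maxGaps;
    }
}
```

        For single terms at positions 0, 2 and 4 the match has two gaps, so in this model a limit of 1 rejects it and a limit of 2 keeps it.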
      • extend

        public static IntervalsSource extend(IntervalsSource source,
                                             int before,
                                             int after)
        Create an IntervalsSource that wraps another source, extending its intervals by a number of positions before and after.

        This can be useful for adding defined gaps in a block query; for example, to find 'a b [2 arbitrary terms] c', you can call:

           Intervals.phrase(Intervals.term("a"), Intervals.extend(Intervals.term("b"), 0, 2), Intervals.term("c"));
         
        Note that calling IntervalIterator.gaps() on iterators returned by this source delegates directly to the wrapped iterator, and does not include the extensions.
        Parameters:
        source - the source to extend
        before - how many positions to extend before the delegated interval
        after - how many positions to extend after the delegated interval
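        The 'a b [2 arbitrary terms] c' example can be traced with a toy model of inclusive [start, end] intervals, where a phrase requires each interval to start right after the previous one ends. ExtendToy is hypothetical, and clamping the extended start at position 0 is an assumption of this sketch.

```java
// Toy model; not Lucene code.
final class ExtendToy {

    record Interval(int start, int end) {}

    // Widens an interval; the start is clamped at 0 (an assumption).
    static Interval extend(Interval in, int before, int after) {
        return new Interval(Math.max(0, in.start() - before), in.end() + after);
    }

    // A phrase matches when each interval starts right after the previous one ends.
    static boolean isPhrase(Interval... parts) {
        for (int i = 1; i < parts.length; i++) {
            if (parts[i].start() != parts[i - 1].end() + 1) {
                return false;
            }
        }
        return true;
    }
}
```

        In "a b x y c" (positions 0 to 4), extending b at position 1 by two positions after gives [1, 3], so c at position 4 completes the phrase; in "a b x c", c sits at position 3 and the phrase fails.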
      • unordered

        public static IntervalsSource unordered(IntervalsSource... subSources)
        Create an unordered IntervalsSource. Note that if multiple eligible intervals end at the same position, only the narrowest one is returned. For example, unordered(term("apple"), term("banana")) on a field containing "apple wolf apple orange banana" returns only the interval "apple orange banana".

        Returns intervals in which all the sub-sources appear. The sub-sources may overlap.

        Parameters:
        subSources - an unordered set of IntervalsSources
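        The apple/banana example can be checked with a brute-force toy model of minimal-interval semantics: it keeps every window that contains all query terms and has no smaller sub-window that also does. This exhaustive scan is only an illustration (UnorderedToy is hypothetical); it is not the lazy algorithm from the paper cited above, and it assumes the query terms are distinct.

```java
import java.util.ArrayList;
import java.util.List;

// Brute-force toy model; not Lucene code and not the paper's algorithm.
final class UnorderedToy {

    // True if doc[start..end] (inclusive) contains every query term.
    static boolean covers(List<String> doc, int start, int end, List<String> terms) {
        return start <= end && doc.subList(start, end + 1).containsAll(terms);
    }

    // Every window containing all terms that has no smaller sub-window
    // that also contains them.
    static List<int[]> minimalIntervals(List<String> doc, List<String> terms) {
        List<int[]> result = new ArrayList<>();
        for (int start = 0; start < doc.size(); start++) {
            for (int end = start; end < doc.size(); end++) {
                if (covers(doc, start, end, terms)) {
                    // end is the narrowest covering end for this start; the window
                    // is minimal only if dropping its first token breaks coverage
                    if (!covers(doc, start + 1, end, terms)) {
                        result.add(new int[] {start, end});
                    }
                    break;
                }
            }
        }
        return result;
    }
}
```

        On "apple wolf apple orange banana" it returns only the window [2, 4], that is "apple orange banana".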
      • fixField

        public static IntervalsSource fixField(String field,
                                               IntervalsSource source)
        Create an IntervalsSource that always returns intervals from a specific field

        This is useful for comparing intervals across multiple fields, for example fields that have been analyzed differently, allowing you to search for stemmed terms near unstemmed terms, etc.

      • overlapping

        public static IntervalsSource overlapping(IntervalsSource source,
                                                  IntervalsSource reference)
        Returns intervals from a source that overlap with intervals from another source
        Parameters:
        source - the source to filter
        reference - the source to filter by
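        A toy model of the overlap test, assuming inclusive [start, end] intervals: a source interval is kept when it shares at least one position with some reference interval. OverlappingToy is hypothetical, not Lucene code.

```java
import java.util.List;

// Toy model; not Lucene code.
final class OverlappingToy {

    record Interval(int start, int end) {
        // inclusive intervals overlap when neither ends before the other starts
        boolean overlaps(Interval other) {
            return start <= other.end && other.start <= end;
        }
    }

    static List<Interval> overlapping(List<Interval> source, List<Interval> reference) {
        return source.stream()
            .filter(s -> reference.stream().anyMatch(s::overlaps))
            .toList();
    }
}
```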
      • notWithin

        public static IntervalsSource notWithin(IntervalsSource minuend,
                                                int positions,
                                                IntervalsSource subtrahend)
        Create a not-within IntervalsSource

        Returns intervals of the minuend that do not appear within a set number of positions of intervals from the subtrahend query

        Parameters:
        minuend - the IntervalsSource to filter
        positions - the minimum distance that intervals from the minuend may occur from intervals of the subtrahend
        subtrahend - the IntervalsSource to filter by
      • within

        public static IntervalsSource within(IntervalsSource source,
                                             int positions,
                                             IntervalsSource reference)
        Returns intervals of the source that appear within a set number of positions of intervals from the reference
        Parameters:
        source - the IntervalsSource to filter
        positions - the maximum distance that intervals of the source may occur from intervals of the reference
        reference - the IntervalsSource to filter by
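        The two distance filters, notWithin and within, can be contrasted with a toy model. It assumes both widen each reference (or subtrahend) interval by positions on each side; within then keeps source intervals lying entirely inside a widened interval, while notWithin keeps minuend intervals that touch none. WithinToy is hypothetical, and this reading of "within a set number of positions" is an assumption of the sketch.

```java
import java.util.List;

// Toy model of within() and notWithin(); not Lucene code. Intervals are
// inclusive [start, end] token positions.
final class WithinToy {

    record Interval(int start, int end) {}

    // within(): keep source intervals lying entirely inside some reference
    // interval widened by `positions` on each side.
    static List<Interval> within(List<Interval> source, int positions, List<Interval> reference) {
        return source.stream()
            .filter(s -> reference.stream().anyMatch(r ->
                s.start() >= r.start() - positions && s.end() <= r.end() + positions))
            .toList();
    }

    // notWithin(): keep minuend intervals that overlap no widened subtrahend interval.
    static List<Interval> notWithin(List<Interval> minuend, int positions, List<Interval> subtrahend) {
        return minuend.stream()
            .filter(m -> subtrahend.stream().noneMatch(s ->
                m.start() <= s.end() + positions && s.start() - positions <= m.end()))
            .toList();
    }
}
```

        With a reference at position 6 and positions = 2, the widened window is [4, 8]: [4, 5] is within it, [0, 0] is not within it, and [8, 9] is neither, because it only partly overlaps the window.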
      • atLeast

        public static IntervalsSource atLeast(int minShouldMatch,
                                              IntervalsSource... sources)
        Return intervals that span combinations of intervals from minShouldMatch of the sources
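        A brute-force toy model of atLeast, assuming single-term sources identified by distinct strings: it keeps windows that contain at least minShouldMatch of the terms and have no smaller sub-window that still does. AtLeastToy is hypothetical and only illustrates the semantics, not Lucene's implementation.

```java
import java.util.ArrayList;
import java.util.List;

// Brute-force toy model; not Lucene code. With minShouldMatch equal to the
// number of distinct terms, this toy requires every term in the window.
final class AtLeastToy {

    // Number of distinct query terms present in doc[start..end] (inclusive).
    static int matched(List<String> doc, int start, int end, List<String> terms) {
        if (start > end) {
            return 0;
        }
        List<String> window = doc.subList(start, end + 1);
        return (int) terms.stream().distinct().filter(window::contains).count();
    }

    static List<int[]> atLeast(int minShouldMatch, List<String> doc, List<String> terms) {
        List<int[]> result = new ArrayList<>();
        for (int start = 0; start < doc.size(); start++) {
            for (int end = start; end < doc.size(); end++) {
                if (matched(doc, start, end, terms) >= minShouldMatch) {
                    // first qualifying end for this start; keep the window only if
                    // it stops qualifying once its first token is dropped
                    if (matched(doc, start + 1, end, terms) < minShouldMatch) {
                        result.add(new int[] {start, end});
                    }
                    break;
                }
            }
        }
        return result;
    }
}
```

        On "a x b y c" with terms a, b, c and minShouldMatch = 2, the windows [0, 2] ("a x b") and [2, 4] ("b y c") are returned.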
      • analyzedText

        public static IntervalsSource analyzedText(String text,
                                                   Analyzer analyzer,
                                                   String field,
                                                   int maxGaps,
                                                   boolean ordered)
                                            throws IOException
        Returns intervals that correspond to tokens from a TokenStream obtained by applying the provided Analyzer to text, as if text were the content of the given field. The intervals can be ordered or unordered and can have optional gaps inside.
        Parameters:
        text - The text to analyze.
        analyzer - The Analyzer to use to acquire a TokenStream which is then converted into intervals.
        field - The field text should be parsed as.
        maxGaps - Maximum number of allowed gaps between sub-intervals resulting from tokens.
        ordered - Whether sub-intervals should enforce token ordering or not.
        Returns:
        Returns an IntervalsSource that matches tokens acquired from analysis of text. Possibly an empty interval source, never null.
        Throws:
        IOException - If an I/O exception occurs.
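        A toy model of the ordered case, assuming a lowercasing whitespace splitter stands in for the Analyzer and the field is just a token list; AnalyzedTextToy, analyze and matchesOrdered are hypothetical. It counts gaps as the unmatched positions between the first and last matched token, which is an assumption about how maxGaps is measured.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Locale;

// Toy model; not Lucene code. Only the ordered case is modelled.
final class AnalyzedTextToy {

    // Stand-in "analyzer": lowercase, then split on whitespace.
    static List<String> analyze(String text) {
        return Arrays.asList(text.toLowerCase(Locale.ROOT).trim().split("\\s+"));
    }

    // True if the analyzed query tokens occur in order in docTokens with at
    // most maxGaps unmatched positions in between.
    static boolean matchesOrdered(String query, List<String> docTokens, int maxGaps) {
        List<String> q = analyze(query);
        for (int start = 0; start < docTokens.size(); start++) {
            if (!docTokens.get(start).equals(q.get(0))) {
                continue;
            }
            int pos = start;
            boolean complete = true;
            for (int k = 1; k < q.size() && complete; k++) {
                // earliest next occurrence keeps the window (and its gaps) smallest
                int next = docTokens.subList(pos + 1, docTokens.size()).indexOf(q.get(k));
                if (next < 0) {
                    complete = false;
                } else {
                    pos = pos + 1 + next;
                }
            }
            if (complete && (pos - start + 1) - q.size() <= maxGaps) {
                return true;
            }
        }
        return false;
    }
}
```

        Against "The quick brown fox jumps", the text "Quick Fox" matches with maxGaps = 1 (brown sits in between) but not with maxGaps = 0.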
      • analyzedText

        public static IntervalsSource analyzedText(TokenStream tokenStream,
                                                   int maxGaps,
                                                   boolean ordered)
                                            throws IOException
        Returns intervals that correspond to tokens from the provided TokenStream. This is a low-level counterpart to analyzedText(String, Analyzer, String, int, boolean). The intervals can be ordered or unordered and can have optional gaps inside.
        Parameters:
        tokenStream - The token stream to produce intervals for. The token stream may be fully or partially consumed after returning from this method.
        maxGaps - Maximum number of allowed gaps between sub-intervals resulting from tokens.
        ordered - Whether sub-intervals should enforce token ordering or not.
        Returns:
        Returns an IntervalsSource that matches tokens from the provided token stream. Possibly an empty interval source, never null.
        Throws:
        IOException - If an I/O exception occurs.