Class StandardQueryParser

  • All Implemented Interfaces:
    CommonQueryParserConfiguration
    Direct Known Subclasses:
    PrecedenceQueryParser

    public class StandardQueryParser
    extends QueryParserHelper
    implements CommonQueryParserConfiguration
    The StandardQueryParser is a pre-assembled query parser that supports most features of the classic Lucene query parser, allows dynamic configuration of some of its features (like multi-field expansion or wildcard query restrictions) and adds support for new query types and expressions.

    The StandardQueryParser is an extension of the QueryParserHelper with reasonable defaults for syntax tree parsing (StandardSyntaxParser), the node processor pipeline (StandardQueryNodeProcessorPipeline) and the node tree to Query builder (StandardQueryTreeBuilder).

    Typical usage, including configuration tweaks:

    
     StandardQueryParser qpHelper = new StandardQueryParser();
     // The configuration setters are exposed on the parser itself.
     qpHelper.setAllowLeadingWildcard(true);
     qpHelper.setAnalyzer(new WhitespaceAnalyzer());
     Query query = qpHelper.parse("apache AND lucene", "defaultField");
     

    Supported query syntax

    Standard query parser borrows most of its syntax from the classic query parser but adds more features and expressions on top of that syntax.

    A query consists of clauses, field specifications, grouping and Boolean operators, and interval functions. We will discuss them in order.

    Basic clauses

    A query must contain one or more clauses. A clause can be a literal term, a phrase, a wildcard expression or another supported expression.

    The following are some examples of simple one-clause queries:

    • test

      selects documents containing the word test (term clause).

    • "test equipment"

      phrase search; selects documents containing the phrase test equipment (phrase clause).

    • "test failure"~4

      proximity search; selects documents containing the words test and failure within 4 words (positions) of each other. The provided "proximity" is technically translated into an "edit distance" (the maximum number of atomic word-moving operations required to transform the document's phrase into the query phrase).

    • tes*

      prefix wildcard matching; selects documents containing words starting with tes, such as: test, testing or testable.

    • /.est(s|ing)/

      documents containing words matching the provided regular expression, such as resting or nests.

    • nest~2

      fuzzy term matching; selects documents containing words within an edit distance of 2 (at most 2 additions, removals or replacements of a letter) from nest, such as test, net or rests.
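
    The term-level matching rules above (prefix, regular expression and fuzzy) can be approximated with a few lines of plain Java. This is only an illustrative sketch and not how Lucene evaluates these clauses: the class and method names are hypothetical, java.util.regex stands in for Lucene's own regular expression syntax (which differs in details), and the distance is plain Levenshtein as described above, whereas the actual fuzzy matching may count edits differently (for example, by treating a transposition as a single edit).

```java
import java.util.regex.Pattern;

// Hypothetical helper class, not part of Lucene.
final class TermMatchSketch {

    // Prefix wildcard such as tes*: the term must start with the prefix.
    static boolean matchesPrefix(String term, String prefix) {
        return term.startsWith(prefix);
    }

    // Regular expression clause such as /.est(s|ing)/: the whole term must match.
    static boolean matchesRegex(String term, String regex) {
        return Pattern.matches(regex, term);
    }

    // Plain Levenshtein distance: the number of single-letter additions,
    // removals or replacements needed to turn a into b.
    static int editDistance(String a, String b) {
        int[] prev = new int[b.length() + 1];
        int[] curr = new int[b.length() + 1];
        for (int j = 0; j <= b.length(); j++) {
            prev[j] = j;
        }
        for (int i = 1; i <= a.length(); i++) {
            curr[0] = i;
            for (int j = 1; j <= b.length(); j++) {
                int cost = a.charAt(i - 1) == b.charAt(j - 1) ? 0 : 1;
                curr[j] = Math.min(Math.min(curr[j - 1] + 1, prev[j] + 1), prev[j - 1] + cost);
            }
            // The row just computed becomes the previous row of the next iteration.
            int[] tmp = prev;
            prev = curr;
            curr = tmp;
        }
        return prev[b.length()];
    }

    // Fuzzy clause such as nest~2: the term is within maxEdits of the query term.
    static boolean matchesFuzzy(String term, String queryTerm, int maxEdits) {
        return editDistance(term, queryTerm) <= maxEdits;
    }
}
```

    With this sketch, test, net and rests all fall within two edits of nest, while resting and nests match the expression .est(s|ing) and test does not.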

    Field specifications

    Most clauses can be prefixed by a field name and a colon: the clause will then apply to that field only. If the field specification is omitted, the query parser will expand the clause over all fields specified by a call to setMultiFields(CharSequence[]) or will use the default field provided in the call to parse(String, String).

    The following are some examples of field-prefixed clauses:

    • title:test

      documents containing test in the title field.

    • title:(die OR hard)

      documents containing die or hard in the title field.
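
    Conceptually, a clause without a field prefix is rewritten into one copy per configured field. The following sketch shows that rewriting on query strings only; it assumes the per-field copies are combined as alternatives (OR), and the class and method names are hypothetical. The parser performs the corresponding step on its internal query node tree, not on strings.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper class, not part of Lucene.
final class FieldExpansionSketch {

    // Expands an unprefixed clause over the multi-fields if any are set,
    // otherwise applies the default field.
    static String expand(String clause, List<String> multiFields, String defaultField) {
        if (multiFields == null || multiFields.isEmpty()) {
            return defaultField + ":" + clause;
        }
        List<String> parts = new ArrayList<>();
        for (String field : multiFields) {
            parts.add(field + ":" + clause);
        }
        // Assumption of this sketch: the per-field copies are alternatives.
        return "(" + String.join(" OR ", parts) + ")";
    }
}
```

    For example, expanding test over the fields title and body yields (title:test OR body:test), and with no multi-fields configured it yields defaultField:test.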

    Boolean operators and grouping

    You can combine clauses using Boolean AND, OR and NOT operators to form more complex expressions, for example:

    • test AND results

      selects documents containing both the word test and the word results.

    • test OR suite OR results

      selects documents with at least one of test, suite or results.

    • title:test AND NOT title:complete

      selects documents containing test and not containing complete in the title field.

    • title:test AND (pass* OR fail*)

      grouping; use parentheses to specify the precedence of terms in a Boolean clause. The query matches documents containing test in the title field and a word starting with pass or fail in the default search fields.

    • title:(pass fail skip)

      shorthand notation; documents containing at least one of pass, fail or skip in the title field.

    • title:(+test +"result unknown")

      shorthand notation; documents containing both test and the phrase result unknown in the title field.

    Note that the Boolean operators must be written in all caps; otherwise they are parsed as regular terms.

    Range operators

    To search for ranges of textual or numeric values, use square or curly brackets, for example:

    • name:[Jones TO Smith]

      inclusive range; selects documents whose name field has any value between Jones and Smith, including boundaries.

    • score:{2.5 TO 7.3}

      exclusive range; selects documents whose score field is between 2.5 and 7.3, excluding boundaries.

    • score:{2.5 TO *]

      one-sided range; selects documents whose score field is larger than 2.5.
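
    The difference between square and curly brackets comes down to whether a boundary value itself is accepted. A minimal sketch of that boundary logic follows; the names are hypothetical, a null bound stands for the open end written as *, and String.compareTo is only an approximation of how textual terms are actually ordered in an index.

```java
// Hypothetical helper class, not part of Lucene.
final class RangeSketch {

    // Returns true if value lies between lower and upper.
    // A null bound means the range is open on that side (written as * in the query).
    static <T extends Comparable<T>> boolean inRange(
            T value, T lower, boolean includeLower, T upper, boolean includeUpper) {
        if (lower != null) {
            int c = value.compareTo(lower);
            // Below the lower bound, or exactly on an excluded lower bound.
            if (c < 0 || (c == 0 && !includeLower)) {
                return false;
            }
        }
        if (upper != null) {
            int c = value.compareTo(upper);
            // Above the upper bound, or exactly on an excluded upper bound.
            if (c > 0 || (c == 0 && !includeUpper)) {
                return false;
            }
        }
        return true;
    }
}
```

    Under these assumptions, name:[Jones TO Smith] accepts Jones itself, score:{2.5 TO 7.3} rejects 2.5, and score:{2.5 TO *] accepts any value above 2.5.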

    Term boosting

    Terms, quoted terms, term range expressions and grouped clauses can have a floating-point weight boost applied to them to increase their score relative to other clauses. For example:

    • jones^2 OR smith^0.5

      prioritizes documents matching the term jones over documents matching the term smith.

    • field:(a OR b NOT c)^2.5 OR field:d

      apply the boost to a sub-query.

    Special character escaping

    Most search terms can be put in double quotes, which makes special-character escaping unnecessary. If the search term contains the quote character (or cannot be quoted for some reason), any character can be escaped with a backslash. For example:

    • \:\(quoted\+term\)\:

      a single search term :(quoted+term): written with escape sequences. The quoted form is a simpler alternative: ":(quoted+term):".
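
    When terms come from user input, the backslash escaping can be applied programmatically. Lucene ships its own utility for this (QueryParserUtil.escape); the sketch below is not that implementation. It is a hypothetical stand-alone helper, and the set of characters it treats as special is an assumption that may not match the parser's grammar exactly.

```java
// Hypothetical helper class, not part of Lucene.
final class EscapeSketch {

    // Assumed set of special characters for this sketch.
    private static final String SPECIAL = "\\+-!():^[]\"{}~*?|&/";

    // Prefixes every special character in the term with a backslash.
    static String escape(String term) {
        StringBuilder sb = new StringBuilder(term.length() * 2);
        for (int i = 0; i < term.length(); i++) {
            char c = term.charAt(i);
            if (SPECIAL.indexOf(c) >= 0) {
                sb.append('\\');
            }
            sb.append(c);
        }
        return sb.toString();
    }
}
```

    Applied to the term :(quoted+term): this produces the escaped form shown in the example above, and a term without special characters is returned unchanged.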

    Minimum-should-match constraint for Boolean disjunction groups

    A minimum-should-match operator can be applied to a disjunction Boolean query (a query with only "OR"-subclauses) and forces the query to match documents with at least the provided number of these subclauses. For example:

    • (blue crab fish)@2

      matches all documents with at least two terms from the set [blue, crab, fish] (in any order).

    • ((yellow OR blue) crab fish)@2

      sub-clauses of a Boolean query can themselves be complex queries; here the min-should-match selects documents that match at least two of the provided three sub-clauses.
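
    The constraint itself is a simple count over sub-clauses. The sketch below models a document as a set of words and each sub-clause as a predicate over that set; both the model and the names are hypothetical simplifications that ignore fields, analysis and scoring.

```java
import java.util.Set;
import java.util.function.Predicate;

// Hypothetical helper class, not part of Lucene.
final class MinShouldMatchSketch {

    // A term sub-clause: matches if the document contains the word.
    static Predicate<Set<String>> term(String word) {
        return doc -> doc.contains(word);
    }

    // Matches if at least min of the given sub-clauses match the document.
    @SafeVarargs
    static Predicate<Set<String>> minShouldMatch(int min, Predicate<Set<String>>... clauses) {
        return doc -> {
            int matched = 0;
            for (Predicate<Set<String>> clause : clauses) {
                if (clause.test(doc)) {
                    matched++;
                }
            }
            return matched >= min;
        };
    }
}
```

    In this model, ((yellow OR blue) crab fish)@2 can be written as minShouldMatch(2, term("yellow").or(term("blue")), term("crab"), term("fish")). A document containing only yellow and blue satisfies just one of the three sub-clauses and therefore does not match.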

    Interval function clauses

    Interval functions are a powerful tool to express search needs in terms of one or more contiguous fragments of text and their relationship to one another. All interval clauses start with the fn: prefix (possibly prefixed by a field specification). For example:

    • fn:ordered(quick brown fox)

      matches all documents (in the default field or in multi-field expansion) with at least one ordered sequence of quick, brown and fox terms.

    • title:fn:maxwidth(5 fn:atLeast(2 quick brown fox))

      matches all documents where, in the title field, at least two of the three terms (quick, brown and fox) occur within five positions of each other.
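
    The two examples above can be approximated over a plain list of tokens, where the list index plays the role of the term position. This is a rough sketch under stated assumptions, not the interval algorithm used by Lucene: the names are hypothetical, gaps between ordered terms are assumed to be allowed, and a width of N is assumed to mean a window of N consecutive positions.

```java
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical helper class, not part of Lucene.
final class IntervalSketch {

    // Approximates fn:ordered(...): the terms occur in the given order,
    // possibly with other tokens in between.
    static boolean ordered(List<String> tokens, List<String> terms) {
        int next = 0;
        for (String token : tokens) {
            if (next < terms.size() && token.equals(terms.get(next))) {
                next++;
            }
        }
        return next == terms.size();
    }

    // Approximates fn:maxwidth(w fn:atLeast(n ...)): some window of maxWidth
    // consecutive positions contains at least min distinct terms from the set.
    static boolean atLeastWithin(List<String> tokens, Set<String> terms, int min, int maxWidth) {
        for (int start = 0; start < tokens.size(); start++) {
            Set<String> seen = new HashSet<>();
            int end = Math.min(tokens.size(), start + maxWidth);
            for (int i = start; i < end; i++) {
                if (terms.contains(tokens.get(i))) {
                    seen.add(tokens.get(i));
                }
            }
            if (seen.size() >= min) {
                return true;
            }
        }
        return false;
    }
}
```

    Under these assumptions, the token sequence "the quick brown lazy fox" satisfies ordered(quick brown fox), while "fox brown quick" does not because the order is reversed.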

    Please refer to the interval functions package for more information on which functions are available and how they work.
    See Also:
    StandardQueryParser, StandardQueryConfigHandler, StandardSyntaxParser, StandardQueryNodeProcessorPipeline, StandardQueryTreeBuilder