Class SimpleQueryParser


  • public class SimpleQueryParser
    extends QueryBuilder
    SimpleQueryParser is used to parse human readable query syntax.

    The main idea behind this parser is that a person should be able to type whatever they want to represent a query, and the parser will do its best to interpret what to search for, no matter how poorly composed the request may be. A token is any term, phrase, or subquery used by the operations described below. Tokens may be delimited by whitespace (' ', '\n', '\r', and '\t') and by certain operators: ( ) + | "

    Errors in query syntax are ignored and the parser attempts to decipher what it can; however, this may produce odd or unexpected results.

    Query Operators

    • '+' specifies AND operation: token1+token2
    • '|' specifies OR operation: token1|token2
    • '-' negates a single token: -token0
    • '"' creates phrases of terms: "term1 term2 ..."
    • '*' at the end of terms specifies prefix query: term*
    • '~N' at the end of terms specifies fuzzy query: term~1
    • '~N' at the end of phrases specifies near query: "term1 term2"~5
    • '(' and ')' specify precedence: token1 + (token2 | token3)

    The default operator is OR if no other operator is specified. For example, the following will OR token1 and token2 together: token1 token2

    Without parentheses, operators are applied simply in the order they appear in the query. For example, the following will evaluate token1 OR token2 first, then AND the result with token3:

    token1 | token2 + token3
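
    The grouping in the example above can be modeled with a short, self-contained sketch. It is a toy model, not the parser's implementation: the class LeftToRightGrouping and its group method are hypothetical names, and it assumes that terms and the operators '|' and '+' are separated by whitespace, with no parentheses, phrases, or negation.

```java
import java.util.ArrayList;
import java.util.List;

// Toy model (hypothetical, not part of the parser): shows how a change of
// operator turns everything parsed so far into a single operand.
final class LeftToRightGrouping {

    private LeftToRightGrouping() {}

    /** Expects input of the form: term (op term)*, separated by whitespace. */
    public static String group(String query) {
        String[] parts = query.trim().split("\\s+");
        List<String> operands = new ArrayList<>();
        operands.add(parts[0]);
        String currentOp = null;

        // Walk the (operator, term) pairs in the order they appear.
        for (int i = 1; i + 1 < parts.length; i += 2) {
            String op = parts[i];
            if (currentOp != null && !currentOp.equals(op)) {
                // Operator changed: wrap what has been collected so far.
                String grouped = "(" + String.join(" " + currentOp + " ", operands) + ")";
                operands.clear();
                operands.add(grouped);
            }
            currentOp = op;
            operands.add(parts[i + 1]);
        }
        return currentOp == null
                ? operands.get(0)
                : String.join(" " + currentOp + " ", operands);
    }

    public static void main(String[] args) {
        System.out.println(group("token1 | token2 + token3")); // (token1 | token2) + token3
        System.out.println(group("a + b + c"));                // a + b + c
    }
}
```

    In this model, token1 | token2 is grouped first and the group is then combined with token3 using AND, which is the evaluation order the example describes.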

    Escaping

    An individual term may contain any possible character with certain characters requiring escaping using a '\'. The following characters will need to be escaped in terms and phrases: + | " ( ) ' \

    The '-' operator is a special case. On individual terms (not phrases), a '-' that is the first character of the term must be escaped; any '-' characters after the first character do not need to be escaped. For example:

    • -term1 -- Specifies NOT operation against term1
    • \-term1 -- Searches for the term -term1
    • term-1 -- Searches for the term term-1
    • term\-1 -- Searches for the term term-1

    The '*' operator is a special case. On individual terms (not phrases), a '*' that is the last character of the term must be escaped; any '*' characters before the last character do not need to be escaped. For example:

    • term1* -- Searches for the prefix term1
    • term1\* -- Searches for the term term1*
    • term*1 -- Searches for the term term*1
    • term\*1 -- Searches for the term term*1

    Note that the above examples show the terms before text processing.
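
    The escaping rules above can be collected into a small helper. This is a minimal sketch under stated assumptions: SimpleQueryEscaper and escapeTerm are hypothetical names and not part of the parser's API, the escaped character set is taken directly from the list above, and whitespace is left untouched, so the input is assumed to be a single term.

```java
// Hypothetical helper (not part of the parser): escapes raw text so that
// it is read as one literal term under the rules listed above.
final class SimpleQueryEscaper {

    // Characters that need a backslash anywhere in a term: + | " ( ) ' \
    private static final String ALWAYS_ESCAPE = "+|\"()'\\";

    private SimpleQueryEscaper() {}

    public static String escapeTerm(String raw) {
        StringBuilder out = new StringBuilder(raw.length() + 8);
        for (int i = 0; i < raw.length(); i++) {
            char c = raw.charAt(i);
            // '-' only matters as the first character of a term.
            boolean leadingMinus = c == '-' && i == 0;
            // '*' only matters as the last character of a term.
            boolean trailingStar = c == '*' && i == raw.length() - 1;
            if (ALWAYS_ESCAPE.indexOf(c) >= 0 || leadingMinus || trailingStar) {
                out.append('\\');
            }
            out.append(c);
        }
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.println(escapeTerm("-term1")); // \-term1
        System.out.println(escapeTerm("term-1")); // term-1
        System.out.println(escapeTerm("term1*")); // term1\*
        System.out.println(escapeTerm("term*1")); // term*1
    }
}
```

    The escaped string could then be handed to parse when user input should be matched literally rather than interpreted as operators.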

    • Field Detail

      • weights

        protected final Map<String,Float> weights
        Map of fields to query against with their weights
      • flags

        protected final int flags
        Flags passed to the parser to turn individual features on or off
      • AND_OPERATOR

        public static final int AND_OPERATOR
        Enables AND operator (+)
        See Also:
        Constant Field Values
      • NOT_OPERATOR

        public static final int NOT_OPERATOR
        Enables NOT operator (-)
        See Also:
        Constant Field Values
      • OR_OPERATOR

        public static final int OR_OPERATOR
        Enables OR operator (|)
        See Also:
        Constant Field Values
      • PREFIX_OPERATOR

        public static final int PREFIX_OPERATOR
        Enables PREFIX operator (*)
        See Also:
        Constant Field Values
      • PHRASE_OPERATOR

        public static final int PHRASE_OPERATOR
        Enables PHRASE operator (")
        See Also:
        Constant Field Values
      • PRECEDENCE_OPERATORS

        public static final int PRECEDENCE_OPERATORS
        Enables PRECEDENCE operators: ( and )
        See Also:
        Constant Field Values
      • ESCAPE_OPERATOR

        public static final int ESCAPE_OPERATOR
        Enables ESCAPE operator (\)
        See Also:
        Constant Field Values
      • WHITESPACE_OPERATOR

        public static final int WHITESPACE_OPERATOR
        Enables WHITESPACE operators: ' ' '\n' '\r' '\t'
        See Also:
        Constant Field Values
      • FUZZY_OPERATOR

        public static final int FUZZY_OPERATOR
        Enables FUZZY operators: (~) on single terms
        See Also:
        Constant Field Values
      • NEAR_OPERATOR

        public static final int NEAR_OPERATOR
        Enables NEAR operators: (~) on phrases
        See Also:
        Constant Field Values
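
      Because each feature is an int constant and the parser takes a single int of flags, features can be combined and tested as bits. The sketch below uses only the standard library: the constants are stand-ins defined locally with assumed single-bit values (the real values are on the Constant Field Values page), and isEnabled and without are hypothetical helpers.

```java
// Stand-alone sketch of flag handling. The constants are local stand-ins
// with assumed values; they are NOT read from the real class.
final class ParserFlagsSketch {

    public static final int AND_OPERATOR = 1 << 0;
    public static final int NOT_OPERATOR = 1 << 1;
    public static final int OR_OPERATOR = 1 << 2;
    public static final int PREFIX_OPERATOR = 1 << 3;

    private ParserFlagsSketch() {}

    /** True if the given feature bit is set in flags. */
    public static boolean isEnabled(int flags, int feature) {
        return (flags & feature) != 0;
    }

    /** Returns flags with the given feature bit cleared. */
    public static int without(int flags, int feature) {
        return flags & ~feature;
    }

    public static void main(String[] args) {
        // Enable AND, OR, and NOT; leave prefix queries switched off.
        int flags = AND_OPERATOR | OR_OPERATOR | NOT_OPERATOR;
        System.out.println(isEnabled(flags, AND_OPERATOR));    // true
        System.out.println(isEnabled(flags, PREFIX_OPERATOR)); // false

        // Switch negation off again.
        flags = without(flags, NOT_OPERATOR);
        System.out.println(isEnabled(flags, NOT_OPERATOR));    // false
    }
}
```

      With the real class on the classpath, the same expression would use SimpleQueryParser.AND_OPERATOR and the other constants documented above, and the result would be passed as the flags argument of the three-argument constructor.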
    • Constructor Detail

      • SimpleQueryParser

        public SimpleQueryParser(Analyzer analyzer,
                                 String field)
        Creates a new parser searching over a single field.
      • SimpleQueryParser

        public SimpleQueryParser(Analyzer analyzer,
                                 Map<String,Float> weights)
        Creates a new parser searching over multiple fields with different weights.
      • SimpleQueryParser

        public SimpleQueryParser(Analyzer analyzer,
                                 Map<String,Float> weights,
                                 int flags)
        Creates a new parser with custom flags used to enable/disable certain features.
    • Method Detail

      • parse

        public Query parse(String queryText)
        Parses the query text and returns the parsed query.
      • newDefaultQuery

        protected Query newDefaultQuery(String text)
        Factory method to generate a standard query (no phrase or prefix operators).
      • newFuzzyQuery

        protected Query newFuzzyQuery(String text,
                                      int fuzziness)
        Factory method to generate a fuzzy query.
      • newPhraseQuery

        protected Query newPhraseQuery(String text,
                                       int slop)
        Factory method to generate a phrase query with slop.
      • newPrefixQuery

        protected Query newPrefixQuery(String text)
        Factory method to generate a prefix query.
      • simplify

        protected Query simplify(BooleanQuery bq)
        Helper to simplify boolean queries that have 0 or 1 clauses.
      • getDefaultOperator

        public BooleanClause.Occur getDefaultOperator()
        Returns the implicit operator setting, which will be either SHOULD or MUST.
      • setDefaultOperator

        public void setDefaultOperator(BooleanClause.Occur operator)
        Sets the implicit operator setting, which must be either SHOULD or MUST.