Class SimpleQueryParser

java.lang.Object
org.apache.lucene.util.QueryBuilder
org.apache.lucene.queryparser.simple.SimpleQueryParser

public class SimpleQueryParser extends QueryBuilder
SimpleQueryParser is used to parse human-readable query syntax.

The main idea behind this parser is that a person should be able to type whatever they want to represent a query, and this parser will do its best to interpret what to search for no matter how poorly composed the request may be. A token is a term, a phrase, or a subquery, and the operations described below apply to tokens. Tokens may be delimited by whitespace (' ', '\n', '\r' and '\t') and by certain operators: ( ) + | "

Any errors in query syntax will be ignored and the parser will attempt to decipher what it can; however, this may mean odd or unexpected results.

Query Operators

  • '+' specifies AND operation: token1+token2
  • '|' specifies OR operation: token1|token2
  • '-' negates a single token: -token0
  • '"' creates phrases of terms: "term1 term2 ..."
  • '*' at the end of terms specifies prefix query: term*
  • '~N' at the end of terms specifies fuzzy query: term~1
  • '~N' at the end of phrases specifies near query: "term1 term2"~5
  • '(' and ')' specify precedence: token1 + (token2 | token3)

The default operator is OR if no other operator is specified. For example, the following will OR token1 and token2 together: token1 token2

Without parentheses, operators have no relative precedence and are applied in simple left-to-right order. For example, the following will evaluate token1 OR token2 first, then AND the result with token3:

token1 | token2 + token3

Escaping

An individual term may contain any possible character with certain characters requiring escaping using a '\'. The following characters will need to be escaped in terms and phrases: + | " ( ) ' \

The '-' operator is a special case. On individual terms (not phrases), a '-' that is the first character of the term must be escaped; however, any '-' characters beyond the first character do not need to be escaped. For example:

  • -term1 -- Specifies NOT operation against term1
  • \-term1 -- Searches for the term -term1.
  • term-1 -- Searches for the term term-1.
  • term\-1 -- Searches for the term term-1.

The '*' operator is a special case. On individual terms (not phrases), a '*' that is the last character of the term must be escaped; however, any '*' characters before the last character do not need to be escaped. For example:

  • term1* -- Searches for the prefix term1
  • term1\* -- Searches for the term term1*
  • term*1 -- Searches for the term term*1
  • term\*1 -- Searches for the term term*1

Note that the examples above show the terms as they are before text processing.
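The escaping rules above can be sketched as a small helper that prepares a single literal term before it is placed into a query string. This is only an illustration: the class and method names (SimpleQueryEscaper, escapeTerm) are hypothetical and are not part of SimpleQueryParser. It assumes the input is one whitespace-free term, and it covers only the characters this page lists, so a trailing '~N' is left untouched.

```java
// Hypothetical helper, not part of Lucene: escapes one literal term
// according to the rules listed in the "Escaping" section.
final class SimpleQueryEscaper {

    // Characters the section lists as always needing a backslash: + | " ( ) ' \
    private static final String ALWAYS_ESCAPED = "+|\"()'\\";

    private SimpleQueryEscaper() {}

    static String escapeTerm(String term) {
        StringBuilder out = new StringBuilder(term.length() + 4);
        for (int i = 0; i < term.length(); i++) {
            char c = term.charAt(i);
            // '-' only needs escaping as the first character of a term.
            boolean leadingMinus = c == '-' && i == 0;
            // '*' only needs escaping as the last character of a term.
            boolean trailingStar = c == '*' && i == term.length() - 1;
            if (ALWAYS_ESCAPED.indexOf(c) >= 0 || leadingMinus || trailingStar) {
                out.append('\\');
            }
            out.append(c);
        }
        return out.toString();
    }

    public static void main(String[] args) {
        // Print the escaped form of a few sample terms.
        for (String term : new String[] {"-term1", "term-1", "term1*", "term*1", "a+b"}) {
            System.out.println(term + " => " + escapeTerm(term));
        }
    }
}
```

The helper escapes only where the rules require it, so term-1 and term*1 pass through unchanged, matching the examples in the lists above.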

  • Field Details

    • weights

      protected final Map<String,Float> weights
      Map of fields to query against, with their weights.
    • flags

      protected final int flags
      Flags to the parser (to turn features on/off).
    • AND_OPERATOR

      public static final int AND_OPERATOR
      Enables AND operator (+)
    • NOT_OPERATOR

      public static final int NOT_OPERATOR
      Enables NOT operator (-)
    • OR_OPERATOR

      public static final int OR_OPERATOR
      Enables OR operator (|)
    • PREFIX_OPERATOR

      public static final int PREFIX_OPERATOR
      Enables PREFIX operator (*)
    • PHRASE_OPERATOR

      public static final int PHRASE_OPERATOR
      Enables PHRASE operator (")
    • PRECEDENCE_OPERATORS

      public static final int PRECEDENCE_OPERATORS
      Enables PRECEDENCE operators: ( and )
    • ESCAPE_OPERATOR

      public static final int ESCAPE_OPERATOR
      Enables ESCAPE operator (\)
    • WHITESPACE_OPERATOR

      public static final int WHITESPACE_OPERATOR
      Enables WHITESPACE operators: ' ' '\n' '\r' '\t'
    • FUZZY_OPERATOR

      public static final int FUZZY_OPERATOR
      Enables FUZZY operators: (~) on single terms
    • NEAR_OPERATOR

      public static final int NEAR_OPERATOR
      Enables NEAR operators: (~) on phrases
  • Constructor Details

    • SimpleQueryParser

      public SimpleQueryParser(Analyzer analyzer, String field)
      Creates a new parser searching over a single field.
    • SimpleQueryParser

      public SimpleQueryParser(Analyzer analyzer, Map<String,Float> weights)
      Creates a new parser searching over multiple fields with different weights.
    • SimpleQueryParser

      public SimpleQueryParser(Analyzer analyzer, Map<String,Float> weights, int flags)
      Creates a new parser with custom flags used to enable/disable certain features.
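As a rough illustration of the three-argument constructor, the sketch below builds a parser over two weighted fields and enables only a subset of the operators by ORing the flag constants together. It assumes a recent Lucene release with lucene-core and lucene-queryparser on the classpath and a StandardAnalyzer with a no-argument constructor. The class name WeightedParserSketch and the field names "title" and "body" are made up for the example.

```java
import java.util.LinkedHashMap;
import java.util.Map;

import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.queryparser.simple.SimpleQueryParser;
import org.apache.lucene.search.Query;

// Hypothetical example class; field names and weights are assumptions.
final class WeightedParserSketch {

    private WeightedParserSketch() {}

    static Query parse(String userInput) {
        // Fields to search, each with its weight.
        Map<String, Float> weights = new LinkedHashMap<>();
        weights.put("title", 2.0f);
        weights.put("body", 1.0f);

        // Enable only the listed features; operators whose flag is not set
        // (for example '*' and '~') are not interpreted as operators.
        int flags = SimpleQueryParser.AND_OPERATOR
                | SimpleQueryParser.OR_OPERATOR
                | SimpleQueryParser.NOT_OPERATOR
                | SimpleQueryParser.PHRASE_OPERATOR
                | SimpleQueryParser.WHITESPACE_OPERATOR;

        try (StandardAnalyzer analyzer = new StandardAnalyzer()) {
            SimpleQueryParser parser = new SimpleQueryParser(analyzer, weights, flags);
            return parser.parse(userInput);
        }
    }

    public static void main(String[] args) {
        // Print the parsed query's string form for inspection.
        System.out.println(parse("lucene + \"query parser\" -legacy"));
    }
}
```

The two-argument constructors can be used the same way when a single field, or the default set of operators, is enough.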
  • Method Details

    • parse

      public Query parse(String queryText)
      Parses the query text and returns the parsed query.
    • newDefaultQuery

      protected Query newDefaultQuery(String text)
      Factory method to generate a standard query (no phrase or prefix operators).
    • newFuzzyQuery

      protected Query newFuzzyQuery(String text, int fuzziness)
      Factory method to generate a fuzzy query.
    • newPhraseQuery

      protected Query newPhraseQuery(String text, int slop)
      Factory method to generate a phrase query with slop.
    • newPrefixQuery

      protected Query newPrefixQuery(String text)
      Factory method to generate a prefix query.
    • simplify

      protected Query simplify(BooleanQuery bq)
      Helper to simplify boolean queries with 0 or 1 clauses.
    • getDefaultOperator

      public BooleanClause.Occur getDefaultOperator()
      Returns the implicit operator setting, which will be either SHOULD or MUST.
    • setDefaultOperator

      public void setDefaultOperator(BooleanClause.Occur operator)
      Sets the implicit operator setting, which must be either SHOULD or MUST.
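To show how the implicit operator affects whitespace-separated tokens, the following sketch parses the same text once with SHOULD and once with MUST. It assumes lucene-core and lucene-queryparser are on the classpath. The class name DefaultOperatorSketch and the field name "body" are placeholders chosen for the example.

```java
import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.queryparser.simple.SimpleQueryParser;
import org.apache.lucene.search.BooleanClause;
import org.apache.lucene.search.Query;

// Hypothetical example class; the "body" field is an assumption.
final class DefaultOperatorSketch {

    private DefaultOperatorSketch() {}

    static String describe(String text, BooleanClause.Occur operator) {
        try (StandardAnalyzer analyzer = new StandardAnalyzer()) {
            SimpleQueryParser parser = new SimpleQueryParser(analyzer, "body");
            // Must be either SHOULD or MUST, as documented above.
            parser.setDefaultOperator(operator);
            Query query = parser.parse(text);
            // name() is used because Occur.toString() returns the clause prefix symbol.
            return parser.getDefaultOperator().name() + ": " + query;
        }
    }

    public static void main(String[] args) {
        // Same input, two different implicit operators.
        System.out.println(describe("token1 token2", BooleanClause.Occur.SHOULD));
        System.out.println(describe("token1 token2", BooleanClause.Occur.MUST));
    }
}
```

Setting MUST is a common choice when every word typed by the user should be required, while the default SHOULD keeps the OR behavior described in the class overview.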