Class StandardQueryParser
java.lang.Object
  org.apache.lucene.queryparser.flexible.core.QueryParserHelper
    org.apache.lucene.queryparser.flexible.standard.StandardQueryParser
- All Implemented Interfaces:
CommonQueryParserConfiguration
- Direct Known Subclasses:
PrecedenceQueryParser
public class StandardQueryParser extends QueryParserHelper implements CommonQueryParserConfiguration
The StandardQueryParser is a pre-assembled query parser that supports most features of the classic Lucene query parser, allows dynamic configuration of some of its features (like multi-field expansion or wildcard query restrictions) and adds support for new query types and expressions.
The StandardQueryParser is an extension of the QueryParserHelper with reasonable defaults for syntax tree parsing (StandardSyntaxParser), node processor pipeline (StandardQueryNodeProcessorPipeline) and node tree to Query builder (StandardQueryTreeBuilder).
Typical usage, including configuration tweaks:
StandardQueryParser qpHelper = new StandardQueryParser();
// Configuration setters are available directly on the parser.
qpHelper.setAllowLeadingWildcard(true);
qpHelper.setAnalyzer(new WhitespaceAnalyzer());
Query query = qpHelper.parse("apache AND lucene", "defaultField");
Supported query syntax
Standard query parser borrows most of its syntax from the classic query parser but adds more features and expressions on top of that syntax.
A query consists of clauses, field specifications, grouping and Boolean operators and interval functions. We will discuss them in order.
Basic clauses
A query must contain one or more clauses. A clause can be a literal term, a phrase, a wildcard expression or another kind of expression.
The following are some examples of simple one-clause queries:
test
selects documents containing the word test (term clause).
"test equipment"
phrase search; selects documents containing the phrase test equipment (phrase clause).
"test failure"~4
proximity search; selects documents containing the words test and failure within 4 words (positions) from each other. The provided "proximity" is technically translated into "edit distance" (maximum number of atomic word-moving operations required to transform the document's phrase into the query phrase).
tes*
prefix wildcard matching; selects documents containing words starting with tes, such as: test, testing or testable.
/.est(s|ing)/
documents containing words matching the provided regular expression, such as resting or nests.
nest~2
fuzzy term matching; documents containing words within 2-edits distance (2 additions, removals or replacements of a letter) from nest, such as test, net or rests.
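The edit-distance idea behind fuzzy matching can be sketched in plain Java. This is a minimal illustration, assuming plain Levenshtein distance (additions, removals and replacements of a single letter, as described above); it is not Lucene's own fuzzy query code, and the class and method names are hypothetical.

```java
// Illustrative sketch only: plain Levenshtein distance, not Lucene's implementation.
final class EditDistanceSketch {

    /** Minimum number of single-letter insertions, deletions or substitutions turning a into b. */
    static int distance(String a, String b) {
        int[] prev = new int[b.length() + 1];
        int[] curr = new int[b.length() + 1];
        for (int j = 0; j <= b.length(); j++) {
            prev[j] = j; // turning "" into the first j letters of b takes j insertions
        }
        for (int i = 1; i <= a.length(); i++) {
            curr[0] = i; // turning the first i letters of a into "" takes i deletions
            for (int j = 1; j <= b.length(); j++) {
                int cost = a.charAt(i - 1) == b.charAt(j - 1) ? 0 : 1;
                curr[j] = Math.min(Math.min(curr[j - 1] + 1, prev[j] + 1), prev[j - 1] + cost);
            }
            int[] tmp = prev; // reuse the two rows instead of allocating a full matrix
            prev = curr;
            curr = tmp;
        }
        return prev[b.length()];
    }

    /** Hypothetical helper: true if candidate is within maxEdits of term. */
    static boolean fuzzyMatches(String term, String candidate, int maxEdits) {
        return distance(term, candidate) <= maxEdits;
    }

    public static void main(String[] args) {
        for (String word : new String[] {"test", "net", "rests", "testing"}) {
            System.out.println(word + " within 2 edits of nest: " + fuzzyMatches("nest", word, 2));
        }
    }
}
```

Under this model, test, net and rests all fall within two edits of nest, while a longer word such as testing does not.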
Field specifications
Most clauses can be prefixed by a field name and a colon: the clause will then apply to that field only. If the field specification is omitted, the query parser will expand the clause over all fields specified by a call to setMultiFields(CharSequence[]) or will use the default field provided in the call to parse(String, String).
The following are some examples of field-prefixed clauses:
title:test
documents containing test in the title field.
title:(die OR hard)
documents containing die or hard in the title field.
Boolean operators and grouping
You can combine clauses using Boolean AND, OR and NOT operators to form more complex expressions, for example:
test AND results
selects documents containing both the word test and the word results.
test OR suite OR results
selects documents with at least one of test, suite or results.
title:test AND NOT title:complete
selects documents containing test and not containing complete in the title field.
title:test AND (pass* OR fail*)
grouping; use parentheses to specify the precedence of terms in a Boolean clause. The query will match documents containing test in the title field and a word starting with pass or fail in the default search fields.
title:(pass fail skip)
shorthand notation; documents containing at least one of pass, fail or skip in the title field.
title:(+test +"result unknown")
shorthand notation; documents containing both test and result unknown in the title field.
Note that the Boolean operators must be written in all caps, otherwise they are parsed as regular terms.
Range operators
To search for ranges of textual or numeric values, use square or curly brackets, for example:
name:[Jones TO Smith]
inclusive range; selects documents whose name field has any value between Jones and Smith, including boundaries.
score:{2.5 TO 7.3}
exclusive range; selects documents whose score field is between 2.5 and 7.3, excluding boundaries.
score:{2.5 TO *]
one-sided range; selects documents whose score field is larger than 2.5.
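The difference between square (inclusive) and curly (exclusive) brackets, and the open-ended * bound, can be modeled with a small check. This is a sketch under stated assumptions: the class and method names are hypothetical, a null bound stands in for *, and values are compared with Java's natural ordering, which may differ from how an index actually orders terms or numeric points.

```java
// Illustrative sketch only: models range boundaries, not Lucene's range queries.
final class RangeSketch {

    /**
     * True if value lies between lower and upper.
     * A null bound models the open-ended "*"; the inclusive flags model [ ] versus { }.
     */
    static <T extends Comparable<T>> boolean inRange(
            T value, T lower, boolean lowerInclusive, T upper, boolean upperInclusive) {
        if (lower != null) {
            int c = value.compareTo(lower);
            if (c < 0 || (c == 0 && !lowerInclusive)) {
                return false;
            }
        }
        if (upper != null) {
            int c = value.compareTo(upper);
            if (c > 0 || (c == 0 && !upperInclusive)) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        // name:[Jones TO Smith]
        System.out.println(inRange("Jones", "Jones", true, "Smith", true));
        // score:{2.5 TO 7.3}
        System.out.println(inRange(2.5, 2.5, false, 7.3, false));
        // score:{2.5 TO *]
        System.out.println(inRange(100.0, 2.5, false, null, true));
    }
}
```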
Term boosting
Terms, quoted terms, term range expressions and grouped clauses can have a floating-point weight boost applied to them to increase their score relative to other clauses. For example:
jones^2 OR smith^0.5
prioritize documents with the jones term over matches on the smith term.
field:(a OR b NOT c)^2.5 OR field:d
apply the boost to a sub-query.
Special character escaping
Most search terms can be put in double quotes, making special-character escaping unnecessary. If the search term contains the quote character (or cannot be quoted for some reason), any character can be escaped with a backslash. For example:
\:\(quoted\+term\)\:
a single search term (quoted+term): with escape sequences. An alternative quoted form would be simpler: ":(quoted+term):".
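When a term comes from user input, backslash escaping can be applied programmatically. The following is a minimal sketch: the class name, method name and the exact set of characters treated as special are assumptions made for illustration, not the parser's authoritative list. Since any character can be escaped with a backslash, escaping a few extra characters should be harmless.

```java
// Illustrative sketch only: the SPECIAL set below is an assumption, not Lucene's definition.
final class EscapeSketch {

    // Assumed set of characters that carry special meaning in the query syntax.
    private static final String SPECIAL = "\\+-!():^[]\"{}~*?|&/@";

    /** Prefixes every assumed-special character in term with a backslash. */
    static String escape(String term) {
        StringBuilder sb = new StringBuilder(term.length() * 2);
        for (int i = 0; i < term.length(); i++) {
            char c = term.charAt(i);
            if (SPECIAL.indexOf(c) >= 0) {
                sb.append('\\');
            }
            sb.append(c);
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.println(escape(":(quoted+term):"));
    }
}
```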
Minimum-should-match constraint for Boolean disjunction groups
A minimum-should-match operator can be applied to a disjunction Boolean query (a query with only "OR"-subclauses) and forces the query to match documents with at least the provided number of these subclauses. For example:
(blue crab fish)@2
matches all documents with at least two terms from the set [blue, crab, fish] (in any order).
((yellow OR blue) crab fish)@2
sub-clauses of a Boolean query can themselves be complex queries; here the min-should-match selects documents that match at least two of the provided three sub-clauses.
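The counting rule behind the @ operator can be modeled as follows. This is a simplified sketch, assuming a document is just a set of terms and each sub-clause is a predicate over that set; the class and helper names are hypothetical and the code does not reflect how Lucene evaluates Boolean queries internally.

```java
import java.util.List;
import java.util.Set;
import java.util.function.Predicate;

// Illustrative sketch only: a document is modeled as a plain set of terms.
final class MinShouldMatchSketch {

    /** True if at least minimum of the sub-clauses accept the document's terms. */
    static boolean matches(Set<String> docTerms, List<Predicate<Set<String>>> clauses, int minimum) {
        int matched = 0;
        for (Predicate<Set<String>> clause : clauses) {
            if (clause.test(docTerms)) {
                matched++;
            }
        }
        return matched >= minimum;
    }

    /** Hypothetical helper: a sub-clause that matches when the document contains t. */
    static Predicate<Set<String>> term(String t) {
        return terms -> terms.contains(t);
    }

    public static void main(String[] args) {
        // ((yellow OR blue) crab fish)@2
        List<Predicate<Set<String>>> clauses =
                List.of(term("yellow").or(term("blue")), term("crab"), term("fish"));
        System.out.println(matches(Set.of("yellow", "fish"), clauses, 2));
        System.out.println(matches(Set.of("crab"), clauses, 2));
    }
}
```

A complex sub-clause such as (yellow OR blue) counts once toward the minimum, no matter how many of its own alternatives match.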
Interval function clauses
Interval functions are a powerful tool to express search needs in terms of one or more contiguous fragments of text and their relationship to one another. All interval clauses start with the fn: prefix (possibly prefixed by a field specification). For example:
fn:ordered(quick brown fox)
matches all documents (in the default field or in multi-field expansion) with at least one ordered sequence of quick, brown and fox terms.
title:fn:maxwidth(5 fn:atLeast(2 quick brown fox))
matches all documents in the title field where at least two of the three terms (quick, brown and fox) occur within five positions of each other.
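To make the notion of an ordered sequence concrete, the check below tests whether a list of tokens contains the given terms in order, allowing other tokens in between. It is a rough sketch assuming that fn:ordered places no limit on the gaps between terms; the class and method names are hypothetical, and real interval matching works on index positions rather than in-memory token lists.

```java
import java.util.List;

// Illustrative sketch only: existence check for an ordered sequence of terms.
final class OrderedIntervalSketch {

    /** True if terms occur in tokens in the given order, with any number of tokens in between. */
    static boolean containsOrdered(List<String> tokens, List<String> terms) {
        int next = 0; // index of the next term still to be found
        for (String token : tokens) {
            if (next < terms.size() && token.equals(terms.get(next))) {
                next++;
            }
        }
        return next == terms.size();
    }

    public static void main(String[] args) {
        List<String> terms = List.of("quick", "brown", "fox");
        System.out.println(containsOrdered(List.of("the", "quick", "red", "brown", "fox"), terms));
        System.out.println(containsOrdered(List.of("brown", "quick", "fox"), terms));
    }
}
```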
-
Constructor Summary
Constructors
StandardQueryParser()
Constructs a StandardQueryParser object.
StandardQueryParser(Analyzer analyzer)
Constructs a StandardQueryParser object and sets an Analyzer to it.
-
Method Summary
Modifier and Type, Method, Description
boolean getAllowLeadingWildcard()
Analyzer getAnalyzer()
DateTools.Resolution getDateResolution()
Returns the default DateTools.Resolution used for a field when no DateTools.Resolution is defined for that field.
Map<CharSequence,DateTools.Resolution> getDateResolutionMap()
Returns the field to DateTools.Resolution map used to normalize each date field.
StandardQueryConfigHandler.Operator getDefaultOperator()
Gets implicit operator setting, which will be either StandardQueryConfigHandler.Operator.AND or StandardQueryConfigHandler.Operator.OR.
boolean getEnablePositionIncrements()
Map<String,Float> getFieldsBoost()
Returns the field to boost map used to set boost for each field.
float getFuzzyMinSim()
Get the minimal similarity for fuzzy queries.
int getFuzzyPrefixLength()
Get the prefix length for fuzzy queries.
Locale getLocale()
Returns current locale, allowing access by subclasses.
CharSequence[] getMultiFields()
Returns the fields used to expand the query when the field for a certain query is null.
MultiTermQuery.RewriteMethod getMultiTermRewriteMethod()
int getPhraseSlop()
Gets the default slop for phrases.
Map<String,PointsConfig> getPointsConfigMap()
TimeZone getTimeZone()
Query parse(String query, String defaultField)
Overrides QueryParserHelper.parse(String, String) so it casts the return object to Query.
void setAllowLeadingWildcard(boolean allowLeadingWildcard)
Set to true to allow leading wildcard characters.
void setAnalyzer(Analyzer analyzer)
void setDateResolution(DateTools.Resolution dateResolution)
Sets the default DateTools.Resolution used for a field when no DateTools.Resolution is defined for that field.
void setDateResolutionMap(Map<CharSequence,DateTools.Resolution> dateRes)
Sets the DateTools.Resolution used for each field.
void setDefaultOperator(StandardQueryConfigHandler.Operator operator)
Sets the boolean operator of the QueryParser.
void setEnablePositionIncrements(boolean enabled)
Set to true to enable position increments in result query.
void setFieldsBoost(Map<String,Float> boosts)
Sets the boost used for each field.
void setFuzzyMinSim(float fuzzyMinSim)
Set the minimum similarity for fuzzy queries.
void setFuzzyPrefixLength(int fuzzyPrefixLength)
Set the prefix length for fuzzy queries.
void setLocale(Locale locale)
Set locale used by date range parsing.
void setMultiFields(CharSequence[] fields)
Set the fields a query should be expanded to when the field is null.
void setMultiTermRewriteMethod(MultiTermQuery.RewriteMethod method)
By default QueryParser uses MultiTermQuery.CONSTANT_SCORE_BLENDED_REWRITE when creating a PrefixQuery, WildcardQuery or TermRangeQuery.
void setPhraseSlop(int defaultPhraseSlop)
Sets the default slop for phrases.
void setPointsConfigMap(Map<String,PointsConfig> pointsConfigMap)
void setTimeZone(TimeZone timeZone)
String toString()
-
Methods inherited from class org.apache.lucene.queryparser.flexible.core.QueryParserHelper
getQueryBuilder, getQueryConfigHandler, getQueryNodeProcessor, getSyntaxParser, setQueryBuilder, setQueryConfigHandler, setQueryNodeProcessor, setSyntaxParser
-
Constructor Detail
-
StandardQueryParser
public StandardQueryParser()
Constructs a StandardQueryParser object.
-
StandardQueryParser
public StandardQueryParser(Analyzer analyzer)
Constructs a StandardQueryParser object and sets an Analyzer to it. The same as:
StandardQueryParser qp = new StandardQueryParser();
qp.setAnalyzer(analyzer);
- Parameters:
analyzer - the analyzer to be used by this query parser helper
-
Method Detail
-
parse
public Query parse(String query, String defaultField) throws QueryNodeException
Overrides QueryParserHelper.parse(String, String) so it casts the return object to Query. For more reference about this method, check QueryParserHelper.parse(String, String).
- Overrides:
parse in class QueryParserHelper
- Parameters:
query - the query string
defaultField - the default field used by the text parser
- Returns:
- the object built from the query
- Throws:
QueryNodeException - if something goes wrong during any of the three phases
-
getDefaultOperator
public StandardQueryConfigHandler.Operator getDefaultOperator()
Gets implicit operator setting, which will be either StandardQueryConfigHandler.Operator.AND or StandardQueryConfigHandler.Operator.OR.
-
setDefaultOperator
public void setDefaultOperator(StandardQueryConfigHandler.Operator operator)
Sets the boolean operator of the QueryParser. In default mode (StandardQueryConfigHandler.Operator.OR) terms without any modifiers are considered optional: for example capital of Hungary is equal to capital OR of OR Hungary.
In StandardQueryConfigHandler.Operator.AND mode terms are considered to be in conjunction: the above mentioned query is parsed as capital AND of AND Hungary.
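The effect of the default operator on a query made only of bare terms can be illustrated by spelling the implicit operator out. This is a toy sketch: the class, the nested enum and the method are hypothetical stand-ins rather than Lucene types, and the code only handles whitespace-separated terms without modifiers, grouping or quoting.

```java
// Illustrative sketch only: rewrites bare terms, it does not parse real query syntax.
final class DefaultOperatorSketch {

    /** Hypothetical stand-in for the AND/OR default operator setting. */
    enum Operator { AND, OR }

    /** Spells out the implicit operator between whitespace-separated bare terms. */
    static String makeExplicit(String query, Operator defaultOperator) {
        String[] terms = query.trim().split("\\s+");
        return String.join(" " + defaultOperator.name() + " ", terms);
    }

    public static void main(String[] args) {
        System.out.println(makeExplicit("capital of Hungary", Operator.OR));
        System.out.println(makeExplicit("capital of Hungary", Operator.AND));
    }
}
```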
-
setAllowLeadingWildcard
public void setAllowLeadingWildcard(boolean allowLeadingWildcard)
Set to true to allow leading wildcard characters.
When set, * or ? are allowed as the first character of a PrefixQuery and WildcardQuery. Note that this can produce very slow queries on big indexes.
Default: false.
- Specified by:
setAllowLeadingWildcard in interface CommonQueryParserConfiguration
-
setEnablePositionIncrements
public void setEnablePositionIncrements(boolean enabled)
Set to true to enable position increments in result query.
When set, result phrase and multi-phrase queries will be aware of position increments. Useful when e.g. a StopFilter increases the position increment of the token that follows an omitted token.
Default: false.
- Specified by:
setEnablePositionIncrements in interface CommonQueryParserConfiguration
-
getEnablePositionIncrements
public boolean getEnablePositionIncrements()
- Specified by:
getEnablePositionIncrements in interface CommonQueryParserConfiguration
- See Also:
setEnablePositionIncrements(boolean)
-
setMultiTermRewriteMethod
public void setMultiTermRewriteMethod(MultiTermQuery.RewriteMethod method)
Description copied from interface: CommonQueryParserConfiguration
By default QueryParser uses MultiTermQuery.CONSTANT_SCORE_BLENDED_REWRITE when creating a PrefixQuery, WildcardQuery or TermRangeQuery. This implementation is generally preferable because it a) runs faster, b) does not let the scarcity of terms unduly influence the score, and c) avoids any IndexSearcher.TooManyClauses exception. However, if your application really needs to use the old-fashioned BooleanQuery expansion rewriting and the above points are not relevant, then use this to change the rewrite method. As another alternative, if you prefer all terms to be rewritten as a filter up-front, you can use MultiTermQuery.CONSTANT_SCORE_REWRITE. For more information on the different rewrite methods available, see the MultiTermQuery documentation.
- Specified by:
setMultiTermRewriteMethod in interface CommonQueryParserConfiguration
-
getMultiTermRewriteMethod
public MultiTermQuery.RewriteMethod getMultiTermRewriteMethod()
- Specified by:
getMultiTermRewriteMethod in interface CommonQueryParserConfiguration
- See Also:
setMultiTermRewriteMethod(org.apache.lucene.search.MultiTermQuery.RewriteMethod)
-
setMultiFields
public void setMultiFields(CharSequence[] fields)
Set the fields a query should be expanded to when the field is null.
- Parameters:
fields - the fields used to expand the query
-
getMultiFields
public CharSequence[] getMultiFields()
Returns the fields used to expand the query when the field for a certain query is null.
- Returns:
- the fields used to expand the query
-
setFuzzyPrefixLength
public void setFuzzyPrefixLength(int fuzzyPrefixLength)
Set the prefix length for fuzzy queries. Default is 0.
- Specified by:
setFuzzyPrefixLength in interface CommonQueryParserConfiguration
- Parameters:
fuzzyPrefixLength - the fuzzyPrefixLength to set.
-
setPointsConfigMap
public void setPointsConfigMap(Map<String,PointsConfig> pointsConfigMap)
-
getPointsConfigMap
public Map<String,PointsConfig> getPointsConfigMap()
-
setLocale
public void setLocale(Locale locale)
Set locale used by date range parsing.
- Specified by:
setLocale in interface CommonQueryParserConfiguration
-
getLocale
public Locale getLocale()
Returns current locale, allowing access by subclasses.
- Specified by:
getLocale in interface CommonQueryParserConfiguration
-
setTimeZone
public void setTimeZone(TimeZone timeZone)
- Specified by:
setTimeZone in interface CommonQueryParserConfiguration
-
getTimeZone
public TimeZone getTimeZone()
- Specified by:
getTimeZone in interface CommonQueryParserConfiguration
-
setPhraseSlop
public void setPhraseSlop(int defaultPhraseSlop)
Sets the default slop for phrases. If zero, then exact phrase matches are required. Default value is zero.
- Specified by:
setPhraseSlop in interface CommonQueryParserConfiguration
-
setAnalyzer
public void setAnalyzer(Analyzer analyzer)
-
getAnalyzer
public Analyzer getAnalyzer()
- Specified by:
getAnalyzer in interface CommonQueryParserConfiguration
-
getAllowLeadingWildcard
public boolean getAllowLeadingWildcard()
- Specified by:
getAllowLeadingWildcard in interface CommonQueryParserConfiguration
- See Also:
setAllowLeadingWildcard(boolean)
-
getFuzzyMinSim
public float getFuzzyMinSim()
Get the minimal similarity for fuzzy queries.
- Specified by:
getFuzzyMinSim in interface CommonQueryParserConfiguration
-
getFuzzyPrefixLength
public int getFuzzyPrefixLength()
Get the prefix length for fuzzy queries.
- Specified by:
getFuzzyPrefixLength in interface CommonQueryParserConfiguration
- Returns:
- the fuzzyPrefixLength.
-
getPhraseSlop
public int getPhraseSlop()
Gets the default slop for phrases.
- Specified by:
getPhraseSlop in interface CommonQueryParserConfiguration
-
setFuzzyMinSim
public void setFuzzyMinSim(float fuzzyMinSim)
Set the minimum similarity for fuzzy queries. Default is defined on FuzzyQuery.defaultMaxEdits.
- Specified by:
setFuzzyMinSim in interface CommonQueryParserConfiguration
-
setFieldsBoost
public void setFieldsBoost(Map<String,Float> boosts)
Sets the boost used for each field.
- Parameters:
boosts - a collection that maps a field to its boost
-
getFieldsBoost
public Map<String,Float> getFieldsBoost()
Returns the field to boost map used to set boost for each field.
- Returns:
- the field to boost map
-
setDateResolution
public void setDateResolution(DateTools.Resolution dateResolution)
Sets the default DateTools.Resolution used for a field when no DateTools.Resolution is defined for that field.
- Specified by:
setDateResolution in interface CommonQueryParserConfiguration
- Parameters:
dateResolution - the default DateTools.Resolution
-
getDateResolution
public DateTools.Resolution getDateResolution()
Returns the default DateTools.Resolution used for a field when no DateTools.Resolution is defined for that field.
- Returns:
- the default DateTools.Resolution
-
getDateResolutionMap
public Map<CharSequence,DateTools.Resolution> getDateResolutionMap()
Returns the field to DateTools.Resolution map used to normalize each date field.
- Returns:
- the field to DateTools.Resolution map
-
setDateResolutionMap
public void setDateResolutionMap(Map<CharSequence,DateTools.Resolution> dateRes)
Sets the DateTools.Resolution used for each field.
- Parameters:
dateRes - a collection that maps a field to its DateTools.Resolution