Package org.apache.lucene.monitor
Monitoring framework
This package contains classes to allow the monitoring of a stream of documents with a set of queries.
To use, instantiate a Monitor object, register queries with it via Monitor.register(org.apache.lucene.monitor.MonitorQuery...), and then match documents against it either individually via Monitor.match(org.apache.lucene.document.Document, org.apache.lucene.monitor.MatcherFactory) or in batches via Monitor.match(org.apache.lucene.document.Document[], org.apache.lucene.monitor.MatcherFactory).
Matcher types
A number of matcher types are included:
QueryMatch.SIMPLE_MATCHER: just returns the set of query ids that a Document has matched
ScoringMatch.matchWithSimilarity(org.apache.lucene.search.similarities.Similarity): returns the set of matching queries, with the score that each one records against a Document
ExplainingMatch.MATCHER: similar to ScoringMatch, but includes the full Explanation
HighlightsMatch.MATCHER: returns the matching queries along with the matching terms for each query
In addition, matchers may be wrapped in PartitionMatcher or ParallelMatcher to increase performance in low-concurrency systems.
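The partitioning idea behind PartitionMatcher can be illustrated without Lucene. The following is a minimal sketch, not the library's implementation: ToyPartitionMatchSketch and matchInPartitions are hypothetical names, queries are modelled as arbitrary objects, and "running a query" is modelled as a Predicate.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.function.Predicate;

// Hypothetical toy class; it uses no Lucene types.
class ToyPartitionMatchSketch {
    // Split the candidate list into contiguous slices and test each slice on its own thread.
    static <Q> List<Q> matchInPartitions(List<Q> candidates, Predicate<Q> matches, int threads) {
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        try {
            int sliceSize = (candidates.size() + threads - 1) / threads; // ceiling division
            List<Future<List<Q>>> futures = new ArrayList<>();
            for (int start = 0; start < candidates.size(); start += sliceSize) {
                int end = Math.min(start + sliceSize, candidates.size());
                List<Q> slice = candidates.subList(start, end);
                futures.add(pool.submit(() -> {
                    List<Q> hits = new ArrayList<>();
                    for (Q query : slice) {
                        if (matches.test(query)) hits.add(query);
                    }
                    return hits;
                }));
            }
            // Collect results in slice order, so the input order is preserved.
            List<Q> all = new ArrayList<>();
            for (Future<List<Q>> future : futures) all.addAll(future.get());
            return all;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        } catch (ExecutionException e) {
            throw new IllegalStateException(e.getCause());
        } finally {
            pool.shutdown();
        }
    }
}
```

For example, matchInPartitions(Arrays.asList(1, 2, 3, 4, 5, 6, 7), n -> n % 2 == 0, 3) tests three slices concurrently and returns the even numbers in their original order.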
Pre-filtering of queries
Monitoring is done efficiently by extracting minimal sets of terms from queries, and using these to build a query index. When a document is passed to Monitor.match(org.apache.lucene.document.Document, org.apache.lucene.monitor.MatcherFactory), it is converted into a small index, and the terms dictionary from that index is then used to build a disjunction query to run against the query index. Queries that match this disjunction are then run against the document. In this way, the Monitor can avoid running queries that have no chance of matching. The process of extracting terms and building document disjunctions is handled by a Presearcher.
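The pre-filtering flow described above can be modelled with plain collections. This is a rough sketch under simplifying assumptions, not the Lucene code: ToyPresearcherSketch is a hypothetical class, every query is a conjunction of terms, the query index is a map, and the document "index" is just its set of lower-cased tokens.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

// Hypothetical toy class; it uses no Lucene types.
class ToyPresearcherSketch {
    // Registered queries: id -> terms that must all appear in a document.
    private final Map<String, List<String>> queries = new HashMap<>();
    // Query index: one extracted term -> ids of the queries indexed under it.
    private final Map<String, Set<String>> queryIndex = new HashMap<>();

    void register(String queryId, List<String> requiredTerms) {
        queries.put(queryId, new ArrayList<>(requiredTerms));
        // Any single term of a conjunction is enough to index it; take the first.
        queryIndex.computeIfAbsent(requiredTerms.get(0), t -> new TreeSet<>()).add(queryId);
    }

    // Look up every document term in the query index (the "disjunction").
    Set<String> candidates(Set<String> documentTerms) {
        Set<String> result = new TreeSet<>();
        for (String term : documentTerms) {
            result.addAll(queryIndex.getOrDefault(term, Collections.emptySet()));
        }
        return result;
    }

    // Run only the candidate queries against the document.
    Set<String> match(String document) {
        Set<String> documentTerms =
            new HashSet<>(Arrays.asList(document.toLowerCase().split("\\W+")));
        Set<String> matched = new TreeSet<>();
        for (String id : candidates(documentTerms)) {
            if (documentTerms.containsAll(queries.get(id))) matched.add(id);
        }
        return matched;
    }
}
```

In this model a query indexed under "dog" is never run against a document that lacks "dog". A query indexed under "fox" is selected as a candidate for any document containing "fox", but it only matches if all of its terms are present.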
In addition, extra per-field filtering can be specified by passing a set of keyword fields to filter on. When queries are registered with the monitor, field-value pairs can be added as optional metadata for each query, and these can then be used to restrict which queries a document is checked against. For example, you can specify a language that each query should apply to, and documents containing a value in their language field would only be checked against queries that have that same value in their language metadata. Note that when matching documents in batches, all documents in the batch must have the same values in their filter fields.
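The language example can be sketched as follows. This is an illustrative model only: ToyMetadataFilterSketch is a hypothetical class, and two behaviours are assumptions of the sketch rather than statements about the library: a document with no value in the filter field is checked against every query, and a query with no metadata for that field is skipped when the document does have a value.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical toy class; it uses no Lucene types.
class ToyMetadataFilterSketch {
    // Query id -> metadata field-value pairs supplied at registration time.
    private final Map<String, Map<String, String>> metadataByQuery = new LinkedHashMap<>();

    void register(String queryId, Map<String, String> metadata) {
        metadataByQuery.put(queryId, metadata);
    }

    // Ids of the queries a document should be checked against for one filter field.
    List<String> eligible(String filterField, Map<String, String> documentFields) {
        String documentValue = documentFields.get(filterField);
        List<String> ids = new ArrayList<>();
        for (Map.Entry<String, Map<String, String>> entry : metadataByQuery.entrySet()) {
            // Assumption: no document value means no restriction is applied.
            if (documentValue == null
                    || documentValue.equals(entry.getValue().get(filterField))) {
                ids.add(entry.getKey());
            }
        }
        return ids;
    }
}
```

With one query registered under language "en" and another under "fr", a document whose language field is "en" is only checked against the first.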
Query analysis uses the QueryVisitor API to extract terms, which will work for all basic term-based queries shipped with Lucene. The analyzer builds a representation of the query called a QueryTree, and then selects a minimal set of terms, one of which must be present in a document for that document to match. Individual terms are weighted using a TermWeightor, which allows some selectivity when building the term set. For example, given a conjunction of terms (a boolean query with several MUST clauses, or a phrase, span or interval query), we need only extract one term. The TermWeightor can be configured in a number of ways; by default it will weight longer terms more highly.
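Term selection for a conjunction can be sketched in a few lines. This is a simplified model, not the TermWeightor API: ToyTermSelectionSketch and LENGTH_WEIGHT are hypothetical names, terms are plain strings, and the weight is assumed to be the term length to mirror the "longer terms weigh more" default described above.

```java
import java.util.Collections;
import java.util.Comparator;
import java.util.List;
import java.util.function.ToDoubleFunction;

// Hypothetical toy class; it uses no Lucene types.
class ToyTermSelectionSketch {
    // Assumed weighting for illustration: a longer term gets a higher weight.
    static final ToDoubleFunction<String> LENGTH_WEIGHT = term -> term.length();

    // Every term of a conjunction must be present in a matching document,
    // so indexing the single highest-weighted term is sufficient.
    static String selectForConjunction(List<String> terms, ToDoubleFunction<String> weightor) {
        return Collections.max(terms, Comparator.comparingDouble(weightor));
    }
}
```

Passing a different function, for instance one that returns a very low weight for known stop words, changes which term is chosen without touching the selection logic.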
For query sets that contain many conjunctions, it can be useful to extract and index different
minimal term combinations. For example, a phrase query on 'the quick brown fox' could index both
'quick' and 'brown', and avoid being run against documents that contain only one of these terms.
The MultipassTermFilteredPresearcher allows this sort of indexing, taking a minimum term weight so that very common terms such as 'the' can be avoided.
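The 'quick brown fox' example can be modelled as below. This is a rough sketch of the idea only: ToyMultipassSketch is a hypothetical class, the weight of a term is assumed to be its length, and the real presearcher's handling of queries whose terms all fall below the minimum weight is not modelled.

```java
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

// Hypothetical toy class; it uses no Lucene types.
class ToyMultipassSketch {
    // Choose up to `passes` distinct terms from a conjunction, heaviest first,
    // dropping terms whose assumed weight (their length) is below minWeight.
    static List<String> termsToIndex(List<String> conjunction, int passes, int minWeight) {
        List<String> terms = new ArrayList<>(new LinkedHashSet<>(conjunction));
        terms.removeIf(term -> term.length() < minWeight);
        // Stable sort: terms of equal length keep their original order.
        terms.sort((a, b) -> Integer.compare(b.length(), a.length()));
        return new ArrayList<>(terms.subList(0, Math.min(passes, terms.size())));
    }

    // The query is only worth running if every indexed term occurs in the document.
    static boolean isCandidate(List<String> indexedTerms, Set<String> documentTerms) {
        return documentTerms.containsAll(indexedTerms);
    }
}
```

With two passes and a minimum weight of 4, the phrase terms "the", "quick", "brown", "fox" are indexed as "quick" and "brown", so a document containing only "quick" is never run against the phrase query.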
Custom Query implementations that are based on term matching, and that implement Query.visit(org.apache.lucene.search.QueryVisitor) will work with no extra configuration; for more complicated custom queries, you can register a CustomQueryHandler with the presearcher. Included in this package is a RegexpQueryHandler, which gives an example of a different method of indexing automaton-based queries by extracting fixed substrings from a regular expression, and then using ngram filtering to build the document disjunction.
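The substring-plus-ngram idea can be illustrated with a deliberately crude model. This is not how RegexpQueryHandler is implemented: ToyRegexpNgramSketch is a hypothetical class, and the sketch assumes the pattern consists only of letters, digits, and the wildcards `.` and `.*`, so that every run of letters and digits is a substring any match must contain. Patterns with quantifiers such as `?`, alternation, or character classes would need a real regex parser.

```java
import java.util.HashSet;
import java.util.Set;

// Hypothetical toy class; it uses no Lucene types.
class ToyRegexpNgramSketch {
    // Longest run of letters and digits in the pattern. Assumes the only
    // metacharacters are '.' and ".*", so each run is a required substring.
    static String longestFixedSubstring(String regex) {
        String best = "";
        for (String run : regex.split("[^A-Za-z0-9]+")) {
            if (run.length() > best.length()) best = run;
        }
        return best;
    }

    // All substrings of a document token, i.e. its ngrams of every length.
    static Set<String> ngrams(String token) {
        Set<String> grams = new HashSet<>();
        for (int start = 0; start < token.length(); start++) {
            for (int end = start + 1; end <= token.length(); end++) {
                grams.add(token.substring(start, end));
            }
        }
        return grams;
    }

    // Pre-filter: the regex can only match the token if the token's ngrams
    // include the indexed substring. An empty substring cannot filter anything.
    static boolean mightMatch(String regex, String token) {
        String fixed = longestFixedSubstring(regex);
        return fixed.isEmpty() || ngrams(token).contains(fixed);
    }
}
```

Here the pattern `hel.*world.` would be indexed under "world". A token such as "helloworlds" produces that ngram and is passed on to the real regex check, while "hello" is filtered out.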
Persistent query sets
By default, Monitor instances are ephemeral, storing their query indexes in memory. To make a persistent monitor, build a MonitorConfiguration object and call MonitorConfiguration.setIndexPath(java.nio.file.Path, org.apache.lucene.monitor.MonitorQuerySerializer) to tell the Monitor to store its query index on disk. All queries registered with this Monitor will need to have a string representation that is also stored, and can be re-parsed by the associated MonitorQuerySerializer when the index is loaded by a new Monitor instance.
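The requirement that every stored query has a re-parseable string form can be shown with a small round trip. This is a sketch under invented assumptions, not the MonitorQuerySerializer contract: ToyPersistentQueriesSketch is a hypothetical class, the file format (one "id TAB query string" line per query) is made up for illustration, and "parsing" just splits the string on " AND ".

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical toy class; it uses no Lucene types.
class ToyPersistentQueriesSketch {
    // Store one "id<TAB>query string" line per registered query.
    static void save(Path file, Map<String, String> queryStrings) {
        List<String> lines = new ArrayList<>();
        for (Map.Entry<String, String> entry : queryStrings.entrySet()) {
            lines.add(entry.getKey() + "\t" + entry.getValue());
        }
        try {
            Files.write(file, lines, StandardCharsets.UTF_8);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // A new instance rebuilds its queries by re-parsing each stored string.
    static Map<String, List<String>> load(Path file) {
        Map<String, List<String>> parsed = new LinkedHashMap<>();
        try {
            for (String line : Files.readAllLines(file, StandardCharsets.UTF_8)) {
                String[] parts = line.split("\t", 2);
                parsed.put(parts[0], Arrays.asList(parts[1].split(" AND ")));
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return parsed;
    }
}
```

The point of the sketch is the round trip: whatever is written at registration time must carry enough information for a fresh process to reconstruct an equivalent query.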
Interface Summary
Interface | Description
CustomQueryHandler | Builds a QueryTree for a query that needs custom treatment
MatcherFactory<T extends QueryMatch> | Interface for the creation of new CandidateMatcher objects
MonitorQuerySerializer | Serializes and deserializes MonitorQuery objects into byte streams
MonitorUpdateListener | For reporting events on a Monitor's query index
QueryTimeListener | Notified of the time it takes to run individual queries against a set of documents
TermFilteredPresearcher.DocumentQueryBuilder | Constructs a document disjunction from a set of terms
TermWeightor | Calculates the weight of a Term
Class Summary
Class | Description
CandidateMatcher<T extends QueryMatch> | Class used to match candidate queries selected by a Presearcher from a Monitor query index.
ConcurrentQueryLoader | Utility class for concurrently loading queries into a Monitor.
ExplainingMatch | A query match containing the score explanation of the match
HighlightsMatch | QueryMatch object that contains the hit positions of a matching Query
HighlightsMatch.Hit | Represents an individual hit
MatchingQueries<T extends QueryMatch> | Class to hold the results of matching a single Document against queries held in the Monitor
Monitor |
Monitor.QueryCacheStats | Statistics for the query cache and query index
MonitorConfiguration | Encapsulates various configuration settings for a Monitor's query index
MonitorQuery | Defines a query to be stored in a Monitor
MultiMatchingQueries<T extends QueryMatch> | Class to hold the results of matching a batch of Documents against queries held in the Monitor
MultipassTermFilteredPresearcher | A TermFilteredPresearcher that indexes queries multiple times, with terms collected from different routes through a query tree.
ParallelMatcher<T extends QueryMatch> | Matcher class that runs matching queries in parallel.
PartitionMatcher<T extends QueryMatch> | A multi-threaded matcher that collects all possible matches in one pass, and then partitions them amongst a number of worker threads to perform the actual matching.
Presearcher | A Presearcher is used by the Monitor to reduce the number of queries actually run against a Document.
PresearcherMatch<T extends QueryMatch> | Wraps a QueryMatch with information about which queries were selected by the presearcher
PresearcherMatches<T extends QueryMatch> | Wraps a MultiMatchingQueries with information on which presearcher queries were selected
QueryDecomposer | Split a disjunction query into its constituent parts, so that they can be indexed and run separately in the Monitor.
QueryMatch | Represents a match for a specific query and document
QueryTree | A representation of a node in a query tree
RegexpQueryHandler | A query handler implementation that matches Regexp queries by indexing regex terms by their longest static substring, and generates ngrams from Document tokens to match them.
ScoringMatch | A QueryMatch that reports scores for each match
SlowLog | Reports on slow queries in a given match run
SlowLog.Entry | An individual entry in the slow log
TermFilteredPresearcher | Presearcher implementation that uses terms extracted from queries to index them in the Monitor, and builds a disjunction from terms in a document to match them.