Package org.apache.lucene.search.join
Index-time joins
Index-time joining resolves joins at search time, but requires the joined documents to be indexed
as a single document block using IndexWriter.addDocuments(). This is useful for any normalized
content, such as XML documents or database tables. In database terms, all rows from all joined
tables that match a single row of the primary table must be indexed as a single document block,
with the parent document last in the group.
When you index in this way, the documents in your index are divided into parent documents (the
last document of each block) and child documents (all others). You provide a BitSetProducer that
identifies the parent documents, because Lucene does not currently record any information about
document blocks.
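The block layout can be pictured with a small plain-Java model. The sketch below is not Lucene code: it uses java.util.BitSet as a stand-in for the bit set a BitSetProducer would supply, and the names BlockLayoutSketch, parentBits and parentOf are hypothetical. It assumes every block contains at least its parent document.

```java
import java.util.BitSet;

/** Toy model of a block-indexed segment; not Lucene code. */
final class BlockLayoutSketch {

    /** Marks the last document of each block as a parent, given block sizes in index order. */
    static BitSet parentBits(int... blockSizes) {
        BitSet parents = new BitSet();
        int nextDocId = 0;
        for (int size : blockSizes) {
            nextDocId += size;          // children come first, the parent comes last
            parents.set(nextDocId - 1); // the parent is the final document of the block
        }
        return parents;
    }

    /** Returns the parent of a document: the first parent bit at or after its doc id. */
    static int parentOf(BitSet parents, int docId) {
        return parents.nextSetBit(docId);
    }

    public static void main(String[] args) {
        // Two blocks: docs 0-2 (parent is doc 2) and docs 3-4 (parent is doc 4).
        BitSet parents = parentBits(3, 2);
        System.out.println(parents);              // {2, 4}
        System.out.println(parentOf(parents, 0)); // 2
        System.out.println(parentOf(parents, 3)); // 4
    }
}
```

Because the parent is always the last document of its block, a single forward scan of the bit set is enough to find the parent of any child.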
At search time, use ToParentBlockJoinQuery to remap (join) matches from any child Query (i.e., a
query that matches only child documents) up to the parent document space. The resulting query can
then be used as a clause in any query that matches parent documents.
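As a rough illustration of what joining up to the parent space means, the plain-Java sketch below maps scored child hits onto their parents. It is a model under stated assumptions, not Lucene's implementation: the class ToParentJoinSketch, its methods, and the Mode enum (a hypothetical stand-in for three of the ScoreMode values) are invented for this example.

```java
import java.util.BitSet;
import java.util.Map;
import java.util.TreeMap;

/** Toy model of joining child hits up to their parents; not Lucene code. */
final class ToParentJoinSketch {

    /** Hypothetical stand-in for three of the score modes. */
    enum Mode { NONE, MAX, TOTAL }

    /** Builds the parent bit set from the doc ids of the parent documents. */
    static BitSet parentBits(int... parentDocIds) {
        BitSet bits = new BitSet();
        for (int docId : parentDocIds) {
            bits.set(docId);
        }
        return bits;
    }

    /** Combines two child scores that belong to the same parent. */
    static float combine(Mode mode, float a, float b) {
        switch (mode) {
            case MAX:
                return Math.max(a, b);
            case TOTAL:
                return a + b;
            default:
                return 0f; // NONE: scoring is disabled
        }
    }

    /** Maps each child hit to its parent and aggregates the child scores per parent. */
    static Map<Integer, Float> joinToParents(
            BitSet parents, Map<Integer, Float> childHits, Mode mode) {
        Map<Integer, Float> parentHits = new TreeMap<>();
        for (Map.Entry<Integer, Float> hit : childHits.entrySet()) {
            int parent = parents.nextSetBit(hit.getKey()); // first parent at or after the child
            float score = mode == Mode.NONE ? 0f : hit.getValue();
            parentHits.merge(parent, score, (a, b) -> combine(mode, a, b));
        }
        return parentHits;
    }

    public static void main(String[] args) {
        BitSet parents = parentBits(2, 4); // blocks: docs 0-2 and docs 3-4
        Map<Integer, Float> childHits = new TreeMap<>();
        childHits.put(0, 1.0f);
        childHits.put(1, 3.0f);
        childHits.put(3, 2.0f);
        System.out.println(joinToParents(parents, childHits, Mode.MAX));   // {2=3.0, 4=2.0}
        System.out.println(joinToParents(parents, childHits, Mode.TOTAL)); // {2=4.0, 4=2.0}
    }
}
```

Each parent appears once in the result regardless of how many of its children matched, which is why the joined query can be combined with other clauses that match parent documents.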
If you care about which child documents matched for each parent document, use
ParentChildrenBlockJoinQuery to retrieve, for each matched parent document, the child documents
that caused the parent to match in the first place. This query should be run after your main query
has been executed. For each hit, execute the ParentChildrenBlockJoinQuery:
  TopDocs results = searcher.search(mainQuery, 10);
  for (int i = 0; i < results.scoreDocs.length; i++) {
    ScoreDoc scoreDoc = results.scoreDocs[i];

    // Run ParentChildrenBlockJoinQuery to figure out the top matching child docs:
    ParentChildrenBlockJoinQuery parentChildrenBlockJoinQuery =
        new ParentChildrenBlockJoinQuery(parentFilter, childQuery, scoreDoc.doc);
    TopDocs topChildResults = searcher.search(parentChildrenBlockJoinQuery, 3);
    // Process top child hits...
  }
To map/join in the opposite direction, use ToChildBlockJoinQuery. This wraps any query matching
parent documents, creating a joined query that matches only child documents.
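The reverse direction can be modelled the same way. In the plain-Java sketch below (not Lucene code; ToChildJoinSketch and childrenOf are hypothetical names), the children of a parent are assumed to be exactly the documents between the previous parent and the parent itself, which follows from the block layout described above.

```java
import java.util.ArrayList;
import java.util.BitSet;
import java.util.List;

/** Toy model of joining a parent hit down to its children; not Lucene code. */
final class ToChildJoinSketch {

    /** Builds the parent bit set from the doc ids of the parent documents. */
    static BitSet parentBits(int... parentDocIds) {
        BitSet bits = new BitSet();
        for (int docId : parentDocIds) {
            bits.set(docId);
        }
        return bits;
    }

    /** Returns the doc ids of all children in the block that ends at parentDoc. */
    static List<Integer> childrenOf(BitSet parents, int parentDoc) {
        if (!parents.get(parentDoc)) {
            throw new IllegalArgumentException("doc " + parentDoc + " is not a parent");
        }
        // The block starts right after the previous parent, or at doc 0 if there is none
        // (previousSetBit returns -1 in that case).
        int firstChild = parents.previousSetBit(parentDoc - 1) + 1;
        List<Integer> children = new ArrayList<>();
        for (int docId = firstChild; docId < parentDoc; docId++) {
            children.add(docId);
        }
        return children;
    }

    public static void main(String[] args) {
        // Blocks: docs 0-2 (parent 2), docs 3-4 (parent 4), doc 5 (parent without children).
        BitSet parents = parentBits(2, 4, 5);
        System.out.println(childrenOf(parents, 2)); // [0, 1]
        System.out.println(childrenOf(parents, 4)); // [3]
        System.out.println(childrenOf(parents, 5)); // []
    }
}
```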
Query-time joins
Query-time joining is based on indexed terms and is implemented as a two-pass search. The first pass collects all terms from the fromField of documents that match the fromQuery. The second pass returns all documents whose toField contains any of the terms collected in the first pass.
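The two passes can be sketched in plain Java over in-memory documents. This is only a model of the idea, not how JoinUtil is implemented: documents are represented as simple field maps, the from query is a Predicate, and QueryTimeJoinSketch, join and demo are hypothetical names with made-up sample data.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.function.Predicate;

/** Toy model of a two-pass, term-based join; not Lucene code. */
final class QueryTimeJoinSketch {

    /** Returns the ids (list positions) of the toDocs that join to a matching fromDoc. */
    static List<Integer> join(
            List<Map<String, String>> fromDocs,
            Predicate<Map<String, String>> fromQuery,
            String fromField,
            List<Map<String, String>> toDocs,
            String toField) {
        // Pass 1: collect the fromField term of every document matching fromQuery.
        Set<String> terms = new HashSet<>();
        for (Map<String, String> doc : fromDocs) {
            String term = doc.get(fromField);
            if (term != null && fromQuery.test(doc)) {
                terms.add(term);
            }
        }
        // Pass 2: keep every document whose toField holds one of the collected terms.
        List<Integer> hits = new ArrayList<>();
        for (int id = 0; id < toDocs.size(); id++) {
            if (terms.contains(toDocs.get(id).get(toField))) {
                hits.add(id);
            }
        }
        return hits;
    }

    /** Runs the join on a small, made-up data set for the given search term. */
    static List<Integer> demo(String searchTerm) {
        List<Map<String, String>> fromDocs = List.of(
                Map.of("from", "A", "content", "red"),
                Map.of("from", "B", "content", "blue"),
                Map.of("from", "A", "content", "blue"));
        List<Map<String, String>> toDocs = List.of(
                Map.of("to", "A"),
                Map.of("to", "B"),
                Map.of("to", "C"));
        return join(fromDocs, doc -> searchTerm.equals(doc.get("content")), "from", toDocs, "to");
    }

    public static void main(String[] args) {
        System.out.println(demo("red"));  // [0]
        System.out.println(demo("blue")); // [0, 1]
    }
}
```

The from and to document lists are separate parameters in this sketch to mirror the point that the two passes need not run against the same searcher.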
Query-time joining takes the following inputs:
fromField: The field to join from.
fromQuery: The query executed to collect the from terms. This is usually the user-specified query.
multipleValuesPerDocument: Whether the fromField contains more than one value per document.
scoreMode: Defines how scores are translated to the other side of the join. If you don't care about scoring, use ScoreMode.None. This disables scoring and is therefore more efficient (it requires less memory and is faster).
toField: The field to join to.
Query-time joining is accessible from a single static method. The caller supplies the inputs
described above together with an IndexSearcher from which the from terms are collected. The
returned query can be executed with the same IndexSearcher or with a different one. Example usage
of JoinUtil.createJoinQuery():
  String fromField = "from"; // Name of the from field
  // Set to true only if fromField has multiple values per document in your index
  boolean multipleValuesPerDocument = false;
  String toField = "to"; // Name of the to field
  // Defines how the scores are translated into the other side of the join
  ScoreMode scoreMode = ScoreMode.Max;
  // Query executed to collect from values to join to the to values
  Query fromQuery = new TermQuery(new Term("content", searchTerm));

  Query joinQuery = JoinUtil.createJoinQuery(
      fromField, multipleValuesPerDocument, toField, fromQuery, fromSearcher, scoreMode);
  // Note: toSearcher can be the same as fromSearcher
  TopDocs topDocs = toSearcher.search(joinQuery, 10);
  // Render topDocs...
Interface Summary
Interface: Description
BitSetProducer: A producer of BitSets per segment.

Class Summary
Class: Description
BlockJoinSelector: Select a value from a block of documents.
CheckJoinIndex: Utility class to check a block join index.
DiversifyingChildrenByteKnnVectorQuery: kNN byte vector query that joins matching children vector documents with their parent doc id.
DiversifyingChildrenFloatKnnVectorQuery: kNN float vector query that joins matching children vector documents with their parent doc id.
DiversifyingNearestChildrenKnnCollectorManager: DiversifyingNearestChildrenKnnCollectorManager responsible for creating DiversifyingNearestChildrenKnnCollector instances.
JoinUtil: Utility for query time joining.
ParentChildrenBlockJoinQuery: A query that returns all the matching child documents for a specific parent document indexed together in the same block.
QueryBitSetProducer: A BitSetProducer that wraps a query and caches matching BitSets per segment.
SeekingTermSetTermsEnum: A filtered TermsEnum that uses a BytesRefHash as a filter.
ToChildBlockJoinQuery: Just like ToParentBlockJoinQuery, except this query joins in reverse: you provide a Query matching parent documents and it joins down to child documents.
ToParentBlockJoinQuery: This query requires that you index children and parent docs as a single block, using the IndexWriter.addDocuments() or IndexWriter.updateDocuments() API.
ToParentBlockJoinSortField: A special sort field that allows sorting parent docs based on nested / child level fields.

Enum Summary
Enum: Description
BlockJoinSelector.Type: Type of selection to perform.
ScoreMode: How to aggregate multiple child hit scores into a single parent score.