org.apache.lucene.search.join (Lucene 5.3.0 API)

Interface Summary
Interface	Description
BitSetProducer	A producer of `BitSet`s per segment.

Class Summary
Class	Description
BitDocIdSetCachingWrapperFilter	Deprecated. Use `QueryBitSetProducer` instead.
BitDocIdSetFilter	Deprecated. Use `BitSetProducer` instead.
BlockJoinSelector	Select a value from a block of documents.
CheckJoinIndex	Utility class to check a block join index.
JoinUtil	Utility for query time joining.
QueryBitSetProducer	A `BitSetProducer` that wraps a query and caches matching `BitSet`s per segment.
ToChildBlockJoinQuery	Just like `ToParentBlockJoinQuery`, except this query joins in reverse: you provide a Query matching parent documents and it joins down to child documents.
ToParentBlockJoinCollector	Collects parent document hits for a Query containing one or more BlockJoinQuery clauses, sorted by the specified parent Sort.
ToParentBlockJoinIndexSearcher	An `IndexSearcher` to use in conjunction with `ToParentBlockJoinCollector`.
ToParentBlockJoinQuery	This query requires that you index children and parent docs as a single block, using the `IndexWriter.addDocuments()` or `IndexWriter.updateDocuments()` API.
ToParentBlockJoinSortField	A special sort field that allows sorting parent docs based on nested / child level fields.

Enum Summary
Enum	Description
BlockJoinSelector.Type	Type of selection to perform.
ScoreMode	How to aggregate multiple child hit scores into a single parent score.

Package org.apache.lucene.search.join Description

Support for index-time and query-time joins.

Index-time joins

Index-time join support performs the join while searching, provided that the joined documents were indexed as a single document block using IndexWriter.addDocuments(). This is useful for any normalized content (XML documents or database tables). In database terms, all rows for all joined tables matching a single row of the primary table must be indexed as a single document block, with the parent document being last in the group.

When you index in this way, the documents in your index are divided into parent documents (the last document of each block) and child documents (all others). You provide a BitSetProducer (for example, a QueryBitSetProducer wrapping a query that matches only parent documents) to identify the parent documents, as Lucene does not currently record any information about doc blocks.

At search time, use ToParentBlockJoinQuery to remap/join matches from any child Query (i.e., a query that matches only child documents) up to the parent document space. The resulting query can then be used as a clause in any query that matches parent documents.
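The remapping from child matches to parents can be pictured with a small plain-Java model. This is only a sketch and uses no Lucene classes: document IDs are plain ints, the parent set is a java.util.BitSet, and the class and method names (ToParentJoinSketch, joinToParents) are hypothetical. It relies on the layout described above, where the parent is the last document of its block, so the parent of a child is the first parent bit at or after the child's ID.

```java
import java.util.BitSet;
import java.util.LinkedHashSet;
import java.util.Set;

// Plain-Java model of a block index; hypothetical names, no Lucene classes.
class ToParentJoinSketch {

    // Maps each matching child doc ID to the parent that closes its block.
    // parentBits has one bit set per parent document (the last doc of each block).
    static Set<Integer> joinToParents(int[] matchingChildDocs, BitSet parentBits) {
        Set<Integer> parents = new LinkedHashSet<>();
        for (int child : matchingChildDocs) {
            // The parent is indexed last in its block, so it is the first
            // set bit at or after the child's doc ID.
            int parent = parentBits.nextSetBit(child);
            if (parent >= 0) {
                parents.add(parent); // the set collapses several child hits into one parent hit
            }
        }
        return parents;
    }
}
```

With two blocks laid out as docs [0, 1, 2] and [3, 4, 5], parents 2 and 5, child matches 0, 1 and 4 map to parents 2 and 5.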

If you only care about the parent documents matching the query, you can use any collector to collect the parent hits, but if you'd also like to see which child documents match for each parent document, use the ToParentBlockJoinCollector to collect the hits. Once the search is done, you retrieve a TopGroups instance from the ToParentBlockJoinCollector.getTopGroups() method.

To map/join in the opposite direction, use ToChildBlockJoinQuery. This wraps any query matching parent documents, creating the joined query matching only child documents.
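The reverse direction can be modeled the same way. The sketch below is plain Java with hypothetical names (ToChildJoinSketch, joinToChildren) and no Lucene classes; it assumes the same block layout, so the children of a parent are the documents between the previous parent and the parent itself.

```java
import java.util.ArrayList;
import java.util.BitSet;
import java.util.List;

// Plain-Java model of joining from parents down to children; hypothetical names.
class ToChildJoinSketch {

    // Returns the child doc IDs of every matching parent, in index order.
    // parentBits has one bit set per parent document (the last doc of each block).
    static List<Integer> joinToChildren(int[] matchingParentDocs, BitSet parentBits) {
        List<Integer> children = new ArrayList<>();
        for (int parent : matchingParentDocs) {
            // The previous parent marks the end of the preceding block;
            // prevSetBit returns -1 when this is the first block.
            int previousParent = parentBits.prevSetBit(parent - 1);
            for (int doc = previousParent + 1; doc < parent; doc++) {
                children.add(doc);
            }
        }
        return children;
    }
}
```

A parent with no children (a block of size one) simply contributes nothing to the result.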

Query-time joins

Query-time joining is based on index terms and is implemented as a two-pass search. The first pass collects all the terms from a fromField in documents that match the fromQuery. The second pass returns all documents whose toField contains any of the terms collected in the first pass.
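The two passes can be illustrated with an in-memory model. This is a sketch under simplifying assumptions, not how JoinUtil is implemented: documents are modeled as maps from field name to a single value, the from query is a java.util.function.Predicate, scoring is ignored, and the names TwoPassJoinSketch and join are hypothetical.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.function.Predicate;

// Plain-Java model of a term-based, two-pass join; hypothetical names, no Lucene classes.
class TwoPassJoinSketch {

    // Returns the positions in toDocs of all documents joined to a fromQuery match.
    static List<Integer> join(List<Map<String, String>> fromDocs,
                              Predicate<Map<String, String>> fromQuery,
                              String fromField,
                              List<Map<String, String>> toDocs,
                              String toField) {
        // Pass 1: collect the fromField terms of every document matching fromQuery.
        Set<String> terms = new HashSet<>();
        for (Map<String, String> doc : fromDocs) {
            String value = doc.get(fromField);
            if (value != null && fromQuery.test(doc)) {
                terms.add(value);
            }
        }
        // Pass 2: return every document whose toField holds one of the collected terms.
        List<Integer> hits = new ArrayList<>();
        for (int i = 0; i < toDocs.size(); i++) {
            String value = toDocs.get(i).get(toField);
            if (value != null && terms.contains(value)) {
                hits.add(i);
            }
        }
        return hits;
    }
}
```

Because the second pass only needs the collected terms, the two document lists do not have to come from the same index.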

Query time joining has the following input:

fromField: The field to join from.
fromQuery: The query executed to collect the from terms. This is usually the user-specified query.
multipleValuesPerDocument: Whether the fromField contains more than one value per document.
scoreMode: Defines how scores are translated to the other side of the join. If you don't care about scoring, use ScoreMode.None. This disables scoring and is therefore more efficient (it requires less memory and is faster).
toField: The field to join to.

Query-time joining is accessible through a single static method, JoinUtil.createJoinQuery(). The caller supplies the input described above and an IndexSearcher from which the from terms are collected. The returned query can be executed with the same IndexSearcher or with a different one. Example usage of JoinUtil.createJoinQuery():

   // searchTerm, fromSearcher and toSearcher are assumed to be defined by the caller.
   String fromField = "from"; // Name of the from field
   boolean multipleValuesPerDocument = false; // Set to true only if your fromField has multiple values per document in your index
   String toField = "to"; // Name of the to field
   ScoreMode scoreMode = ScoreMode.Max; // Defines how the scores are translated into the other side of the join.
   Query fromQuery = new TermQuery(new Term("content", searchTerm)); // Query executed to collect from values to join to the to values

   Query joinQuery = JoinUtil.createJoinQuery(fromField, multipleValuesPerDocument, toField, fromQuery, fromSearcher, scoreMode);
   TopDocs topDocs = toSearcher.search(joinQuery, 10); // Note: toSearcher can be the same as the fromSearcher
   // Render topDocs...