Package org.apache.lucene.search.grouping

Grouping.

Package org.apache.lucene.search.grouping Description

This module enables search result grouping with Lucene, where hits with the same value in the specified single-valued group field are grouped together. For example, if you group by the author field, then all documents with the same value in the author field fall into a single group.
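To make the grouping idea concrete, here is a minimal plain-Java sketch that buckets hits by a single-valued author field. It is only an illustration of the concept: the Hit class and the groupByAuthor method are hypothetical names invented for this example and are not part of the Lucene API.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Plain-Java illustration only; this is not how Lucene stores or groups hits. */
final class GroupByFieldSketch {

  /** A hypothetical hit: a document ID plus its single-valued "author" field. */
  static final class Hit {
    final int docId;
    final String author;

    Hit(int docId, String author) {
      this.docId = docId;
      this.author = author;
    }
  }

  /** Buckets document IDs by author, keeping groups in order of first appearance. */
  static Map<String, List<Integer>> groupByAuthor(List<Hit> hits) {
    Map<String, List<Integer>> groups = new LinkedHashMap<>();
    for (Hit hit : hits) {
      groups.computeIfAbsent(hit.author, k -> new ArrayList<>()).add(hit.docId);
    }
    return groups;
  }

  public static void main(String[] args) {
    List<Hit> hits = Arrays.asList(
        new Hit(0, "alice"), new Hit(1, "bob"), new Hit(2, "alice"));
    // prints {alice=[0, 2], bob=[1]}
    System.out.println(groupByAuthor(hits));
  }
}
```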

Grouping requires a number of inputs:

The implementation is two-pass: the first pass (FirstPassGroupingCollector) gathers the top groups, and the second pass (SecondPassGroupingCollector) gathers documents within those groups. If the search is costly to run, you may want to use the CachingCollector class, which caches hits and can quickly replay them for the second pass. That way the query runs only once, at the cost of briefly holding all hits in RAM. Results are returned as a TopGroups instance.
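The two-pass flow can be sketched in plain Java over an in-memory list of hits. This is a simplified illustration under stated assumptions, not Lucene's implementation: the Hit class, firstPass, and secondPass are hypothetical names, groups are ranked by their best score only, and the shared list stands in for the hits that a caching collector would replay.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Plain-Java illustration of two-pass grouping; not the Lucene collectors themselves. */
final class TwoPassGroupingSketch {

  /** A hypothetical hit: document ID, group value, and relevance score. */
  static final class Hit {
    final int docId;
    final String group;
    final float score;

    Hit(int docId, String group, float score) {
      this.docId = docId;
      this.group = group;
      this.score = score;
    }
  }

  /** Pass 1: pick the top N groups, ranking each group by its best-scoring hit. */
  static List<String> firstPass(List<Hit> hits, int topNGroups) {
    Map<String, Float> bestScore = new HashMap<>();
    for (Hit h : hits) {
      Float best = bestScore.get(h.group);
      if (best == null || h.score > best) {
        bestScore.put(h.group, h.score);
      }
    }
    List<String> groups = new ArrayList<>(bestScore.keySet());
    groups.sort((a, b) -> {
      int byScore = Float.compare(bestScore.get(b), bestScore.get(a)); // descending
      return byScore != 0 ? byScore : a.compareTo(b); // stable tie-break by name
    });
    return new ArrayList<>(groups.subList(0, Math.min(topNGroups, groups.size())));
  }

  /** Pass 2: replay the same hits and keep the best documents of each top group. */
  static Map<String, List<Hit>> secondPass(
      List<Hit> cachedHits, List<String> topGroups, int docsPerGroup) {
    Map<String, List<Hit>> result = new LinkedHashMap<>();
    for (String g : topGroups) {
      result.put(g, new ArrayList<>());
    }
    for (Hit h : cachedHits) {
      List<Hit> docs = result.get(h.group);
      if (docs != null) { // hits outside the top groups are skipped
        docs.add(h);
      }
    }
    for (List<Hit> docs : result.values()) {
      docs.sort((a, b) -> Float.compare(b.score, a.score));
      if (docs.size() > docsPerGroup) {
        docs.subList(docsPerGroup, docs.size()).clear();
      }
    }
    return result;
  }

  public static void main(String[] args) {
    List<Hit> hits = Arrays.asList(
        new Hit(0, "a", 1.0f), new Hit(1, "b", 3.0f),
        new Hit(2, "a", 2.0f), new Hit(3, "c", 0.5f));
    List<String> topGroups = firstPass(hits, 2);
    Map<String, List<Hit>> grouped = secondPass(hits, topGroups, 1);
    for (Map.Entry<String, List<Hit>> e : grouped.entrySet()) {
      System.out.println(e.getKey() + " -> doc " + e.getValue().get(0).docId);
    }
  }
}
```

Both passes iterate the same list, which mirrors the trade-off described above: the hits are produced once and held in memory so that the second pass does not have to run the query again.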

Groups are defined by GroupSelector implementations:

Known limitations:

Typical usage of the generic two-pass grouping search, via the GroupingSearch convenience utility and optionally with caching for the second pass, looks like this:

   GroupingSearch groupingSearch = new GroupingSearch("author");
   groupingSearch.setGroupSort(groupSort);
   groupingSearch.setFillSortFields(fillFields);
 
   if (useCache) {
     // Sets cache in MB
     groupingSearch.setCachingInMB(4.0, true);
   }
 
   if (requiredTotalGroupCount) {
     groupingSearch.setAllGroups(true);
   }
 
   TermQuery query = new TermQuery(new Term("content", searchTerm));
   TopGroups<BytesRef> result = groupingSearch.search(indexSearcher, query, groupOffset, groupLimit);
 
   // Render result...
   if (requiredTotalGroupCount) {
     int totalGroupCount = result.totalGroupCount;
   }
 

To use the single-pass BlockGroupingCollector, first, at indexing time, you must ensure all docs in each group are added as a block, and you have some way to find the last document of each group. One simple way to do this is to add a marker binary field:

   // Create Documents from your source:
   List<Document> oneGroup = ...;
   
   // StringField indexes the value verbatim as a single token, with norms omitted
   // and only document IDs recorded, which is all a marker field needs:
   Field groupEndField = new StringField("groupEnd", "x", Field.Store.NO);
   oneGroup.get(oneGroup.size()-1).add(groupEndField);
 
   // You can also use writer.updateDocuments(); just be sure you
   // replace an entire previous doc block with this new one.  For
   // example, each group could have a "groupID" field, with the same
   // value for all docs in this group:
   writer.addDocuments(oneGroup);
 
Then, at search time, do this up front:
   // Set this once in your app & save away for reusing across all queries:
   Filter groupEndDocs = new CachingWrapperFilter(new QueryWrapperFilter(new TermQuery(new Term("groupEnd", "x"))));
 
Finally, do this per search:
   // Per search:
   BlockGroupingCollector c = new BlockGroupingCollector(groupSort, groupOffset+topNGroups, needsScores, groupEndDocs);
   s.search(new TermQuery(new Term("content", searchTerm)), c);
   TopGroups groupsResult = c.getTopGroups(withinGroupSort, groupOffset, docOffset, docOffset+docsPerGroup, fillFields);
 
   // Render groupsResult...
 
Or alternatively use the GroupingSearch convenience utility:
   // Per search:
   GroupingSearch groupingSearch = new GroupingSearch(groupEndDocs);
   groupingSearch.setGroupSort(groupSort);
   groupingSearch.setIncludeScores(needsScores);
   TermQuery query = new TermQuery(new Term("content", searchTerm));
   TopGroups groupsResult = groupingSearch.search(indexSearcher, query, groupOffset, groupLimit);

   // Render groupsResult...
 
Note that the groupValue of each GroupDocs will be null, so if you need to present this value you will have to retrieve it separately (for example, from stored fields or doc values).
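The reason block indexing allows single-pass grouping can be sketched in plain Java: when every group occupies a contiguous run of document IDs and the last document of each run is marked, a hit's group is determined by the first marker at or after its document ID. This is only an illustration of the idea, not the logic of BlockGroupingCollector; the class and method names are hypothetical, and java.util.BitSet stands in for the set of group-end documents.

```java
import java.util.ArrayList;
import java.util.BitSet;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Plain-Java illustration of grouping by document blocks; not Lucene code. */
final class BlockGroupingSketch {

  /**
   * Maps the last doc ID of each block to the matching doc IDs inside that block.
   * Hits are expected in increasing doc ID order, as a single index pass would see them.
   */
  static Map<Integer, List<Integer>> groupHits(int[] hitsInDocOrder, BitSet groupEndDocs) {
    Map<Integer, List<Integer>> groups = new LinkedHashMap<>();
    for (int docId : hitsInDocOrder) {
      // The block containing docId ends at the first marker at or after docId.
      int blockEnd = groupEndDocs.nextSetBit(docId);
      if (blockEnd < 0) {
        throw new IllegalArgumentException(
            "doc " + docId + " is not followed by a group-end marker");
      }
      groups.computeIfAbsent(blockEnd, k -> new ArrayList<>()).add(docId);
    }
    return groups;
  }

  public static void main(String[] args) {
    // Two blocks: docs 0..2 and docs 3..5, so the markers sit on docs 2 and 5.
    BitSet groupEndDocs = new BitSet();
    groupEndDocs.set(2);
    groupEndDocs.set(5);
    // prints {2=[0, 2], 5=[4]}
    System.out.println(groupHits(new int[] {0, 2, 4}, groupEndDocs));
  }
}
```

In this sketch a group is identified only by where its block ends, never by a field value, so any group value to display has to be looked up separately.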

Another collector is the AllGroupHeadsCollector, which retrieves the most relevant document of each group, also known as the group head. This is useful when you want to compute group-based facets or statistics over the complete query result. The collector can be executed during the first or second phase. It can also be used through the GroupingSearch convenience utility, but if you only need the most relevant document per group, it is better to use the collector directly, as shown below.

   TermGroupSelector grouper = new TermGroupSelector(groupField);
   AllGroupHeadsCollector c = AllGroupHeadsCollector.newCollector(grouper, sortWithinGroup);
   s.search(new TermQuery(new Term("content", searchTerm)), c);
   // Return all group heads as an int array:
   int[] groupHeadsArray = c.retrieveGroupHeads();
   // Return all group heads as a FixedBitSet:
   int maxDoc = s.maxDoc();
   FixedBitSet groupHeadsBitSet = c.retrieveGroupHeads(maxDoc);
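
As a rough illustration of what a group head is, the following plain-Java sketch keeps the highest-scoring document of each group and returns the heads both as an int array and as a bit set. It assumes relevance is a simple score and uses hypothetical names throughout; java.util.BitSet stands in for Lucene's FixedBitSet, and none of this is the AllGroupHeadsCollector implementation.

```java
import java.util.Arrays;
import java.util.BitSet;
import java.util.HashMap;
import java.util.Map;

/** Plain-Java illustration of group heads; not the Lucene collector. */
final class GroupHeadsSketch {

  /** Returns the doc ID of the best-scoring hit of each group, in increasing doc ID order. */
  static int[] groupHeads(int[] docIds, String[] groups, float[] scores) {
    Map<String, Integer> headIndex = new HashMap<>(); // group -> index of its current head
    for (int i = 0; i < docIds.length; i++) {
      Integer current = headIndex.get(groups[i]);
      if (current == null || scores[i] > scores[current]) {
        headIndex.put(groups[i], i);
      }
    }
    int[] heads = new int[headIndex.size()];
    int n = 0;
    for (int idx : headIndex.values()) {
      heads[n++] = docIds[idx];
    }
    Arrays.sort(heads);
    return heads;
  }

  /** Same heads, as a bit set sized for maxDoc documents. */
  static BitSet groupHeadsBitSet(int[] docIds, String[] groups, float[] scores, int maxDoc) {
    BitSet bits = new BitSet(maxDoc);
    for (int doc : groupHeads(docIds, groups, scores)) {
      bits.set(doc);
    }
    return bits;
  }

  public static void main(String[] args) {
    int[] docIds = {0, 1, 2, 3};
    String[] groups = {"a", "b", "a", "b"};
    float[] scores = {1f, 5f, 3f, 2f};
    // prints [1, 2]
    System.out.println(Arrays.toString(groupHeads(docIds, groups, scores)));
  }
}
```

The bit set form is convenient when the heads are then intersected with other document sets, for example while counting facets.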
 
Copyright © 2000-2020 Apache Software Foundation. All Rights Reserved.