Class | Description |
---|---|
AllGroupHeadsCollector<T> | This collector specializes in collecting the most relevant document (the group head) for each group that matches the query. |
AllGroupHeadsCollector.GroupHead<T> | Represents a group head. |
AllGroupsCollector<T> | A collector that collects all groups that match the query. |
BlockGroupingCollector | BlockGroupingCollector performs grouping with a single-pass collector, as long as you are grouping by a doc block field, i.e. all documents sharing a given group value were indexed as a doc block using the atomic IndexWriter.addDocuments() or IndexWriter.updateDocuments() API. |
CollectedSearchGroup<T> | Expert: representation of a group in FirstPassGroupingCollector, tracking the top doc and FieldComparator slot. |
DistinctValuesCollector<T,R> | A second-pass grouping collector that keeps track of distinct values for a specified field for the top N groups. |
DistinctValuesCollector.GroupCount<T,R> | Returned by DistinctValuesCollector.getGroups(), representing the value and set of distinct values for the group. |
FirstPassGroupingCollector<T> | FirstPassGroupingCollector is the first of two passes necessary to collect grouped hits. |
GroupDocs<T> | Represents one group in the results. |
GroupFacetCollector | Base class for computing grouped facets. |
GroupFacetCollector.FacetEntry | Represents a facet entry with a value and a count. |
GroupFacetCollector.GroupedFacetResult | The grouped facet result. |
GroupFacetCollector.SegmentResult | Contains the local grouped segment counts for a particular segment. |
GroupingSearch | Convenience class to perform grouping in a non-distributed environment. |
GroupReducer<T,C extends Collector> | Concrete implementations of this class define what to collect for individual groups during the second pass of a grouping search. |
GroupSelector<T> | Defines a group, for use by grouping collectors; a GroupSelector acts as an iterator over documents. |
SearchGroup<T> | Represents a group that is found during the first-pass search. |
SecondPassGroupingCollector<T> | SecondPassGroupingCollector runs over an already collected set of groups, further applying a GroupReducer to each group. |
TermGroupFacetCollector | An implementation of GroupFacetCollector that computes grouped facets based on the indexed terms from DocValues. |
TermGroupSelector | A GroupSelector implementation that groups via SortedDocValues. |
TopGroups<T> | Represents the result returned by a grouping search. |
TopGroupsCollector<T> | A second-pass collector that collects the TopDocs for each group and returns them as a TopGroups object. |
ValueSourceGroupSelector | A GroupSelector that groups via a ValueSource. |
Enum | Description |
---|---|
GroupSelector.State | What to do with the current value. |
TopGroups.ScoreMergeMode | How the GroupDocs score (if any) should be merged. |
This module enables search result grouping with Lucene, where hits
with the same value in the specified single-valued group field are
grouped together. For example, if you group by the author
field, then all documents with the same value in the author
field fall into a single group.
Grouping requires a number of inputs:

- groupField: the field used for grouping. For example, if you use the author field then each group has all books by the same author. Documents that don't have this field are grouped under a single group with a null group value.
- groupSort: how the groups are sorted. For sorting purposes, each group is "represented" by the highest-sorted document within it according to the groupSort. For example, if you specify "price" (ascending) then the first group is the one with the lowest-priced book within it. Or if you specify relevance group sort, then the first group is the one containing the highest-scoring book.
- topNGroups: how many top groups to keep. For example, 10 means the top 10 groups are computed.
- groupOffset: which "slice" of top groups you want to retrieve. For example, 3 means you'll get 7 groups back (assuming topNGroups is 10). This is useful for paging, where you might show 5 groups per page.
- withinGroupSort: how the documents within each group are sorted. This can be different from the group sort.
- maxDocsPerGroup: how many top documents within each group to keep.
- withinGroupOffset: which "slice" of top documents you want to retrieve from each group.
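As a rough sketch, these inputs map onto the GroupingSearch convenience API as follows (the setter names are assumptions based on the GroupingSearch class in this module; groupOffset and topNGroups are passed to the search call itself rather than via setters):

```java
import org.apache.lucene.search.Sort;
import org.apache.lucene.search.SortField;
import org.apache.lucene.search.grouping.GroupingSearch;

// Sketch: wiring the grouping inputs described above into GroupingSearch.
GroupingSearch groupingSearch = new GroupingSearch("author");          // groupField
groupingSearch.setGroupSort(
    new Sort(new SortField("price", SortField.Type.DOUBLE)));          // groupSort
groupingSearch.setSortWithinGroup(Sort.RELEVANCE);                     // withinGroupSort
groupingSearch.setGroupDocsOffset(0);                                  // withinGroupOffset
groupingSearch.setGroupDocsLimit(10);                                  // maxDocsPerGroup
// topNGroups and groupOffset are supplied per query:
//   groupingSearch.search(indexSearcher, query, groupOffset, groupOffset + topNGroups);
```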
The implementation is two-pass: the first pass (FirstPassGroupingCollector) gathers the top groups, and the second pass (SecondPassGroupingCollector) gathers documents within those groups. If the search is costly to run you may want to use the CachingCollector class, which caches hits and can (quickly) replay them for the second pass. This way you only run the query once, but you pay a RAM cost to (briefly) hold all hits. Results are returned as a TopGroups instance.
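A minimal sketch of wiring the two passes together with CachingCollector, using TopGroupsCollector as the concrete second-pass collector (the constructor and getTopGroups signatures here are assumptions based on the Lucene 7.x API; searcher, query, groupSort, withinGroupSort and the other variables stand for the inputs described above):

```java
import java.util.Collection;
import org.apache.lucene.search.CachingCollector;
import org.apache.lucene.search.grouping.FirstPassGroupingCollector;
import org.apache.lucene.search.grouping.SearchGroup;
import org.apache.lucene.search.grouping.TermGroupSelector;
import org.apache.lucene.search.grouping.TopGroups;
import org.apache.lucene.search.grouping.TopGroupsCollector;
import org.apache.lucene.util.BytesRef;

// First pass: find the top N groups, caching the hits for replay.
TermGroupSelector selector = new TermGroupSelector("author");
FirstPassGroupingCollector<BytesRef> firstPass =
    new FirstPassGroupingCollector<>(selector, groupSort, topNGroups);
CachingCollector cached =
    CachingCollector.create(firstPass, /*cacheScores=*/true, /*maxRAMMB=*/64.0);
searcher.search(query, cached);

Collection<SearchGroup<BytesRef>> topGroups =
    firstPass.getTopGroups(groupOffset, /*fillFields=*/true);
if (topGroups != null && cached.isCached()) {
  // Second pass: replay the cached hits instead of re-running the query.
  TopGroupsCollector<BytesRef> secondPass = new TopGroupsCollector<>(
      selector, topGroups, groupSort, withinGroupSort, maxDocsPerGroup,
      /*getScores=*/true, /*getMaxScores=*/true, /*fillSortFields=*/true);
  cached.replay(secondPass);
  TopGroups<BytesRef> result = secondPass.getTopGroups(withinGroupOffset);
}
```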
Groups are defined by GroupSelector implementations:

- TermGroupSelector groups based on the value of a SortedDocValues field
- ValueSourceGroupSelector groups based on the value of a ValueSource
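For illustration, the two selectors can be constructed like this (the ValueSourceGroupSelector constructor taking a ValueSource plus a context map, and the use of BytesRefFieldSource from the queries module, are assumptions based on the Lucene 7.x API):

```java
import java.util.HashMap;
import org.apache.lucene.queries.function.valuesource.BytesRefFieldSource;
import org.apache.lucene.search.grouping.TermGroupSelector;
import org.apache.lucene.search.grouping.ValueSourceGroupSelector;

// Group by the SortedDocValues of the "author" field:
TermGroupSelector byTerm = new TermGroupSelector("author");

// Group by a value computed per document by a ValueSource:
ValueSourceGroupSelector byValue =
    new ValueSourceGroupSelector(new BytesRefFieldSource("author"), new HashMap<>());
```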
Typical usage of the generic two-pass grouping search, using the GroupingSearch convenience utility (optionally caching the hits for the second-pass search), looks like this:

```java
GroupingSearch groupingSearch = new GroupingSearch("author");
groupingSearch.setGroupSort(groupSort);
groupingSearch.setFillSortFields(fillFields);

if (useCache) {
  // Sets cache in MB
  groupingSearch.setCachingInMB(4.0, true);
}

if (requiredTotalGroupCount) {
  groupingSearch.setAllGroups(true);
}

TermQuery query = new TermQuery(new Term("content", searchTerm));
TopGroups<BytesRef> result = groupingSearch.search(indexSearcher, query, groupOffset, groupLimit);

// Render groupsResult...

if (requiredTotalGroupCount) {
  int totalGroupCount = result.totalGroupCount;
}
```
To use the single-pass BlockGroupingCollector, first, at indexing time, you must ensure all docs in each group are added as a block, and you have some way to find the last document of each group. One simple way to do this is to add a marker binary field:

```java
// Create Documents from your source:
List<Document> oneGroup = ...;

Field groupEndField = new Field("groupEnd", "x", Field.Store.NO, Field.Index.NOT_ANALYZED);
groupEndField.setIndexOptions(IndexOptions.DOCS_ONLY);
groupEndField.setOmitNorms(true);
oneGroup.get(oneGroup.size()-1).add(groupEndField);

// You can also use writer.updateDocuments(); just be sure you
// replace an entire previous doc block with this new one.  For
// example, each group could have a "groupID" field, with the same
// value for all docs in this group:
writer.addDocuments(oneGroup);
```

Then, at search time, do this up front:
```java
// Set this once in your app & save away for reusing across all queries:
Filter groupEndDocs = new CachingWrapperFilter(new QueryWrapperFilter(
    new TermQuery(new Term("groupEnd", "x"))));
```

Finally, do this per search:
```java
// Per search:
BlockGroupingCollector c = new BlockGroupingCollector(groupSort, groupOffset + topNGroups,
                                                      needsScores, groupEndDocs);
s.search(new TermQuery(new Term("content", searchTerm)), c);
TopGroups groupsResult = c.getTopGroups(withinGroupSort, groupOffset,
                                        docOffset, docOffset + docsPerGroup,
                                        fillFields);

// Render groupsResult...
```

Or alternatively use the GroupingSearch convenience utility:

```java
// Per search:
GroupingSearch groupingSearch = new GroupingSearch(groupEndDocs);
groupingSearch.setGroupSort(groupSort);
groupingSearch.setIncludeScores(needsScores);
TermQuery query = new TermQuery(new Term("content", searchTerm));
TopGroups groupsResult = groupingSearch.search(indexSearcher, query, groupOffset, groupLimit);

// Render groupsResult...
```

Note that the groupValue of each GroupDocs will be null, so if you need to present this value you'll have to separately retrieve it (for example using stored fields, FieldCache, etc.).
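For example, assuming the group value was also written as a stored field (here hypothetically named "author"), the display value for each group can be recovered from the group's top document (a sketch, not the only option):

```java
import org.apache.lucene.search.grouping.GroupDocs;

// groupValue is null when using BlockGroupingCollector, so fetch the
// stored "author" field from each group's top-ranked document instead:
for (GroupDocs<?> group : groupsResult.groups) {
  int topDoc = group.scoreDocs[0].doc;
  String author = indexSearcher.doc(topDoc).get("author");
  // Render author + group.scoreDocs...
}
```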
Another collector is AllGroupHeadsCollector, which can be used to retrieve the most relevant document per group, also known as the group heads. This can be useful in situations where one wants to compute group-based facets or statistics on the complete query result. The collector can be executed during the first or second phase. It can also be used with the GroupingSearch convenience utility, but if one only wants to compute the most relevant documents per group it is better to just use the collector directly, as shown below.
```java
TermGroupSelector grouper = new TermGroupSelector(groupField);
AllGroupHeadsCollector c = AllGroupHeadsCollector.newCollector(grouper, sortWithinGroup);
s.search(new TermQuery(new Term("content", searchTerm)), c);

// Return all group heads as int array
int[] groupHeadsArray = c.retrieveGroupHeads();

// Return all group heads as FixedBitSet.
int maxDoc = s.maxDoc();
FixedBitSet groupHeadsBitSet = c.retrieveGroupHeads(maxDoc);
```
Copyright © 2000-2017 Apache Software Foundation. All Rights Reserved.