org.apache.lucene.search.grouping
Class GroupingSearch

java.lang.Object
  extended by org.apache.lucene.search.grouping.GroupingSearch

public class GroupingSearch
extends Object

Convenience class to perform grouping in a non-distributed environment.

WARNING: This API is experimental and might change in incompatible ways in the next release.

Constructor Summary
GroupingSearch(Filter groupEndDocs)
          Constructor for grouping documents by doc block.
GroupingSearch(String groupField)
          Constructs a GroupingSearch instance that groups documents by index terms using the FieldCache.
GroupingSearch(ValueSource groupFunction, Map<?,?> valueSourceContext)
          Constructs a GroupingSearch instance that groups documents by function using a ValueSource instance.
 
Method Summary
 GroupingSearch disableCaching()
          Disables any enabled cache.
 Bits getAllGroupHeads()
          Returns the matching group heads if setAllGroupHeads(boolean) was set to true, otherwise an empty bit set.
<T> Collection<T>
getAllMatchingGroups()
          If setAllGroups(boolean) was set to true then all matching groups are returned, otherwise an empty collection is returned.
protected  TopGroups<?> groupByDocBlock(IndexSearcher searcher, Filter filter, Query query, int groupOffset, int groupLimit)
           
protected  TopGroups groupByFieldOrFunction(IndexSearcher searcher, Filter filter, Query query, int groupOffset, int groupLimit)
           
<T> TopGroups<T>
search(IndexSearcher searcher, Filter filter, Query query, int groupOffset, int groupLimit)
          Executes a grouped search.
<T> TopGroups<T>
search(IndexSearcher searcher, Query query, int groupOffset, int groupLimit)
          Executes a grouped search.
 GroupingSearch setAllGroupHeads(boolean allGroupHeads)
          Whether to compute all group heads (most relevant document per group) matching the query.
 GroupingSearch setAllGroups(boolean allGroups)
          Whether to also compute all groups matching the query.
 GroupingSearch setCaching(int maxDocsToCache, boolean cacheScores)
          Enables caching for the second pass search.
 GroupingSearch setCachingInMB(double maxCacheRAMMB, boolean cacheScores)
          Enables caching for the second pass search.
 GroupingSearch setFillSortFields(boolean fillSortFields)
          Whether to also fill the sort fields for each returned group and for the documents inside each group.
 GroupingSearch setGroupDocsLimit(int groupDocsLimit)
          Specifies the number of documents to return inside a group from the specified groupDocsOffset.
 GroupingSearch setGroupDocsOffset(int groupDocsOffset)
          Specifies the offset for documents inside a group.
 GroupingSearch setGroupSort(Sort groupSort)
          Specifies how groups are sorted.
 GroupingSearch setIncludeMaxScore(boolean includeMaxScore)
          Whether to include the score of the most relevant document per group.
 GroupingSearch setIncludeScores(boolean includeScores)
          Whether to include the scores per doc inside a group.
 GroupingSearch setInitialSize(int initialSize)
          Sets the initial size of some internally used data structures.
 GroupingSearch setSortWithinGroup(Sort sortWithinGroup)
          Specifies how documents inside a group are sorted.
 
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
 

Constructor Detail

GroupingSearch

public GroupingSearch(String groupField)
Constructs a GroupingSearch instance that groups documents by index terms using the FieldCache. The group field can only have one token per document. This means that the field must not be analysed.

Parameters:
groupField - The name of the field to group by.

GroupingSearch

public GroupingSearch(ValueSource groupFunction,
                      Map<?,?> valueSourceContext)
Constructs a GroupingSearch instance that groups documents by function using a ValueSource instance.

Parameters:
groupFunction - The function to group by, specified as a ValueSource
valueSourceContext - The context of the specified groupFunction

GroupingSearch

public GroupingSearch(Filter groupEndDocs)
Constructor for grouping documents by doc block. This constructor can only be used when the documents belonging to a group are indexed in one block.

Parameters:
groupEndDocs - The filter that marks the last document in all doc blocks
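
To illustrate what "indexed in one block" means, here is a minimal plain-Java sketch that does not call Lucene. It assumes documents are numbered consecutively and uses a java.util.BitSet as a stand-in for the groupEndDocs filter; the class and method names are hypothetical.

```java
import java.util.ArrayList;
import java.util.BitSet;
import java.util.List;

// Hypothetical helper, not part of Lucene: splits consecutive doc IDs into
// blocks, where a set bit marks the last document of each block.
class DocBlockSketch {
    static List<List<Integer>> splitIntoBlocks(int maxDoc, BitSet groupEndDocs) {
        List<List<Integer>> blocks = new ArrayList<>();
        List<Integer> current = new ArrayList<>();
        for (int doc = 0; doc < maxDoc; doc++) {
            current.add(doc);
            if (groupEndDocs.get(doc)) {   // last document of this block
                blocks.add(current);
                current = new ArrayList<>();
            }
        }
        // Trailing docs without an end marker are ignored in this sketch.
        return blocks;
    }

    public static void main(String[] args) {
        BitSet ends = new BitSet();
        ends.set(2);  // docs 0..2 form the first block
        ends.set(4);  // docs 3..4 form the second block
        System.out.println(splitIntoBlocks(5, ends)); // prints [[0, 1, 2], [3, 4]]
    }
}
```

Because each group is a contiguous run of documents, the end markers alone are enough to recover the groups.
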
Method Detail

search

public <T> TopGroups<T> search(IndexSearcher searcher,
                               Query query,
                               int groupOffset,
                               int groupLimit)
                    throws IOException
Executes a grouped search. Both the first pass and second pass are executed on the specified searcher.

Parameters:
searcher - The IndexSearcher instance to execute the grouped search on.
query - The query to execute with the grouping
groupOffset - The group offset
groupLimit - The number of groups to return from the specified group offset
Returns:
the grouped result as a TopGroups instance
Throws:
IOException - If any I/O related errors occur

search

public <T> TopGroups<T> search(IndexSearcher searcher,
                               Filter filter,
                               Query query,
                               int groupOffset,
                               int groupLimit)
                    throws IOException
Executes a grouped search. Both the first pass and second pass are executed on the specified searcher.

Parameters:
searcher - The IndexSearcher instance to execute the grouped search on.
filter - The filter to execute with the grouping
query - The query to execute with the grouping
groupOffset - The group offset
groupLimit - The number of groups to return from the specified group offset
Returns:
the grouped result as a TopGroups instance
Throws:
IOException - If any I/O related errors occur
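
The roles of the two passes and of groupOffset and groupLimit can be sketched in plain Java. This is a simplified model, not Lucene's implementation: it assumes hits arrive as parallel arrays of group keys and scores, ranks groups by their best score (comparable to the default relevance sort), and uses the hypothetical names TwoPassSketch and search.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical model of a two-pass grouped search; not Lucene code.
class TwoPassSketch {
    static Map<String, List<Double>> search(String[] groups, double[] scores,
                                            int groupOffset, int groupLimit) {
        // First pass: find the best score per group and rank the groups by it.
        Map<String, Double> best = new LinkedHashMap<>();
        for (int i = 0; i < groups.length; i++) {
            Double prev = best.get(groups[i]);
            if (prev == null || scores[i] > prev) {
                best.put(groups[i], scores[i]);
            }
        }
        List<Map.Entry<String, Double>> ranked = new ArrayList<>(best.entrySet());
        ranked.sort((a, b) -> Double.compare(b.getValue(), a.getValue()));

        // Keep only the window of groups [groupOffset, groupOffset + groupLimit).
        int from = Math.min(groupOffset, ranked.size());
        int to = (int) Math.min((long) from + groupLimit, ranked.size());
        Map<String, List<Double>> result = new LinkedHashMap<>();
        for (Map.Entry<String, Double> e : ranked.subList(from, to)) {
            result.put(e.getKey(), new ArrayList<Double>());
        }

        // Second pass: collect the hits that belong to the selected groups.
        for (int i = 0; i < groups.length; i++) {
            List<Double> docs = result.get(groups[i]);
            if (docs != null) {
                docs.add(scores[i]);
            }
        }
        for (List<Double> docs : result.values()) {
            docs.sort(Collections.<Double>reverseOrder()); // best hit first
        }
        return result;
    }
}
```

With groups {"a", "b", "a", "c", "b"} and scores {1.0, 3.0, 2.0, 0.5, 1.5}, a groupOffset of 0 and a groupLimit of 2 select groups b and a; a groupOffset of 1 selects a and c.
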

groupByFieldOrFunction

protected TopGroups groupByFieldOrFunction(IndexSearcher searcher,
                                           Filter filter,
                                           Query query,
                                           int groupOffset,
                                           int groupLimit)
                                    throws IOException
Throws:
IOException

groupByDocBlock

protected TopGroups<?> groupByDocBlock(IndexSearcher searcher,
                                       Filter filter,
                                       Query query,
                                       int groupOffset,
                                       int groupLimit)
                                throws IOException
Throws:
IOException

setCachingInMB

public GroupingSearch setCachingInMB(double maxCacheRAMMB,
                                     boolean cacheScores)
Enables caching for the second pass search. The cache will not grow beyond the specified limit in MB. The cache is filled during the first pass search and then replayed during the second pass search. If the cache would grow beyond the specified limit, it is purged and not used in the second pass search.

Parameters:
maxCacheRAMMB - The maximum amount in MB the cache is allowed to hold
cacheScores - Whether to cache the scores
Returns:
this

setCaching

public GroupingSearch setCaching(int maxDocsToCache,
                                 boolean cacheScores)
Enables caching for the second pass search. The cache will not contain more than the specified maximum number of documents. The cache is filled during the first pass search and then replayed during the second pass search. If the cache would grow beyond the specified limit, it is purged and not used in the second pass search.

Parameters:
maxDocsToCache - The maximum number of documents the cache is allowed to hold
cacheScores - Whether to cache the scores
Returns:
this
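
The fill, purge, and replay behaviour described for both caching methods can be modelled with a small plain-Java class. This is a sketch under assumptions, not Lucene's caching collector: it bounds the cache by document count only, and the class BoundedDocCache and its methods are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical model of a bounded first-pass cache; not Lucene code.
class BoundedDocCache {
    private final int maxDocsToCache;
    private List<Integer> docs = new ArrayList<>(); // null once purged

    BoundedDocCache(int maxDocsToCache) {
        this.maxDocsToCache = maxDocsToCache;
    }

    // Called for every hit during the first pass.
    void collect(int doc) {
        if (docs == null) {
            return;                          // already purged, stay disabled
        }
        if (docs.size() >= maxDocsToCache) { // one more doc would exceed the limit
            docs = null;                     // purge the cache
            return;
        }
        docs.add(doc);
    }

    // True if the second pass can replay the cached hits.
    boolean isCached() {
        return docs != null;
    }

    // Hits to replay in the second pass; empty if the cache was purged,
    // in which case the caller has to execute the query again.
    List<Integer> replay() {
        return docs == null ? new ArrayList<Integer>() : docs;
    }
}
```

A cache created with a limit of 3 keeps hits 0, 1 and 2; a fourth hit purges it, and the second pass then has to run the query again instead of replaying.
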

disableCaching

public GroupingSearch disableCaching()
Disables any enabled cache.

Returns:
this

setGroupSort

public GroupingSearch setGroupSort(Sort groupSort)
Specifies how groups are sorted. Defaults to Sort.RELEVANCE.

Parameters:
groupSort - The sort for the groups.
Returns:
this

setSortWithinGroup

public GroupingSearch setSortWithinGroup(Sort sortWithinGroup)
Specifies how documents inside a group are sorted. Defaults to Sort.RELEVANCE.

Parameters:
sortWithinGroup - The sort for documents inside a group
Returns:
this

setGroupDocsOffset

public GroupingSearch setGroupDocsOffset(int groupDocsOffset)
Specifies the offset for documents inside a group.

Parameters:
groupDocsOffset - The offset for documents inside a group
Returns:
this

setGroupDocsLimit

public GroupingSearch setGroupDocsLimit(int groupDocsLimit)
Specifies the number of documents to return inside a group from the specified groupDocsOffset.

Parameters:
groupDocsLimit - The number of documents to return inside a group
Returns:
this
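
How groupDocsOffset and groupDocsLimit select a window of documents inside one group can be shown with a short plain-Java sketch. It assumes the documents of the group are already sorted and that both arguments are non-negative; the names GroupDocsWindow and window are hypothetical and not part of Lucene.

```java
import java.util.List;

// Hypothetical helper, not part of Lucene: returns the documents of one
// group that fall into [groupDocsOffset, groupDocsOffset + groupDocsLimit).
class GroupDocsWindow {
    static <T> List<T> window(List<T> sortedGroupDocs, int groupDocsOffset, int groupDocsLimit) {
        // Clamp both ends so a window past the end of the group is simply empty.
        int from = Math.min(groupDocsOffset, sortedGroupDocs.size());
        int to = (int) Math.min((long) from + groupDocsLimit, sortedGroupDocs.size());
        return sortedGroupDocs.subList(from, to);
    }
}
```

For a group holding d1, d2, d3 and d4, an offset of 1 with a limit of 2 yields d2 and d3, and an offset of 3 with a limit of 5 yields only d4.
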

setFillSortFields

public GroupingSearch setFillSortFields(boolean fillSortFields)
Whether to also fill the sort fields for each returned group and for the documents inside each group.

Parameters:
fillSortFields - Whether to also fill the sort fields for each returned group and for the documents inside each group
Returns:
this

setIncludeScores

public GroupingSearch setIncludeScores(boolean includeScores)
Whether to include the scores per doc inside a group.

Parameters:
includeScores - Whether to include the scores per doc inside a group
Returns:
this

setIncludeMaxScore

public GroupingSearch setIncludeMaxScore(boolean includeMaxScore)
Whether to include the score of the most relevant document per group.

Parameters:
includeMaxScore - Whether to include the score of the most relevant document per group
Returns:
this

setAllGroups

public GroupingSearch setAllGroups(boolean allGroups)
Whether to also compute all groups matching the query. This can be used to determine the number of groups, which can be used for accurate pagination.

When grouping by doc block, the number of groups is automatically included in the TopGroups and this option has no effect.

Parameters:
allGroups - Whether to also compute all groups matching the query
Returns:
this
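
Once the total number of matching groups is known, pagination reduces to simple arithmetic. The sketch below is plain Java with hypothetical names (GroupPagination, pageCount, groupOffsetForPage); it assumes zero-based page numbers and a fixed groupLimit per page, and it does not call Lucene.

```java
// Hypothetical pagination helper, not part of Lucene.
class GroupPagination {
    // Number of pages needed to show totalGroupCount groups, groupLimit per page.
    static int pageCount(int totalGroupCount, int groupLimit) {
        if (groupLimit <= 0) {
            throw new IllegalArgumentException("groupLimit must be positive");
        }
        return (totalGroupCount + groupLimit - 1) / groupLimit; // ceiling division
    }

    // The groupOffset to pass to a grouped search for a zero-based page number.
    static int groupOffsetForPage(int page, int groupLimit) {
        return page * groupLimit;
    }
}
```

For example, 23 matching groups at 10 groups per page need 3 pages, and the third page (page number 2) starts at group offset 20.
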

getAllMatchingGroups

public <T> Collection<T> getAllMatchingGroups()
If setAllGroups(boolean) was set to true then all matching groups are returned, otherwise an empty collection is returned.

Type Parameters:
T - The group value type. This can be a BytesRef or a MutableValue instance. If grouping by doc block, the group value is always null.
Returns:
all matching groups, or an empty collection

setAllGroupHeads

public GroupingSearch setAllGroupHeads(boolean allGroupHeads)
Whether to compute all group heads (most relevant document per group) matching the query.

This feature isn't enabled when grouping by doc block.

Parameters:
allGroupHeads - Whether to compute all group heads (most relevant document per group) matching the query
Returns:
this

getAllGroupHeads

public Bits getAllGroupHeads()
Returns the matching group heads if setAllGroupHeads(boolean) was set to true, otherwise an empty bit set.

Returns:
The matching group heads if setAllGroupHeads(boolean) was set to true, otherwise an empty bit set
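
A group head is the most relevant document of its group, and the set of heads is represented as one bit per document. The following plain-Java sketch models that idea with java.util.BitSet as a stand-in for Bits; it assumes hits are given as parallel arrays indexed by document ID, and the names GroupHeadsSketch and groupHeads are hypothetical.

```java
import java.util.BitSet;
import java.util.HashMap;
import java.util.Map;

// Hypothetical model of group heads; not Lucene code.
class GroupHeadsSketch {
    // Returns a bit set in which bit i is set if document i has the highest
    // score within its group. Ties keep the earlier document.
    static BitSet groupHeads(String[] groups, double[] scores) {
        Map<String, Integer> headByGroup = new HashMap<>();
        for (int doc = 0; doc < groups.length; doc++) {
            Integer head = headByGroup.get(groups[doc]);
            if (head == null || scores[doc] > scores[head]) {
                headByGroup.put(groups[doc], doc);
            }
        }
        BitSet heads = new BitSet(groups.length);
        for (int doc : headByGroup.values()) {
            heads.set(doc);
        }
        return heads;
    }
}
```

With groups {"a", "b", "a", "c", "b"} and scores {1.0, 3.0, 2.0, 0.5, 1.5}, the heads are documents 1, 2 and 3.
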

setInitialSize

public GroupingSearch setInitialSize(int initialSize)
Sets the initial size of some internally used data structures. A larger initial size avoids growing these data structures repeatedly, which can improve grouping performance at the cost of more initial RAM.

The setAllGroups(boolean) and setAllGroupHeads(boolean) features use this option. Defaults to 128.

Parameters:
initialSize - The initial size of some internally used data structures
Returns:
this


Copyright © 2000-2013 Apache Software Foundation. All Rights Reserved.