org.apache.solr.request
Class UnInvertedField

java.lang.Object
  extended by org.apache.lucene.index.DocTermOrds
      extended by org.apache.solr.request.UnInvertedField

public class UnInvertedField
extends DocTermOrds

Final form of the un-inverted field: Each document points to a list of term numbers that are contained in that document. Term numbers are in sorted order, and are encoded as variable-length deltas from the previous term number. Real term numbers start at 2 since 0 and 1 are reserved. A term number of 0 signals the end of the termNumber list.

There is a single int[maxDoc()] which either contains a pointer into a byte[] for the termNumber lists, or directly contains the termNumber list if it fits in the 4 bytes of an integer. If the first byte in the integer is 1, the next 3 bytes are a pointer into a byte[] where the termNumber list starts. There are actually 256 byte arrays, to compensate for the fact that the pointers into the byte arrays are only 3 bytes long. The correct byte array for a document is a function of its id.

To save space and speed up faceting, any term that matches enough documents will not be un-inverted: it is skipped while building the un-inverted field structure, and a set intersection method is used for it during faceting.

To further save memory, the terms (the actual string values) are not all stored in memory. Instead, a TermIndex is used to convert term numbers to term values, and only for the terms needed after faceting has completed. Only every 128th term value is stored, along with its corresponding term number, and this is used as an index to find the closest term and iterate until the desired number is hit (very much like Lucene's own internal term index).
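The delta encoding of a single document's termNumber list, as described above, can be illustrated with a small standalone sketch. This is not the code used by UnInvertedField or DocTermOrds: the class name TermNumberListSketch, its methods, and the byte order of the variable-length integers are assumptions made for illustration, and the pointer/inline packing into int[maxDoc()] is left out. It only shows why a 0 can safely terminate the list: the input is strictly ascending, so no delta is ever 0.

```java
import java.io.ByteArrayOutputStream;
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration class; not part of Solr or Lucene.
class TermNumberListSketch {

    // Encodes strictly ascending term numbers (all >= 2) as variable-length
    // deltas, followed by a single 0 byte that marks the end of the list.
    static byte[] encode(int[] sortedTermNums) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        int prev = 0;
        for (int termNum : sortedTermNums) {
            int delta = termNum - prev; // never 0: input is strictly ascending
            prev = termNum;
            // 7 payload bits per byte; the high bit means "more bytes follow".
            while ((delta & ~0x7F) != 0) {
                out.write((delta & 0x7F) | 0x80);
                delta >>>= 7;
            }
            out.write(delta);
        }
        out.write(0); // end-of-list marker
        return out.toByteArray();
    }

    // Decodes a list starting at 'start' until the 0 terminator is read.
    static List<Integer> decode(byte[] bytes, int start) {
        List<Integer> termNums = new ArrayList<>();
        int pos = start;
        int termNum = 0;
        while (true) {
            int delta = 0;
            int shift = 0;
            byte b;
            do {
                b = bytes[pos++];
                delta |= (b & 0x7F) << shift;
                shift += 7;
            } while ((b & 0x80) != 0);
            if (delta == 0) {
                break; // terminator reached
            }
            termNum += delta; // undo the delta to recover the term number
            termNums.add(termNum);
        }
        return termNums;
    }
}
```

In this sketch, encoding the term numbers 2, 5 and 300 stores the deltas 2, 3 and 295 and takes five bytes including the terminator. Small deltas fit in one byte each, which is why sorted order matters for compactness.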

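The sparse term index (every 128th term kept in memory) can be sketched in the same spirit. Everything here is hypothetical: SparseTermIndexSketch is not a Solr or Lucene class, ordinals are 0-based rather than offset by the reserved values, and an in-memory sorted list stands in for the real term dictionary that would be seeked and iterated. Because the sampling interval is regular, the term number of each indexed entry is implied by its position.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Hypothetical illustration class; not part of Solr or Lucene.
class SparseTermIndexSketch {
    static final int INTERVAL_BITS = 7; // 2^7 = 128 terms per index entry
    static final int INTERVAL_MASK = (1 << INTERVAL_BITS) - 1;

    // Stand-in for the full, sorted term dictionary (assumed to hold unique terms).
    private final List<String> dictionary;
    // The only term values the index itself keeps: ordinals 0, 128, 256, ...
    private final List<String> indexedTerms = new ArrayList<>();

    SparseTermIndexSketch(List<String> sortedTerms) {
        this.dictionary = sortedTerms;
        for (int ord = 0; ord < sortedTerms.size(); ord += 1 << INTERVAL_BITS) {
            indexedTerms.add(sortedTerms.get(ord));
        }
    }

    // Resolves a term ordinal to its value: jump to the closest indexed
    // term at or before it, then step forward until the ordinal is hit.
    String lookup(int ord) {
        String nearest = indexedTerms.get(ord >>> INTERVAL_BITS);
        int pos = Collections.binarySearch(dictionary, nearest); // simulated seek
        int steps = ord & INTERVAL_MASK;
        for (int i = 0; i < steps; i++) {
            pos++; // simulated next() on a terms enumerator
        }
        return dictionary.get(pos);
    }

    int indexedTermCount() {
        return indexedTerms.size();
    }
}
```

With 300 terms, this sketch keeps only three term values in its index, and any lookup walks at most 127 steps past the nearest indexed term. That is the memory-for-time trade the class description refers to.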

Field Summary
 
Fields inherited from class org.apache.lucene.index.DocTermOrds
DEFAULT_INDEX_INTERVAL_BITS, docsEnum, field, index, indexedTermsArray, maxTermDocFreq, numTermsInField, ordBase, phase1_time, prefix, sizeOfIndexedStrings, termInstances, tnums, total_time
 
Constructor Summary
UnInvertedField(String field, SolrIndexSearcher searcher)
           
 
Method Summary
 NamedList<Integer> getCounts(SolrIndexSearcher searcher, DocSet baseDocs, int offset, int limit, Integer mincount, boolean missing, String sort, String prefix)
           
 int getNumTerms()
           
 StatsValues getStats(SolrIndexSearcher searcher, DocSet baseDocs, String[] facet)
          Collect statistics about the UnInvertedField.
static UnInvertedField getUnInvertedField(String field, SolrIndexSearcher searcher)
           
 long memSize()
           
protected  void setActualDocFreq(int termNum, int docFreq)
           
 String toString()
           
protected  void visitTerm(TermsEnum te, int termNum)
           
 
Methods inherited from class org.apache.lucene.index.DocTermOrds
getOrdTermsEnum, isEmpty, iterator, lookupTerm, numTerms, ramUsedInBytes, uninvert
 
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, wait, wait, wait
 

Constructor Detail

UnInvertedField

public UnInvertedField(String field,
                       SolrIndexSearcher searcher)
                throws IOException
Throws:
IOException
Method Detail

visitTerm

protected void visitTerm(TermsEnum te,
                         int termNum)
                  throws IOException
Overrides:
visitTerm in class DocTermOrds
Throws:
IOException

setActualDocFreq

protected void setActualDocFreq(int termNum,
                                int docFreq)
Overrides:
setActualDocFreq in class DocTermOrds

memSize

public long memSize()

getNumTerms

public int getNumTerms()

getCounts

public NamedList<Integer> getCounts(SolrIndexSearcher searcher,
                                    DocSet baseDocs,
                                    int offset,
                                    int limit,
                                    Integer mincount,
                                    boolean missing,
                                    String sort,
                                    String prefix)
                             throws IOException
Throws:
IOException

getStats

public StatsValues getStats(SolrIndexSearcher searcher,
                            DocSet baseDocs,
                            String[] facet)
                     throws IOException
Collect statistics about the UnInvertedField. The code is very similar to getCounts(org.apache.solr.search.SolrIndexSearcher, org.apache.solr.search.DocSet, int, int, Integer, boolean, String, String). It can be used to calculate stats on multivalued fields.

This method is mainly used by the StatsComponent.

Parameters:
searcher - The Searcher to use to gather the statistics
baseDocs - The DocSet to gather the stats on
facet - One or more fields to facet on.
Returns:
The StatsValues collected
Throws:
IOException - If there is a low-level I/O error.

toString

public String toString()
Overrides:
toString in class Object

getUnInvertedField

public static UnInvertedField getUnInvertedField(String field,
                                                 SolrIndexSearcher searcher)
                                          throws IOException
Throws:
IOException


Copyright © 2000-2013 Apache Software Foundation. All Rights Reserved.