CarmelUniformTermPruningPolicy (Lucene 3.6.1 API)

java.lang.Object
- org.apache.lucene.index.pruning.PruningPolicy
  - org.apache.lucene.index.pruning.TermPruningPolicy
    - org.apache.lucene.index.pruning.CarmelUniformTermPruningPolicy

```
public class CarmelUniformTermPruningPolicy
extends TermPruningPolicy
```
Enhanced implementation of Carmel Uniform Pruning: TermPositions whose in-document frequency is below a specified threshold are pruned.
See CarmelTopKTermPruningPolicy for a link to the paper describing this policy.
The conclusions of that paper indicate that it is best to compute per-term thresholds, as CarmelTopKTermPruningPolicy does. However, for large indexes with many terms that method might be too slow, and the (enhanced) uniform approach implemented here may be faster, although it might produce inferior search quality.
This implementation enhances the Carmel uniform pruning approach by allowing thresholds to be specified at three levels:
- one default threshold, applied globally (to terms in all fields)
- threshold per field
- threshold per term
These thresholds are applied so that the most specific one always takes precedence: a per-term threshold is used first if present, then a per-field threshold if present, and finally the default threshold.
Thresholds are maintained in a map keyed either by field name or by term in field:text format.
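The precedence rule can be sketched as a plain map lookup. This is a minimal illustration that assumes the key formats described above (`field` for a per-field threshold, `field:text` for a per-term threshold). The class `ThresholdLookup` and its method `thresholdFor` are hypothetical names and are not part of the Lucene API.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical helper illustrating threshold precedence; not a Lucene class.
class ThresholdLookup {
    private final Map<String, Float> thresholds;
    private final float defThreshold;

    ThresholdLookup(Map<String, Float> thresholds, float defThreshold) {
        this.thresholds = thresholds;
        this.defThreshold = defThreshold;
    }

    // A per-term key ("field:text") wins, then a per-field key ("field"), then the default.
    float thresholdFor(String field, String text) {
        Float t = thresholds.get(field + ":" + text);
        if (t != null) {
            return t;
        }
        t = thresholds.get(field);
        if (t != null) {
            return t;
        }
        return defThreshold;
    }

    public static void main(String[] args) {
        Map<String, Float> thresholds = new HashMap<>();
        thresholds.put("title", 0.8f);        // per-field threshold
        thresholds.put("body:lucene", 0.9f);  // per-term threshold
        ThresholdLookup lookup = new ThresholdLookup(thresholds, 0.5f);

        System.out.println(lookup.thresholdFor("body", "lucene")); // 0.9 (per-term)
        System.out.println(lookup.thresholdFor("title", "index")); // 0.8 (per-field)
        System.out.println(lookup.thresholdFor("body", "search")); // 0.5 (default)
    }
}
```

Because a term key is checked before its field key, adding a per-field entry never overrides an explicit per-term entry.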
Thresholds in this method of pruning are expressed as the percentage of the top-N scoring documents per term that are retained. The list of top-N documents is established by using a regular IndexSearcher and Similarity to run a simple TermQuery.
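One reading of the percentage threshold can be sketched for a single term. The scores of the matching documents (as a TermQuery would produce them) are sorted in descending order, and only the top fraction is kept. The class name `TopFractionRetention`, the convention that the threshold is a fraction in [0, 1], and the rounding up of the retained count are all assumptions made for illustration. They are not taken from the Lucene implementation.

```java
import java.util.Arrays;
import java.util.Comparator;
import java.util.Set;
import java.util.TreeSet;

// Hypothetical sketch: keep the top `fraction` of a term's matching docs by score.
class TopFractionRetention {

    // Returns the doc ids to retain; docs[i] has score scores[i].
    // Assumption: fraction is in [0, 1] and the retained count is rounded up.
    static Set<Integer> retainedDocs(int[] docs, float[] scores, float fraction) {
        if (docs.length != scores.length) {
            throw new IllegalArgumentException("docs and scores must have equal length");
        }
        Integer[] order = new Integer[docs.length];
        for (int i = 0; i < order.length; i++) {
            order[i] = i;
        }
        // Sort indexes by descending score, as a score-ordered search would return them.
        Arrays.sort(order, Comparator.comparingDouble((Integer idx) -> scores[idx]).reversed());

        // Clamp so that a fraction above 1 cannot run past the end of the array.
        int keep = Math.min(docs.length, (int) Math.ceil(fraction * docs.length));
        Set<Integer> retained = new TreeSet<>();
        for (int i = 0; i < keep; i++) {
            retained.add(docs[order[i]]);
        }
        return retained;
    }

    public static void main(String[] args) {
        int[] docs = {3, 7, 12, 20};
        float[] scores = {0.2f, 1.5f, 0.9f, 0.4f};
        // Keeping 50% of 4 docs retains the 2 highest-scoring: doc 7 (1.5) and doc 12 (0.9).
        System.out.println(retainedDocs(docs, scores, 0.5f)); // [7, 12]
    }
}
```

Lowering the fraction shrinks the retained set for every term, which is why a smaller threshold yields a smaller index.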
A smaller threshold value produces a smaller index. See TermPruningPolicy for size vs. performance considerations.
For indexes with a large number of terms this policy might still be too slow, since it issues a term query for each term in the index. In such situations, the term-frequency pruning approach in TFTermPruningPolicy will be faster, though it might produce inferior search quality.

Nested Class Summary

Nested Classes

| Modifier and Type | Class and Description |
|---|---|
| `static class` | `CarmelUniformTermPruningPolicy.ByDocComparator` |

Field Summary
- Fields inherited from class org.apache.lucene.index.pruning.TermPruningPolicy
  fieldFlags, in
- Fields inherited from class org.apache.lucene.index.pruning.PruningPolicy
  DEL_ALL, DEL_PAYLOADS, DEL_POSTINGS, DEL_STORED, DEL_VECTOR

Constructor Summary

Constructors

| Constructor and Description |
|---|
| `CarmelUniformTermPruningPolicy(org.apache.lucene.index.IndexReader in, Map<String,Integer> fieldFlags, Map<String,Float> thresholds, float defThreshold, org.apache.lucene.search.Similarity sim)` |

Method Summary

Methods

| Modifier and Type | Method and Description |
|---|---|
| `void` | `initPositionsTerm(org.apache.lucene.index.TermPositions tp, org.apache.lucene.index.Term t)` Called when moving `TermPositions` to a new `Term`. |
| `boolean` | `pruneAllPositions(org.apache.lucene.index.TermPositions termPositions, org.apache.lucene.index.Term t)` Prune all postings per term (invoked once per term per doc). |
| `int` | `pruneSomePositions(int docNum, int[] positions, org.apache.lucene.index.Term curTerm)` Prune some postings per term (invoked once per term per doc). |
| `boolean` | `pruneTermEnum(org.apache.lucene.index.TermEnum te)` Pruning of all postings for a term (invoked once per term). |
| `int` | `pruneTermVectorTerms(int docNumber, String field, String[] terms, int[] freqs, org.apache.lucene.index.TermFreqVector tfv)` Pruning of individual terms in term vectors. |

Methods inherited from class org.apache.lucene.index.pruning.TermPruningPolicy
pruneAllFieldPostings, prunePayload, pruneWholeTermVector

Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait

- Constructor Detail
  - CarmelUniformTermPruningPolicy
```
public CarmelUniformTermPruningPolicy(org.apache.lucene.index.IndexReader in,
                              Map<String,Integer> fieldFlags,
                              Map<String,Float> thresholds,
                              float defThreshold,
                              org.apache.lucene.search.Similarity sim)
```
- Method Detail
  - pruneTermEnum
```
public boolean pruneTermEnum(org.apache.lucene.index.TermEnum te)
                      throws IOException
```
    Description copied from class: TermPruningPolicy
    
    Pruning of all postings for a term (invoked once per term).
    
    Specified by:
    
    pruneTermEnum in class TermPruningPolicy
    
    Parameters:
    te - positioned term enum.
    
    Returns:
    true if all postings for this term should be removed, false otherwise.
    
    Throws:
    
    IOException
  - initPositionsTerm
```
public void initPositionsTerm(org.apache.lucene.index.TermPositions tp,
                     org.apache.lucene.index.Term t)
                       throws IOException
```
    Description copied from class: TermPruningPolicy
    
    Called when moving TermPositions to a new Term.
    
    Specified by:
    
    initPositionsTerm in class TermPruningPolicy
    
    Parameters:
    tp - input term positions
    t - current term
    
    Throws:
    
    IOException
  - pruneAllPositions
```
public boolean pruneAllPositions(org.apache.lucene.index.TermPositions termPositions,
                        org.apache.lucene.index.Term t)
                          throws IOException
```
    Description copied from class: TermPruningPolicy
    
    Prune all postings per term (invoked once per term per doc)
    
    Specified by:
    
    pruneAllPositions in class TermPruningPolicy
    
    Parameters:
    termPositions - positioned term positions. Implementations MUST NOT advance this by calling TermPositions methods that advance either the document pointer (next, skipTo) or the term pointer (seek).
    t - current term
    
    Returns:
    true if the current posting should be removed, false otherwise.
    
    Throws:
    
    IOException
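The two calls can be pictured as a per-term / per-doc protocol. When a term starts (the role of initPositionsTerm), the set of documents to keep is computed. Then, for each posting (the role of pruneAllPositions), that set is consulted without moving any enumeration. The class `PerDocPruningSketch` and its methods are hypothetical stand-ins. They take plain doc ids rather than Lucene's TermPositions, so the sketch shows only the shape of the contract.

```java
import java.util.Arrays;
import java.util.Set;
import java.util.TreeSet;

// Hypothetical stand-in for the per-term / per-doc protocol; not the Lucene API.
class PerDocPruningSketch {
    private Set<Integer> retained = new TreeSet<>();

    // Analogous to initPositionsTerm: called once when a new term starts.
    void initTerm(Set<Integer> retainedDocs) {
        retained = retainedDocs;
    }

    // Analogous to pruneAllPositions: true means drop this doc's posting for the term.
    // It only reads the current doc id and never advances any enumeration.
    boolean pruneDoc(int doc) {
        return !retained.contains(doc);
    }

    public static void main(String[] args) {
        PerDocPruningSketch policy = new PerDocPruningSketch();
        policy.initTerm(new TreeSet<>(Arrays.asList(7, 12)));
        int[] postings = {3, 7, 12, 20}; // doc ids for the current term
        for (int doc : postings) {
            System.out.println(doc + (policy.pruneDoc(doc) ? " pruned" : " kept"));
        }
    }
}
```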
  - pruneTermVectorTerms
```
public int pruneTermVectorTerms(int docNumber,
                       String field,
                       String[] terms,
                       int[] freqs,
                       org.apache.lucene.index.TermFreqVector tfv)
                         throws IOException
```
    Description copied from class: TermPruningPolicy
    
    Pruning of individual terms in term vectors.
    
    Specified by:
    
    pruneTermVectorTerms in class TermPruningPolicy
    
    Parameters:
    docNumber - document number
    field - field name
    terms - array of terms
    freqs - array of term frequencies
    tfv - the original term frequency vector
    
    Returns:
    0 if no terms are to be removed, positive number to indicate how many terms need to be removed. The same number of entries in the terms array must be set to null to indicate which terms to remove.
    
    Throws:
    
    IOException
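The return contract of pruneTermVectorTerms can be sketched as follows: null out each entry to remove and return how many entries were nulled. The helper name `pruneLowFrequencyTerms` and its `minFreq` cutoff are hypothetical. They loosely mirror the in-document-frequency criterion in the class description but are not the method's actual logic.

```java
import java.util.Arrays;

// Hypothetical helper following the pruneTermVectorTerms return contract:
// set removed entries in `terms` to null and return how many were removed.
class TermVectorPruningSketch {

    // Assumption for illustration: drop terms whose in-document frequency is below minFreq.
    static int pruneLowFrequencyTerms(String[] terms, int[] freqs, int minFreq) {
        int removed = 0;
        for (int i = 0; i < terms.length; i++) {
            if (terms[i] != null && freqs[i] < minFreq) {
                terms[i] = null; // marks this term for removal
                removed++;
            }
        }
        return removed; // 0 means no terms are removed
    }

    public static void main(String[] args) {
        String[] terms = {"apache", "index", "lucene", "prune"};
        int[] freqs = {1, 4, 3, 1};
        int removed = pruneLowFrequencyTerms(terms, freqs, 2);
        System.out.println(removed + " " + Arrays.toString(terms)); // 2 [null, index, lucene, null]
    }
}
```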
  - pruneSomePositions
```
public int pruneSomePositions(int docNum,
                     int[] positions,
                     org.apache.lucene.index.Term curTerm)
```
    Description copied from class: TermPruningPolicy
    
    Prune some postings per term (invoked once per term per doc).
    
    Specified by:
    
    pruneSomePositions in class TermPruningPolicy
    
    Parameters:
    docNum - current document number
    positions - original term positions in the document (and indirectly term frequency)
    curTerm - current term
    
    Returns:
    0 if no postings are to be removed, or positive number to indicate how many postings need to be removed. The same number of entries in the positions array must be set to -1 to indicate which positions to remove.
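The pruneSomePositions contract works the same way for positions: set each removed entry to -1 and return the count. Removing positions therefore also lowers the term's in-document frequency. The cutoff rule below (drop positions beyond a hypothetical `maxPosition`) is an illustrative assumption and does not describe what this policy actually prunes.

```java
import java.util.Arrays;

// Hypothetical sketch of the pruneSomePositions contract: set removed entries to -1
// and return how many were removed. The cutoff rule is an illustrative assumption.
class PositionPruningSketch {

    static int prunePositionsAfter(int[] positions, int maxPosition) {
        int removed = 0;
        for (int i = 0; i < positions.length; i++) {
            // Skip entries already marked with -1 so they are not counted twice.
            if (positions[i] >= 0 && positions[i] > maxPosition) {
                positions[i] = -1; // marks this position for removal
                removed++;
            }
        }
        return removed;
    }

    public static void main(String[] args) {
        int[] positions = {2, 15, 40, 120};
        int removed = prunePositionsAfter(positions, 50);
        System.out.println(removed + " " + Arrays.toString(positions)); // 1 [2, 15, 40, -1]
    }
}
```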
