public class TFTermPruningPolicy extends TermPruningPolicy
This implementation uses simple term frequency (TF) thresholds: it removes all postings of a term from those documents where the term occurs rarely, i.e. where its TF in the document is smaller than the threshold.
Threshold values are expressed as absolute term frequencies.
A larger threshold value produces a smaller index. See TermPruningPolicy for size vs. performance considerations.
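The thresholding rule described above can be illustrated with a small standalone sketch. It does not use Lucene and is not the class's actual source: the class name `TfThresholdSketch`, its methods, and the assumption that the `thresholds` map is keyed by plain field name are all hypothetical, chosen only to show the "prune when TF is smaller than the threshold, else fall back to the default threshold" idea.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical stand-in for the threshold rule; not the Lucene class. */
final class TfThresholdSketch {
    // Per-field thresholds (assumed here to be keyed by field name).
    private final Map<String, Integer> thresholds;
    // Fallback threshold for fields without an explicit entry.
    private final int defThreshold;

    TfThresholdSketch(Map<String, Integer> thresholds, int defThreshold) {
        this.thresholds = new HashMap<>(thresholds);
        this.defThreshold = defThreshold;
    }

    /** Resolves the threshold for a field, falling back to the default. */
    int thresholdFor(String field) {
        Integer thr = thresholds.get(field);
        return thr != null ? thr : defThreshold;
    }

    /** Returns true when the in-document TF is smaller than the threshold. */
    boolean shouldPrune(String field, int termFreq) {
        return termFreq < thresholdFor(field);
    }

    public static void main(String[] args) {
        Map<String, Integer> perField = new HashMap<>();
        perField.put("body", 3);
        TfThresholdSketch policy = new TfThresholdSketch(perField, 1);

        System.out.println(policy.shouldPrune("body", 2));  // true: 2 < 3
        System.out.println(policy.shouldPrune("body", 3));  // false: 3 is not below 3
        System.out.println(policy.shouldPrune("title", 1)); // false: default threshold is 1
    }
}
```

Because thresholds are absolute frequencies, raising a threshold from 3 to 4 in this sketch would also drop postings with a TF of 3, which is why larger values lead to a smaller index.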
Modifier and Type | Field and Description |
---|---|
protected int | curThr |
protected int | defThreshold |
protected Map<String,Integer> | thresholds |
Inherited fields: fieldFlags, in
Inherited constants: DEL_ALL, DEL_PAYLOADS, DEL_POSTINGS, DEL_STORED, DEL_VECTOR
Constructor and Description |
---|
TFTermPruningPolicy(IndexReader in, Map<String,Integer> fieldFlags, Map<String,Integer> thresholds, int defThreshold) |
Modifier and Type | Method and Description |
---|---|
void | initPositionsTerm(TermPositions in, Term t): Called when moving TermPositions to a new Term. |
boolean | pruneAllPositions(TermPositions termPositions, Term t): Prune all postings per term (invoked once per term per doc). |
int | pruneSomePositions(int docNum, int[] positions, Term curTerm): Prune some postings per term (invoked once per term per doc). |
boolean | pruneTermEnum(TermEnum te): Pruning of all postings for a term (invoked once per term). |
int | pruneTermVectorTerms(int docNumber, String field, String[] terms, int[] freqs, TermFreqVector tfv): Pruning of individual terms in term vectors. |
Inherited methods: pruneAllFieldPostings, prunePayload, pruneWholeTermVector
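public boolean pruneTermEnum(TermEnum te) throws IOException
Description copied from class: TermPruningPolicy
Pruning of all postings for a term (invoked once per term).
Specified by: pruneTermEnum in class TermPruningPolicy
Parameters:
te - positioned term enum.
Throws: IOException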
public void initPositionsTerm(TermPositions in, Term t) throws IOException
Description copied from class: TermPruningPolicy
Called when moving TermPositions to a new Term.
Specified by: initPositionsTerm in class TermPruningPolicy
Parameters:
in - input term positions
t - current term
Throws: IOException
public boolean pruneAllPositions(TermPositions termPositions, Term t) throws IOException
Description copied from class: TermPruningPolicy
Prune all postings per term (invoked once per term per doc).
Specified by: pruneAllPositions in class TermPruningPolicy
Parameters:
termPositions - positioned term positions. Implementations MUST NOT advance this by calling TermPositions methods that advance either the position pointer (next, skipTo) or the term pointer (seek).
t - current term
Throws: IOException
public int pruneTermVectorTerms(int docNumber, String field, String[] terms, int[] freqs, TermFreqVector tfv) throws IOException
Description copied from class: TermPruningPolicy
Pruning of individual terms in term vectors.
Specified by: pruneTermVectorTerms in class TermPruningPolicy
Parameters:
docNumber - document number
field - field name
terms - array of terms
freqs - array of term frequencies
tfv - the original term frequency vector
Throws: IOException
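As an illustration of how frequency-based pruning over the parallel terms and freqs arrays might look, here is a minimal standalone sketch. It is not the Lucene implementation: the class `TermVectorPruneSketch` and its `pruneTerms` method are hypothetical, the docNumber, field and tfv parameters are omitted, and the convention of marking removed terms with null and returning the number removed is an assumption made for this example.

```java
/** Hypothetical sketch of TF-based term vector pruning; not the Lucene class. */
final class TermVectorPruneSketch {

    /**
     * Marks every term whose frequency is below the threshold as removed
     * by setting its slot to null (an assumed convention), and returns
     * how many terms were marked.
     */
    static int pruneTerms(String[] terms, int[] freqs, int threshold) {
        if (terms.length != freqs.length) {
            throw new IllegalArgumentException("terms and freqs must be parallel arrays");
        }
        int removed = 0;
        for (int i = 0; i < terms.length; i++) {
            if (terms[i] != null && freqs[i] < threshold) {
                terms[i] = null; // mark this term as pruned
                removed++;
            }
        }
        return removed;
    }

    public static void main(String[] args) {
        String[] terms = {"apache", "lucene", "index"};
        int[] freqs = {1, 4, 2};
        int removed = pruneTerms(terms, freqs, 2);
        System.out.println(removed);                          // 1
        System.out.println(java.util.Arrays.toString(terms)); // [null, lucene, index]
    }
}
```

A return value of 0 in this sketch means that no term fell below the threshold and the arrays were left unchanged.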
public int pruneSomePositions(int docNum, int[] positions, Term curTerm)
Description copied from class: TermPruningPolicy
Prune some postings per term (invoked once per term per doc).
Specified by: pruneSomePositions in class TermPruningPolicy
Parameters:
docNum - current document number
positions - original term positions in the document (and indirectly term frequency)
curTerm - current term