public class TFTermPruningPolicy extends TermPruningPolicy
A larger threshold value produces a smaller index.
See TermPruningPolicy for size versus performance considerations.
This implementation uses simple term frequency (TF) thresholds to remove all postings from documents where a given term occurs rarely (i.e. its TF in a document is smaller than the threshold).
Threshold values in this policy are expressed as absolute term frequencies.
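
The threshold rule described above can be sketched without any Lucene dependency. The class and method names below (`TfThresholdRule`, `shouldPrune`) are hypothetical and exist only for illustration; the comparison is strict because the description says postings are removed when the TF is smaller than the threshold.

```java
// Hypothetical, Lucene-free sketch of the threshold rule; not the library's code.
final class TfThresholdRule {
    private TfThresholdRule() {}

    /** Returns true when the postings of a term in one document would be dropped. */
    static boolean shouldPrune(int termFreqInDoc, int threshold) {
        // "Rarely" means strictly below the threshold, so TF == threshold is kept.
        return termFreqInDoc < threshold;
    }

    public static void main(String[] args) {
        int threshold = 3; // absolute term frequency, not a ratio
        System.out.println(shouldPrune(1, threshold)); // prints "true"
        System.out.println(shouldPrune(3, threshold)); // prints "false"
    }
}
```

Raising the threshold makes more (term, document) pairs fall below it, which is why a larger value yields a smaller index.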
| Modifier and Type | Field and Description |
|---|---|
| protected int | curThr |
| protected int | defThreshold |
| protected Map<String,Integer> | thresholds |

Inherited fields: fieldFlags, in, DEL_ALL, DEL_PAYLOADS, DEL_POSTINGS, DEL_STORED, DEL_VECTOR

| Constructor and Description |
|---|
| TFTermPruningPolicy(IndexReader in, Map<String,Integer> fieldFlags, Map<String,Integer> thresholds, int defThreshold) |
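
The constructor takes a `thresholds` map together with a `defThreshold` fallback. A minimal sketch of how such a pair might be resolved follows. It assumes the map is keyed by field name and that `defThreshold` applies when no entry matches; neither detail is stated on this page. `ThresholdLookupSketch` and `thresholdFor` are hypothetical names, not part of the Lucene API.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical stand-in for illustration; this is not the Lucene class itself.
final class ThresholdLookupSketch {
    private final Map<String, Integer> thresholds;
    private final int defThreshold;

    ThresholdLookupSketch(Map<String, Integer> thresholds, int defThreshold) {
        // Assumption of this sketch: a missing map means "no overrides".
        this.thresholds = thresholds == null ? new HashMap<String, Integer>() : thresholds;
        this.defThreshold = defThreshold;
    }

    /** Assumed lookup: per-field override if present, otherwise the default. */
    int thresholdFor(String field) {
        return thresholds.getOrDefault(field, defThreshold);
    }

    public static void main(String[] args) {
        Map<String, Integer> perField = new HashMap<String, Integer>();
        perField.put("body", 3);   // prune body terms with TF below 3
        perField.put("title", 1);  // effectively keep every title posting
        ThresholdLookupSketch sketch = new ThresholdLookupSketch(perField, 2);
        System.out.println(sketch.thresholdFor("body"));   // prints "3"
        System.out.println(sketch.thresholdFor("author")); // prints "2"
    }
}
```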

| Modifier and Type | Method and Description |
|---|---|
| void | initPositionsTerm(TermPositions in, Term t): Called when moving TermPositions to a new Term. |
| boolean | pruneAllPositions(TermPositions termPositions, Term t): Prune all postings per term (invoked once per term per doc). |
| int | pruneSomePositions(int docNum, int[] positions, Term curTerm): Prune some postings per term (invoked once per term per doc). |
| boolean | pruneTermEnum(TermEnum te): Pruning of all postings for a term (invoked once per term). |
| int | pruneTermVectorTerms(int docNumber, String field, String[] terms, int[] freqs, TermFreqVector tfv): Pruning of individual terms in term vectors. |
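
Term vector pruning receives parallel `terms` and `freqs` arrays. The sketch below shows one way a frequency-based policy might identify the rare entries in such arrays. It does not reproduce the return-value convention of `pruneTermVectorTerms`, which this page does not describe; `TermVectorTfSketch` and `rareTerms` are hypothetical names.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical, Lucene-free illustration of filtering a term vector by frequency.
final class TermVectorTfSketch {
    private TermVectorTfSketch() {}

    /** Collects the terms whose frequency falls strictly below the threshold. */
    static List<String> rareTerms(String[] terms, int[] freqs, int threshold) {
        if (terms.length != freqs.length) {
            throw new IllegalArgumentException("terms and freqs must be parallel arrays");
        }
        List<String> rare = new ArrayList<String>();
        for (int i = 0; i < terms.length; i++) {
            if (freqs[i] < threshold) {
                rare.add(terms[i]);
            }
        }
        return rare;
    }

    public static void main(String[] args) {
        String[] terms = {"index", "lucene", "prune"};
        int[] freqs = {5, 1, 2};
        System.out.println(rareTerms(terms, freqs, 2)); // prints "[lucene]"
    }
}
```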

Inherited methods: pruneAllFieldPostings, prunePayload, pruneWholeTermVector

public boolean pruneTermEnum(TermEnum te) throws IOException
Description copied from class: TermPruningPolicy
Pruning of all postings for a term (invoked once per term).
Specified by: pruneTermEnum in class TermPruningPolicy
Parameters:
te - positioned term enum.
Throws: IOException

public void initPositionsTerm(TermPositions in, Term t) throws IOException
Description copied from class: TermPruningPolicy
Called when moving TermPositions to a new Term.
Specified by: initPositionsTerm in class TermPruningPolicy
Parameters:
in - input term positions
t - current term
Throws: IOException

public boolean pruneAllPositions(TermPositions termPositions, Term t) throws IOException
Description copied from class: TermPruningPolicy
Prune all postings per term (invoked once per term per doc).
Specified by: pruneAllPositions in class TermPruningPolicy
Parameters:
termPositions - positioned term positions. Implementations MUST NOT advance this by calling TermPositions methods that advance either the position pointer (next, skipTo) or term pointer (seek).
t - current term
Throws: IOException

public int pruneTermVectorTerms(int docNumber, String field, String[] terms, int[] freqs, TermFreqVector tfv) throws IOException
Description copied from class: TermPruningPolicy
Pruning of individual terms in term vectors.
Specified by: pruneTermVectorTerms in class TermPruningPolicy
Parameters:
docNumber - document number
field - field name
terms - array of terms
freqs - array of term frequencies
tfv - the original term frequency vector
Throws: IOException

public int pruneSomePositions(int docNum, int[] positions, Term curTerm)
Description copied from class: TermPruningPolicy
Prune some postings per term (invoked once per term per doc).
Specified by: pruneSomePositions in class TermPruningPolicy
Parameters:
docNum - current document number
positions - original term positions in the document (and indirectly term frequency)
curTerm - current term