SimilarityBase (Lucene 4.2.1 API)

org.apache.lucene.search.similarities
Class SimilarityBase

java.lang.Object
  org.apache.lucene.search.similarities.Similarity
      org.apache.lucene.search.similarities.SimilarityBase

Direct Known Subclasses:: DFRSimilarity, IBSimilarity, LMSimilarity

public abstract class SimilarityBase
extends Similarity

A subclass of Similarity that provides a simplified API for its descendants. Subclasses are only required to implement the score(BasicStats, float, float) and toString() methods. Implementing explain(Explanation, BasicStats, int, float, float) is optional, because SimilarityBase already provides a basic explanation of the score and the term frequency. However, implementers of a subclass are encouraged to include as much detail about the scoring method as possible.
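As a rough illustration of this contract, the sketch below shows the shape of a subclass that supplies only score and toString(). So that it compiles without Lucene on the classpath, it uses hypothetical stand-ins (Stats for BasicStats, Base for SimilarityBase); the scoring formula and the parameter k are invented for illustration and are not a Lucene ranking model.

```java
public class ScoreSketch {
    /** Hypothetical stand-in for BasicStats: holds only the statistic this sketch reads. */
    static final class Stats {
        final float avgFieldLength;

        Stats(float avgFieldLength) {
            this.avgFieldLength = avgFieldLength;
        }
    }

    /** Hypothetical stand-in for SimilarityBase: the two methods a subclass must supply. */
    abstract static class Base {
        protected abstract float score(Stats stats, float freq, float docLen);

        @Override
        public abstract String toString();
    }

    /** Toy subclass: saturating term frequency, damped for longer-than-average documents. */
    static final class ToySimilarity extends Base {
        private final float k; // hypothetical tuning parameter, not part of Lucene

        ToySimilarity(float k) {
            this.k = k;
        }

        @Override
        protected float score(Stats stats, float freq, float docLen) {
            // Document length relative to the collection average.
            float relativeLength = docLen / stats.avgFieldLength;
            return freq / (freq + k * relativeLength);
        }

        @Override
        public String toString() {
            // Name of the similarity plus its parameter values, as the contract recommends.
            return "ToySimilarity(k=" + k + ")";
        }
    }

    public static void main(String[] args) {
        ToySimilarity sim = new ToySimilarity(1.0f);
        // freq = 2 in a document exactly as long as the average: 2 / (2 + 1).
        System.out.println(sim + " -> " + sim.score(new Stats(100f), 2f, 100f));
    }
}
```

In a real subclass the same body would sit in score(BasicStats, float, float), with the collection statistics read from the BasicStats argument rather than from the stand-in.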

Note: multi-word queries such as phrase queries are scored differently from Lucene's default ranking algorithm: whereas the default algorithm "fakes" an IDF value for the phrase as a whole (since it does not know the true value), this class scores phrases as a summation of the individual term scores.
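The summation described in this note can be pictured with the sketch below. It is plain Java with a hypothetical TermScorer interface standing in for a per-term scoring function; the assumption that every term is scored with the same phrase frequency and document length is made for illustration only and is not taken from this page.

```java
public class PhraseSumSketch {
    /** Hypothetical per-term scoring function; not a Lucene type. */
    interface TermScorer {
        float score(float freq, float docLen);
    }

    /** Scores a phrase as the sum of the scores of its individual terms. */
    static float scorePhrase(TermScorer[] termScorers, float phraseFreq, float docLen) {
        float sum = 0f;
        for (TermScorer scorer : termScorers) {
            sum += scorer.score(phraseFreq, docLen);
        }
        return sum;
    }

    public static void main(String[] args) {
        // Two made-up term scorers with different collection-level weights.
        TermScorer[] scorers = {
            (freq, docLen) -> 0.5f * freq,
            (freq, docLen) -> 0.25f * freq
        };
        System.out.println(scorePhrase(scorers, 2f, 100f));
    }
}
```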

WARNING: This API is experimental and might change in incompatible ways in the next release.

Nested Class Summary

Nested classes/interfaces inherited from class org.apache.lucene.search.similarities.Similarity
`Similarity.ExactSimScorer, Similarity.SimWeight, Similarity.SloppySimScorer`

Field Summary
`protected boolean`	`discountOverlaps` True if overlap tokens (tokens with a position increment of zero) are discounted from the document's length.

Constructor Summary
`SimilarityBase()` Sole constructor.

Method Summary
`long`	`computeNorm(FieldInvertState state)` Encodes the document length in the same way as `TFIDFSimilarity`.
`Similarity.SimWeight`	`computeWeight(float queryBoost, CollectionStatistics collectionStats, TermStatistics... termStats)` Compute any collection-level weight (e.g. IDF, average document length, etc) needed for scoring a query.
`protected float`	`decodeNormValue(byte norm)` Decodes a normalization factor (document length) stored in an index.
`protected byte`	`encodeNormValue(float boost, float length)` Encodes the length to a byte via SmallFloat.
`Similarity.ExactSimScorer`	`exactSimScorer(Similarity.SimWeight stats, AtomicReaderContext context)` Creates a new `Similarity.ExactSimScorer` to score matching documents from a segment of the inverted index.
`protected Explanation`	`explain(BasicStats stats, int doc, Explanation freq, float docLen)` Explains the score.
`protected void`	`explain(Explanation expl, BasicStats stats, int doc, float freq, float docLen)` Subclasses should implement this method to explain the score.
`protected void`	`fillBasicStats(BasicStats stats, CollectionStatistics collectionStats, TermStatistics termStats)` Fills all member fields defined in `BasicStats` in `stats`.
`boolean`	`getDiscountOverlaps()` Returns true if overlap tokens are discounted from the document's length.
`static double`	`log2(double x)` Returns the base two logarithm of `x`.
`protected BasicStats`	`newStats(String field, float queryBoost)` Factory method to return a custom stats object.
`protected abstract float`	`score(BasicStats stats, float freq, float docLen)` Scores the document `doc`.
`void`	`setDiscountOverlaps(boolean v)` Determines whether overlap tokens (tokens with a position increment of 0) are ignored when computing norm.
`Similarity.SloppySimScorer`	`sloppySimScorer(Similarity.SimWeight stats, AtomicReaderContext context)` Creates a new `Similarity.SloppySimScorer` to score matching documents from a segment of the inverted index.
`abstract String`	`toString()` Subclasses must override this method to return the name of the Similarity and preferably the values of parameters (if any) as well.

Methods inherited from class org.apache.lucene.search.similarities.Similarity
`coord, queryNorm`

Methods inherited from class java.lang.Object
`clone, equals, finalize, getClass, hashCode, notify, notifyAll, wait, wait, wait`

Field Detail

discountOverlaps

protected boolean discountOverlaps

True if overlap tokens (tokens with a position increment of zero) are discounted from the document's length.

Constructor Detail

SimilarityBase

public SimilarityBase()

Sole constructor. (For invocation by subclass constructors, typically implicit.)

Method Detail

setDiscountOverlaps

public void setDiscountOverlaps(boolean v)

Determines whether overlap tokens (tokens with a position increment of 0) are ignored when computing norm. By default this is true, meaning overlap tokens do not count when computing norms.

See Also:: computeNorm(org.apache.lucene.index.FieldInvertState)
WARNING: This API is experimental and might change in incompatible ways in the next release.
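To make the effect of this flag concrete, here is a minimal stand-alone sketch of the length that would feed into the norm. It assumes that discounting means subtracting the number of overlap tokens from the total token count; the method name effectiveLength and its parameters are hypothetical and not part of this class.

```java
public class OverlapDiscountSketch {
    /**
     * Returns the document length used for normalization.
     *
     * @param length           total number of tokens indexed for the field
     * @param numOverlap       tokens with a position increment of 0 (e.g. injected synonyms)
     * @param discountOverlaps whether overlap tokens are left out of the length
     */
    static int effectiveLength(int length, int numOverlap, boolean discountOverlaps) {
        return discountOverlaps ? length - numOverlap : length;
    }

    public static void main(String[] args) {
        // 10 tokens, 3 of which are synonyms stacked on existing positions.
        System.out.println(effectiveLength(10, 3, true));  // overlaps ignored
        System.out.println(effectiveLength(10, 3, false)); // overlaps counted
    }
}
```

With the default setting, a field whose analyzer injects many synonyms is therefore not treated as a longer document than the same text indexed without them.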

getDiscountOverlaps

public boolean getDiscountOverlaps()

Returns true if overlap tokens are discounted from the document's length.

See Also:: setDiscountOverlaps(boolean)

computeWeight

public final Similarity.SimWeight computeWeight(float queryBoost,
                                                CollectionStatistics collectionStats,
                                                TermStatistics... termStats)

Description copied from class: Similarity

Compute any collection-level weight (e.g. IDF, average document length, etc) needed for scoring a query.

Specified by:: computeWeight in class Similarity

Parameters:: queryBoost - the query-time boost.; collectionStats - collection-level statistics, such as the number of tokens in the collection.; termStats - term-level statistics, such as the document frequency of a term across the collection.
Returns:: SimWeight object with the information this Similarity needs to score a query.

newStats

protected BasicStats newStats(String field,
                              float queryBoost)

Factory method to return a custom stats object.

fillBasicStats

protected void fillBasicStats(BasicStats stats,
                              CollectionStatistics collectionStats,
                              TermStatistics termStats)

Fills all member fields defined in BasicStats in stats. Subclasses can override this method to fill additional stats.

score

protected abstract float score(BasicStats stats,
                               float freq,
                               float docLen)

Scores the document doc.

Subclasses must apply their scoring formula in this method.

Parameters:: stats - the corpus level statistics.; freq - the term frequency.; docLen - the document length.
Returns:: the score.

explain

protected void explain(Explanation expl,
                       BasicStats stats,
                       int doc,
                       float freq,
                       float docLen)

Subclasses should implement this method to explain the score. expl already contains the score, the name of the class and the doc id, as well as the term frequency and its explanation; subclasses can add additional clauses to explain details of their scoring formulae.

The default implementation does nothing.

Parameters:: expl - the explanation to extend with details.; stats - the corpus level statistics.; doc - the document id.; freq - the term frequency.; docLen - the document length.

explain

protected Explanation explain(BasicStats stats,
                              int doc,
                              Explanation freq,
                              float docLen)

Explains the score. The implementation here provides a basic explanation in the format "score(name-of-similarity, doc=doc-id, freq=term-frequency), computed from:", and attaches the score (computed via the score(BasicStats, float, float) method) and the explanation for the term frequency. Subclasses content with this format may add additional details in explain(Explanation, BasicStats, int, float, float).

Parameters:: stats - the corpus level statistics.; doc - the document id.; freq - the term frequency and its explanation.; docLen - the document length.
Returns:: the explanation.
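For orientation, the sketch below assembles a description string in the format quoted above. It is plain Java rather than Lucene code: the helper name description is hypothetical, and the exact rendering of the frequency value (here Java's default float-to-string conversion) is an assumption.

```java
public class ExplainFormatSketch {
    /** Builds "score(name, doc=id, freq=tf), computed from:" for the given values. */
    static String description(String similarityName, int doc, float freq) {
        return "score(" + similarityName
                + ", doc=" + doc
                + ", freq=" + freq
                + "), computed from:";
    }

    public static void main(String[] args) {
        System.out.println(description("DFRSimilarity", 3, 2.0f));
    }
}
```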

exactSimScorer

public Similarity.ExactSimScorer exactSimScorer(Similarity.SimWeight stats,
                                                AtomicReaderContext context)
                                         throws IOException

Description copied from class: Similarity

Creates a new Similarity.ExactSimScorer to score matching documents from a segment of the inverted index.

Specified by:: exactSimScorer in class Similarity

Parameters:: stats - collection information from Similarity.computeWeight(float, CollectionStatistics, TermStatistics...); context - segment of the inverted index to be scored.
Returns:: ExactSimScorer for scoring documents across context
Throws:: IOException - if there is a low-level I/O error

sloppySimScorer

public Similarity.SloppySimScorer sloppySimScorer(Similarity.SimWeight stats,
                                                  AtomicReaderContext context)
                                           throws IOException

Description copied from class: Similarity

Creates a new Similarity.SloppySimScorer to score matching documents from a segment of the inverted index.

Specified by:: sloppySimScorer in class Similarity

Parameters:: stats - collection information from Similarity.computeWeight(float, CollectionStatistics, TermStatistics...); context - segment of the inverted index to be scored.
Returns:: SloppySimScorer for scoring documents across context
Throws:: IOException - if there is a low-level I/O error

toString

public abstract String toString()

Subclasses must override this method to return the name of the Similarity and preferably the values of parameters (if any) as well.

Overrides:: toString in class Object

computeNorm

public long computeNorm(FieldInvertState state)

Encodes the document length in the same way as TFIDFSimilarity.

Specified by:: computeNorm in class Similarity

Parameters:: state - current processing state for this field
Returns:: computed norm value

decodeNormValue

protected float decodeNormValue(byte norm)

Decodes a normalization factor (document length) stored in an index.

See Also:: encodeNormValue(float,float)

encodeNormValue

protected byte encodeNormValue(float boost,
                               float length)

Encodes the length to a byte via SmallFloat.

log2

public static double log2(double x)

Returns the base two logarithm of x.
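A base two logarithm can be derived from the natural logarithm by a change of base, log2(x) = ln(x) / ln(2). The stand-alone sketch below shows that identity using only java.lang.Math; it is an assumption that the method documented here is computed the same way.

```java
public class Log2Sketch {
    /** Base two logarithm via change of base: ln(x) / ln(2). */
    static double log2(double x) {
        return Math.log(x) / Math.log(2.0);
    }

    public static void main(String[] args) {
        // Powers of two should come out at (or within rounding error of) their exponent.
        System.out.println(log2(8.0));
        System.out.println(log2(1024.0));
    }
}
```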
