org.apache.lucene.search.similarities
Class DefaultSimilarity

java.lang.Object
  extended by org.apache.lucene.search.similarities.Similarity
      extended by org.apache.lucene.search.similarities.TFIDFSimilarity
          extended by org.apache.lucene.search.similarities.DefaultSimilarity

public class DefaultSimilarity
extends TFIDFSimilarity

Expert: Default scoring implementation.


Nested Class Summary
 
Nested classes/interfaces inherited from class org.apache.lucene.search.similarities.Similarity
Similarity.ExactSimScorer, Similarity.SimWeight, Similarity.SloppySimScorer
 
Field Summary
protected  boolean discountOverlaps
          True if overlap tokens (tokens with a position increment of zero) are discounted from the document's length.
 
Constructor Summary
DefaultSimilarity()
          Sole constructor; takes no parameters.
 
Method Summary
 float coord(int overlap, int maxOverlap)
          Implemented as overlap / maxOverlap.
 boolean getDiscountOverlaps()
          Returns true if overlap tokens are discounted from the document's length.
 float idf(long docFreq, long numDocs)
          Implemented as log(numDocs/(docFreq+1)) + 1.
 float lengthNorm(FieldInvertState state)
          Implemented as state.getBoost()*lengthNorm(numTerms), where numTerms is FieldInvertState.getLength() if discountOverlaps is false, and FieldInvertState.getLength() - FieldInvertState.getNumOverlap() otherwise.
 float queryNorm(float sumOfSquaredWeights)
          Implemented as 1/sqrt(sumOfSquaredWeights).
 float scorePayload(int doc, int start, int end, BytesRef payload)
          The default implementation returns 1.
 void setDiscountOverlaps(boolean v)
          Determines whether overlap tokens (tokens with a position increment of zero) are ignored when computing the norm.
 float sloppyFreq(int distance)
          Implemented as 1 / (distance + 1).
 float tf(float freq)
          Implemented as sqrt(freq).
 String toString()
           
 
Methods inherited from class org.apache.lucene.search.similarities.TFIDFSimilarity
computeNorm, computeWeight, decodeNormValue, encodeNormValue, exactSimScorer, idfExplain, idfExplain, sloppySimScorer, tf
 
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, wait, wait, wait
 

Field Detail

discountOverlaps

protected boolean discountOverlaps
True if overlap tokens (tokens with a position increment of zero) are discounted from the document's length.

Constructor Detail

DefaultSimilarity

public DefaultSimilarity()
Sole constructor; takes no parameters.

Method Detail

coord

public float coord(int overlap,
                   int maxOverlap)
Implemented as overlap / maxOverlap.

Specified by:
coord in class TFIDFSimilarity
Parameters:
overlap - the number of query terms matched in the document
maxOverlap - the total number of terms in the query
Returns:
a score factor based on term overlap with the query
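
The coordination factor rewards documents that match more of the query's terms. The following is a minimal stand-alone sketch of the formula, not the Lucene source; the class name CoordSketch is hypothetical, and the float cast is an assumption made so that the division is not truncated to an integer.

```java
// Hypothetical stand-alone sketch of the documented formula; not the Lucene source.
final class CoordSketch {
    /** Fraction of query terms matched; the cast avoids integer division. */
    static float coord(int overlap, int maxOverlap) {
        return overlap / (float) maxOverlap;
    }

    public static void main(String[] args) {
        // A document matching 2 of the 4 query terms.
        System.out.println(coord(2, 4)); // prints 0.5
    }
}
```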

queryNorm

public float queryNorm(float sumOfSquaredWeights)
Implemented as 1/sqrt(sumOfSquaredWeights).

Specified by:
queryNorm in class TFIDFSimilarity
Parameters:
sumOfSquaredWeights - the sum of the squares of query term weights
Returns:
a normalization factor for query weights
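
As a rough illustration, the normalization can be written as below. This is a sketch of the documented formula only; the class name QueryNormSketch is hypothetical and the code does not depend on Lucene.

```java
// Hypothetical stand-alone sketch of the documented formula; not the Lucene source.
final class QueryNormSketch {
    /** Returns 1/sqrt(sumOfSquaredWeights). */
    static float queryNorm(float sumOfSquaredWeights) {
        return (float) (1.0 / Math.sqrt(sumOfSquaredWeights));
    }

    public static void main(String[] args) {
        // Sum of squared weights 4.0 gives a factor of 1/2.
        System.out.println(queryNorm(4.0f)); // prints 0.5
    }
}
```

The factor is the same for every document matched by a given query, so it changes the scale of the scores rather than the ranking within one query.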

lengthNorm

public float lengthNorm(FieldInvertState state)
Implemented as state.getBoost()*lengthNorm(numTerms), where numTerms is FieldInvertState.getLength() if discountOverlaps is false, and FieldInvertState.getLength() - FieldInvertState.getNumOverlap() otherwise.

Specified by:
lengthNorm in class TFIDFSimilarity
Parameters:
state - statistics of the current field (such as length, boost, etc)
Returns:
an index-time normalization value
WARNING: This API is experimental and might change in incompatible ways in the next release.
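
The sketch below shows how the overlap discount changes numTerms. It assumes that lengthNorm(numTerms) stands for 1/sqrt(numTerms), which this page does not state. The class name LengthNormSketch and its plain parameters (standing in for the values read from FieldInvertState) are hypothetical.

```java
// Hypothetical stand-alone sketch; not the Lucene source.
// ASSUMPTION: lengthNorm(numTerms) is taken to be 1/sqrt(numTerms).
final class LengthNormSketch {
    /**
     * boost, length and numOverlap stand in for state.getBoost(),
     * state.getLength() and state.getNumOverlap().
     */
    static float lengthNorm(float boost, int length, int numOverlap,
                            boolean discountOverlaps) {
        // Overlap tokens are subtracted only when discounting is enabled.
        int numTerms = discountOverlaps ? length - numOverlap : length;
        return boost * (float) (1.0 / Math.sqrt(numTerms));
    }

    public static void main(String[] args) {
        // A field with 20 tokens, 4 of which have a position increment of zero.
        System.out.println(lengthNorm(1.0f, 20, 4, true));  // numTerms = 16
        System.out.println(lengthNorm(1.0f, 20, 4, false)); // numTerms = 20
    }
}
```

With discounting enabled the field counts as shorter, so its norm is slightly higher than when every token is counted.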

tf

public float tf(float freq)
Implemented as sqrt(freq).

Specified by:
tf in class TFIDFSimilarity
Parameters:
freq - the frequency of a term within a document
Returns:
a score factor based on a term's within-document frequency
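
The square root makes the factor grow more slowly than the raw frequency. A minimal sketch of the formula follows; the class name TfSketch is hypothetical and the code does not depend on Lucene.

```java
// Hypothetical stand-alone sketch of the documented formula; not the Lucene source.
final class TfSketch {
    /** Returns sqrt(freq). */
    static float tf(float freq) {
        return (float) Math.sqrt(freq);
    }

    public static void main(String[] args) {
        // Quadrupling the frequency only doubles the factor.
        System.out.println(tf(1.0f));  // prints 1.0
        System.out.println(tf(4.0f));  // prints 2.0
        System.out.println(tf(16.0f)); // prints 4.0
    }
}
```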

sloppyFreq

public float sloppyFreq(int distance)
Implemented as 1 / (distance + 1).

Specified by:
sloppyFreq in class TFIDFSimilarity
Parameters:
distance - the edit distance of this sloppy phrase match
Returns:
the frequency increment for this match
See Also:
PhraseQuery.setSlop(int)
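
Under this formula a match with distance 0 contributes a full increment of 1, and the contribution falls as the distance grows. The sketch below illustrates the formula only; the class name SloppyFreqSketch is hypothetical, and the float literal is an assumption made to avoid integer division.

```java
// Hypothetical stand-alone sketch of the documented formula; not the Lucene source.
final class SloppyFreqSketch {
    /** Returns 1 / (distance + 1), computed in floating point. */
    static float sloppyFreq(int distance) {
        return 1.0f / (distance + 1);
    }

    public static void main(String[] args) {
        System.out.println(sloppyFreq(0)); // prints 1.0
        System.out.println(sloppyFreq(1)); // prints 0.5
        System.out.println(sloppyFreq(3)); // prints 0.25
    }
}
```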

scorePayload

public float scorePayload(int doc,
                          int start,
                          int end,
                          BytesRef payload)
The default implementation returns 1.

Specified by:
scorePayload in class TFIDFSimilarity
Parameters:
doc - The docId currently being scored.
start - The start position of the payload
end - The end position of the payload
payload - The payload byte array to be scored
Returns:
An implementation dependent float to be used as a scoring factor

idf

public float idf(long docFreq,
                 long numDocs)
Implemented as log(numDocs/(docFreq+1)) + 1.

Specified by:
idf in class TFIDFSimilarity
Parameters:
docFreq - the number of documents which contain the term
numDocs - the total number of documents in the collection
Returns:
a score factor based on the term's document frequency
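
The sketch below illustrates the formula. It assumes that log is the natural logarithm (Java's Math.log), which this page does not state, and it casts to double so that the ratio of the two long values is not truncated. The class name IdfSketch is hypothetical.

```java
// Hypothetical stand-alone sketch; not the Lucene source.
// ASSUMPTION: "log" is the natural logarithm, as computed by Math.log.
final class IdfSketch {
    /** Returns log(numDocs / (docFreq + 1)) + 1. */
    static float idf(long docFreq, long numDocs) {
        // The double cast keeps the division from truncating to a long.
        return (float) (Math.log(numDocs / (double) (docFreq + 1)) + 1.0);
    }

    public static void main(String[] args) {
        // A term found in 9 of 10 documents: log(10/10) + 1.
        System.out.println(idf(9, 10)); // prints 1.0
        // Rarer terms receive a larger factor.
        System.out.println(idf(1, 1000) > idf(100, 1000)); // prints true
    }
}
```

The + 1 inside the logarithm keeps the ratio defined when docFreq is 0.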

setDiscountOverlaps

public void setDiscountOverlaps(boolean v)
Determines whether overlap tokens (tokens with a position increment of zero) are ignored when computing the norm. The default is true, meaning overlap tokens do not count toward the field length used for the norm.

See Also:
TFIDFSimilarity.computeNorm(org.apache.lucene.index.FieldInvertState)
WARNING: This API is experimental and might change in incompatible ways in the next release.

getDiscountOverlaps

public boolean getDiscountOverlaps()
Returns true if overlap tokens are discounted from the document's length.

See Also:
setDiscountOverlaps(boolean)

toString

public String toString()
Overrides:
toString in class Object


Copyright © 2000-2013 Apache Software Foundation. All Rights Reserved.