

java.lang.Object
  org.apache.lucene.search.similarities.Similarity
    org.apache.lucene.search.similarities.TFIDFSimilarity
      org.apache.lucene.search.similarities.DefaultSimilarity
public class DefaultSimilarity extends TFIDFSimilarity
Expert: Default scoring implementation which encodes norm values as a single byte before they are stored. At search time, the norm byte value is read from the index directory and decoded back to a float norm value.
This encoding/decoding, while reducing index size, comes at the price of precision loss: it is not guaranteed that decode(encode(x)) = x. For instance, decode(encode(0.89)) = 0.875.
Compression of norm values to a single byte saves memory at search time, because once a field is referenced at search time, its norms (for all documents) are maintained in memory.
The rationale supporting such lossy compression of norm values is that, given the difficulty (and inaccuracy) users have in expressing their true information need as a query, only big differences matter.
Finally, note that search time is too late to modify this norm part of scoring, e.g. by using a different Similarity for search: the norms were already computed and encoded at index time.
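To make the precision loss concrete, here is a minimal sketch of the byte round trip. It assumes the norm byte uses the SmallFloat-style "315" layout described under encodeNormValue (three-bit mantissa, zero-exponent point 15); the class and method names are hypothetical, not Lucene API. By our reading of that bit layout, 0.89 encodes to byte 123, which decodes to 0.875, while 1.0 survives the round trip exactly (byte 124).

```java
// Hypothetical sketch of single-byte norm encoding, assuming a
// SmallFloat-style layout: 3-bit mantissa, zero-exponent point at 15.
class NormRoundTripSketch {

  // Offset between the float exponent and the byte exponent, pre-shifted:
  // (63 - zeroExp) << numMantissaBits.
  private static final int FZERO = (63 - 15) << 3;

  static byte encode(float f) {
    int bits = Float.floatToRawIntBits(f);
    // Drop the low mantissa bits; keep sign, exponent and top mantissa bits.
    int smallfloat = bits >> (24 - 3);
    if (smallfloat <= FZERO) {
      // Zero and negatives map to 0; tiny positives map to the smallest value.
      return (bits <= 0) ? (byte) 0 : (byte) 1;
    }
    if (smallfloat >= FZERO + 0x100) {
      return (byte) -1; // overflow: 0xFF, the largest representable value
    }
    return (byte) (smallfloat - FZERO);
  }

  static float decode(byte b) {
    if (b == 0) {
      return 0.0f;
    }
    // Re-expand the byte into float bits and restore the exponent offset.
    int bits = (b & 0xff) << (24 - 3);
    bits += (63 - 15) << 24;
    return Float.intBitsToFloat(bits);
  }

  public static void main(String[] args) {
    float x = 0.89f;
    byte encoded = encode(x);
    System.out.println(x + " -> byte " + (encoded & 0xff) + " -> " + decode(encoded));
  }
}
```

Because the low mantissa bits are simply truncated, every float in [0.875, 1.0) collapses onto the same byte, which is the "only big differences matter" trade-off.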
Nested Class Summary 

Nested classes/interfaces inherited from class org.apache.lucene.search.similarities.Similarity 

Similarity.SimScorer, Similarity.SimWeight 
Field Summary

protected boolean discountOverlaps
    True if overlap tokens (tokens with a position increment of zero) are discounted from the document's length.
Constructor Summary

DefaultSimilarity()
    Sole constructor: parameter-free.
Method Summary

float coord(int overlap, int maxOverlap)
    Implemented as overlap / maxOverlap.
float decodeNormValue(long norm)
    Decodes the norm value, assuming it is a single byte.
long encodeNormValue(float f)
    Encodes a normalization factor for storage in an index.
boolean getDiscountOverlaps()
    Returns true if overlap tokens are discounted from the document's length.
float idf(long docFreq, long numDocs)
    Implemented as log(numDocs/(docFreq+1)) + 1.
float lengthNorm(FieldInvertState state)
    Implemented as state.getBoost()*lengthNorm(numTerms), where numTerms is FieldInvertState.getLength() if setDiscountOverlaps(boolean) is false, else it's FieldInvertState.getLength() - FieldInvertState.getNumOverlap().
float queryNorm(float sumOfSquaredWeights)
    Implemented as 1/sqrt(sumOfSquaredWeights).
float scorePayload(int doc, int start, int end, BytesRef payload)
    The default implementation returns 1.
void setDiscountOverlaps(boolean v)
    Determines whether overlap tokens (tokens with 0 position increment) are ignored when computing norm.
float sloppyFreq(int distance)
    Implemented as 1 / (distance + 1).
float tf(float freq)
    Implemented as sqrt(freq).
String toString()

Methods inherited from class org.apache.lucene.search.similarities.TFIDFSimilarity 

computeNorm, computeWeight, idfExplain, idfExplain, simScorer 
Methods inherited from class java.lang.Object 

clone, equals, finalize, getClass, hashCode, notify, notifyAll, wait, wait, wait 
Field Detail 

protected boolean discountOverlaps
Constructor Detail 

public DefaultSimilarity()
Method Detail 

public float coord(int overlap, int maxOverlap)
Implemented as overlap / maxOverlap.
Specified by: coord in class TFIDFSimilarity
Parameters:
overlap - the number of query terms matched in the document
maxOverlap - the total number of terms in the query
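A minimal sketch of the coord factor, assuming only the formula stated above; the class name is hypothetical. The one detail worth showing is that the division must happen in floating point, since integer division would turn every partial match into 0.

```java
// Hypothetical sketch of coord(overlap, maxOverlap) = overlap / maxOverlap.
class CoordSketch {

  static float coord(int overlap, int maxOverlap) {
    // Cast before dividing: 2 / 4 in int arithmetic would be 0, not 0.5.
    return overlap / (float) maxOverlap;
  }

  public static void main(String[] args) {
    System.out.println(coord(2, 4)); // half of the query terms matched
    System.out.println(coord(3, 3)); // every query term matched
  }
}
```

A document matching 2 of 4 query terms therefore has its score scaled by 0.5, while one matching all terms is left unscaled.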
public float queryNorm(float sumOfSquaredWeights)
Implemented as 1/sqrt(sumOfSquaredWeights).
Specified by: queryNorm in class TFIDFSimilarity
Parameters:
sumOfSquaredWeights - the sum of the squares of query term weights
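A minimal sketch of queryNorm, assuming only the stated formula; the class name and the two sample term weights (3 and 4) are hypothetical. Multiplying each weight by the returned factor yields a vector of unit length.

```java
// Hypothetical sketch of queryNorm(s) = 1 / sqrt(s).
class QueryNormSketch {

  static float queryNorm(float sumOfSquaredWeights) {
    return (float) (1.0 / Math.sqrt(sumOfSquaredWeights));
  }

  public static void main(String[] args) {
    // Two hypothetical query term weights, 3 and 4: sum of squares is 25.
    float w1 = 3f;
    float w2 = 4f;
    float norm = queryNorm(w1 * w1 + w2 * w2);
    // Normalized weights 0.6 and 0.8 have squared sum 1.
    System.out.println(norm + " " + (w1 * norm) + " " + (w2 * norm));
  }
}
```

Since the same factor multiplies every document's score for a given query, it does not change ranking; it only makes scores from different queries more comparable.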
public final long encodeNormValue(float f)
Encodes a normalization factor for storage in an index.
The encoding uses a three-bit mantissa, a five-bit exponent, and the zero-exponent point at 15, thus representing values from around 7x10^9 to 2x10^-9 with about one significant decimal digit of accuracy. Zero is also represented. Negative numbers are rounded up to zero. Values too large to represent are rounded down to the largest representable value. Positive values too small to represent are rounded up to the smallest positive representable value.
Specified by: encodeNormValue in class TFIDFSimilarity
See Also: Field.setBoost(float), SmallFloat
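The boundary rules above can be checked with a small sketch. It assumes the same SmallFloat-style bit layout (three-bit mantissa, zero-exponent point 15) and, for readability, works with the unsigned byte value 0..255 as an int; the class and method names are hypothetical.

```java
// Hypothetical sketch of the encodeNormValue boundary behaviour, assuming a
// SmallFloat-style layout; encoded values are returned as unsigned ints.
class NormEncodeEdgeCases {

  private static final int FZERO = (63 - 15) << 3;

  static int encodeUnsigned(float f) {
    int bits = Float.floatToRawIntBits(f);
    int smallfloat = bits >> (24 - 3);
    if (smallfloat <= FZERO) {
      // Negatives (sign bit set) and zero -> 0; tiny positives -> 1.
      return (bits <= 0) ? 0 : 1;
    }
    if (smallfloat >= FZERO + 0x100) {
      return 0xFF; // too large: clamp to the largest representable value
    }
    return smallfloat - FZERO;
  }

  static float decodeUnsigned(int b) {
    if (b == 0) {
      return 0.0f;
    }
    return Float.intBitsToFloat((b << (24 - 3)) + ((63 - 15) << 24));
  }

  public static void main(String[] args) {
    float[] samples = {-3.0f, 0.0f, 1e-20f, 1.0f, 1e20f};
    for (float f : samples) {
      int b = encodeUnsigned(f);
      System.out.println(f + " -> " + b + " -> " + decodeUnsigned(b));
    }
  }
}
```

Under this layout the largest value, byte 255, decodes to 1.75 x 2^32 (about 7.5x10^9), in line with the "around 7x10^9" upper bound quoted above.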
public final float decodeNormValue(long norm)
Decodes the norm value, assuming it is a single byte.
Specified by: decodeNormValue in class TFIDFSimilarity
See Also: encodeNormValue(float)
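Because the stored norm is a single byte, decoding can be a lookup into a 256-entry table built once. This is a minimal sketch under two assumptions: the byte arrives in the low 8 bits of the long, and the same SmallFloat-style layout as the encoder applies. `NORM_TABLE` and the class name are hypothetical.

```java
// Hypothetical sketch: decode a single-byte norm via a precomputed table.
class NormDecodeTableSketch {

  private static final float[] NORM_TABLE = new float[256];

  static {
    for (int i = 0; i < 256; i++) {
      NORM_TABLE[i] = byteToFloat(i);
    }
  }

  // Assumed layout: 3-bit mantissa, zero-exponent point at 15.
  private static float byteToFloat(int b) {
    if (b == 0) {
      return 0.0f;
    }
    return Float.intBitsToFloat((b << (24 - 3)) + ((63 - 15) << 24));
  }

  static float decodeNormValue(long norm) {
    // Mask to the low byte so a sign-extended byte such as -1 maps to 255.
    return NORM_TABLE[(int) (norm & 0xFF)];
  }

  public static void main(String[] args) {
    System.out.println(decodeNormValue(124L));
    System.out.println(decodeNormValue(122L));
  }
}
```

The table trades 1 KB of memory for a branch-free array read per document, which matters because decoding happens for every scored hit.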
public float lengthNorm(FieldInvertState state)
Implemented as state.getBoost()*lengthNorm(numTerms), where numTerms is FieldInvertState.getLength() if setDiscountOverlaps(boolean) is false, else it's FieldInvertState.getLength() - FieldInvertState.getNumOverlap().
Specified by: lengthNorm in class TFIDFSimilarity
Parameters:
state - statistics of the current field (such as length, boost, etc)
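A minimal sketch of the length-norm rule above. Two assumptions are not stated on this page: the inner lengthNorm(numTerms) is taken to be 1/sqrt(numTerms), and `FieldStats` is a hypothetical stand-in for the three FieldInvertState getters involved (length, overlap count, boost).

```java
// Hypothetical sketch of lengthNorm; assumes lengthNorm(n) = 1 / sqrt(n).
class LengthNormSketch {

  // Stand-in for FieldInvertState.getLength(), getNumOverlap(), getBoost().
  static final class FieldStats {
    final int length;
    final int numOverlap;
    final float boost;

    FieldStats(int length, int numOverlap, float boost) {
      this.length = length;
      this.numOverlap = numOverlap;
      this.boost = boost;
    }
  }

  static float lengthNorm(FieldStats state, boolean discountOverlaps) {
    // Overlap tokens (position increment 0) are optionally not counted.
    int numTerms = discountOverlaps
        ? state.length - state.numOverlap
        : state.length;
    return state.boost * (float) (1.0 / Math.sqrt(numTerms));
  }

  public static void main(String[] args) {
    // 5 tokens, one of them a synonym injected at the same position.
    FieldStats s = new FieldStats(5, 1, 1.0f);
    System.out.println(lengthNorm(s, true));  // counts 4 terms
    System.out.println(lengthNorm(s, false)); // counts 5 terms
  }
}
```

Discounting overlaps keeps synonym injection from making a field look longer, so a field does not lose score just because an analyzer stacked extra tokens on existing positions.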
public float tf(float freq)
Implemented as sqrt(freq).
Specified by: tf in class TFIDFSimilarity
Parameters:
freq - the frequency of a term within a document
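A minimal sketch of the tf factor, assuming only the stated sqrt(freq) formula; the class name is hypothetical. It shows the diminishing returns: sixteen occurrences score four times, not sixteen times, as high as one.

```java
// Hypothetical sketch of tf(freq) = sqrt(freq).
class TfSketch {

  static float tf(float freq) {
    return (float) Math.sqrt(freq);
  }

  public static void main(String[] args) {
    float[] freqs = {1f, 4f, 16f};
    for (float f : freqs) {
      System.out.println("freq " + f + " -> tf " + tf(f));
    }
  }
}
```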
public float sloppyFreq(int distance)
Implemented as 1 / (distance + 1).
Specified by: sloppyFreq in class TFIDFSimilarity
Parameters:
distance - the edit distance of this sloppy phrase match
See Also: PhraseQuery.setSlop(int)
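A minimal sketch of sloppyFreq, assuming only the stated formula; the class name is hypothetical. An exact phrase match (distance 0) contributes a full 1.0, and each extra unit of edit distance shrinks the contribution.

```java
// Hypothetical sketch of sloppyFreq(distance) = 1 / (distance + 1).
class SloppyFreqSketch {

  static float sloppyFreq(int distance) {
    // Float division; the + 1 keeps distance 0 from dividing by zero.
    return 1.0f / (distance + 1);
  }

  public static void main(String[] args) {
    for (int d = 0; d <= 3; d++) {
      System.out.println("distance " + d + " -> " + sloppyFreq(d));
    }
  }
}
```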
public float scorePayload(int doc, int start, int end, BytesRef payload)
The default implementation returns 1.
Specified by: scorePayload in class TFIDFSimilarity
Parameters:
doc - The docId currently being scored.
start - The start position of the payload
end - The end position of the payload
payload - The payload byte array to be scored
public float idf(long docFreq, long numDocs)
Implemented as log(numDocs/(docFreq+1)) + 1.
Specified by: idf in class TFIDFSimilarity
Parameters:
docFreq - the number of documents which contain the term
numDocs - the total number of documents in the collection
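A minimal sketch of the idf formula. This page writes "log" without a base; the sketch assumes the natural logarithm (Java's `Math.log`), and the class name and sample counts are hypothetical.

```java
// Hypothetical sketch of idf = log(numDocs / (docFreq + 1)) + 1, natural log.
class IdfSketch {

  static float idf(long docFreq, long numDocs) {
    // docFreq + 1 avoids division by zero for terms absent from the index.
    return (float) (Math.log(numDocs / (double) (docFreq + 1)) + 1.0);
  }

  public static void main(String[] args) {
    long numDocs = 1000;
    System.out.println(idf(9, numDocs));   // rare term: log(100) + 1
    System.out.println(idf(999, numDocs)); // near-ubiquitous term: log(1) + 1
  }
}
```

The trailing + 1 keeps the factor positive for common terms, so a term that appears almost everywhere still contributes weight 1 rather than 0.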
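public void setDiscountOverlaps(boolean v)
Determines whether overlap tokens (tokens with 0 position increment) are ignored when computing norm.
See Also: TFIDFSimilarity.computeNorm(org.apache.lucene.index.FieldInvertState)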
public boolean getDiscountOverlaps()
Returns true if overlap tokens are discounted from the document's length.
See Also: setDiscountOverlaps(boolean)
public String toString()
Overrides: toString in class Object

