DefaultSimilarity (Lucene 5.0.0 API)

- java.lang.Object
  - org.apache.lucene.search.similarities.Similarity
    - org.apache.lucene.search.similarities.TFIDFSimilarity
      - org.apache.lucene.search.similarities.DefaultSimilarity

```
public class DefaultSimilarity
extends TFIDFSimilarity
```
Expert: Default scoring implementation which encodes norm values as a single byte before they are stored. At search time, the norm byte value is read from the index directory and decoded back to a float norm value. This encoding/decoding reduces index size but comes at the price of precision loss: it is not guaranteed that decode(encode(x)) = x. For instance, decode(encode(0.89)) = 0.875.
Compressing norm values to a single byte saves memory at search time, because once a field is referenced at search time, its norms (for all documents) are kept in memory.
The rationale for such lossy compression is that users find it difficult to express their true information need precisely in a query, so only large differences in norm values matter.

Finally, note that search time is too late to modify this norm part of scoring, for example by using a different Similarity for search: norms are computed and encoded when documents are indexed.

Nested Class Summary
- Nested classes/interfaces inherited from class org.apache.lucene.search.similarities.Similarity
  Similarity.SimScorer, Similarity.SimWeight

Field Summary

Fields
Modifier and Type	Field and Description
`protected boolean`	`discountOverlaps` True if overlap tokens (tokens with a position increment of zero) are discounted from the document's length.

Constructor Summary

Constructors
Constructor and Description

DefaultSimilarity()
Sole constructor: parameter-free

Method Summary

Methods
Modifier and Type	Method and Description
`float`	`coord(int overlap, int maxOverlap)` Implemented as `overlap / maxOverlap`.
`float`	`decodeNormValue(long norm)` Decodes the norm value, assuming it is a single byte.
`long`	`encodeNormValue(float f)` Encodes a normalization factor for storage in an index.
`boolean`	`getDiscountOverlaps()` Returns true if overlap tokens are discounted from the document's length.
`float`	`idf(long docFreq, long numDocs)` Implemented as `log(numDocs/(docFreq+1)) + 1`.
`float`	`lengthNorm(FieldInvertState state)` Implemented as `state.getBoost()*lengthNorm(numTerms)`, where `numTerms` is `FieldInvertState.getLength()` if `setDiscountOverlaps(boolean)` is false, else it's `FieldInvertState.getLength()` - `FieldInvertState.getNumOverlap()`.
`float`	`queryNorm(float sumOfSquaredWeights)` Implemented as `1/sqrt(sumOfSquaredWeights)`.
`float`	`scorePayload(int doc, int start, int end, BytesRef payload)` The default implementation returns `1`.
`void`	`setDiscountOverlaps(boolean v)` Determines whether overlap tokens (Tokens with 0 position increment) are ignored when computing norm.
`float`	`sloppyFreq(int distance)` Implemented as `1 / (distance + 1)`.
`float`	`tf(float freq)` Implemented as `sqrt(freq)`.
`String`	`toString()`

Methods inherited from class org.apache.lucene.search.similarities.TFIDFSimilarity
computeNorm, computeWeight, idfExplain, idfExplain, simScorer

Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, wait, wait, wait

- Field Detail
  - discountOverlaps
```
protected boolean discountOverlaps
```
    True if overlap tokens (tokens with a position increment of zero) are discounted from the document's length.
- Constructor Detail
  - DefaultSimilarity
```
public DefaultSimilarity()
```
    Sole constructor: parameter-free
- Method Detail
  - coord
```
public float coord(int overlap,
          int maxOverlap)
```
    Implemented as overlap / maxOverlap.
    
    Specified by:
    
    coord in class TFIDFSimilarity
    
    Parameters:
    overlap - the number of query terms matched in the document
    maxOverlap - the total number of terms in the query
    
    Returns:
    a score factor based on term overlap with the query
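As a quick illustration of the `overlap / maxOverlap` formula, the standalone sketch below recomputes it in plain Java. It is not Lucene's source: the class name `CoordSketch` is hypothetical, and dividing in floating point (rather than with integers, which would truncate to 0) is an assumption about how the formula is meant to be evaluated.

```java
/** Hypothetical standalone sketch of DefaultSimilarity's coord factor (not Lucene source). */
public final class CoordSketch {

    /** overlap / maxOverlap, computed in floating point to avoid integer division. */
    static float coord(int overlap, int maxOverlap) {
        return overlap / (float) maxOverlap;
    }

    public static void main(String[] args) {
        // A document matching 3 of the 4 query terms is scaled down proportionally.
        System.out.println(coord(3, 4));
        // Matching every query term leaves the score unchanged.
        System.out.println(coord(4, 4));
    }
}
```

Documents that match more of the query's terms therefore keep a larger share of their summed term scores.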
  - queryNorm
```
public float queryNorm(float sumOfSquaredWeights)
```
    Implemented as 1/sqrt(sumOfSquaredWeights).
    
    Specified by:
    
    queryNorm in class TFIDFSimilarity
    
    Parameters:
    sumOfSquaredWeights - the sum of the squares of query term weights
    
    Returns:
    a normalization factor for query weights
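The sketch below shows what `1/sqrt(sumOfSquaredWeights)` achieves: multiplying each query weight by it gives the weight vector unit length, up to floating-point rounding. The weights are invented for illustration, and `QueryNormSketch` is a hypothetical name, not part of Lucene.

```java
/** Hypothetical standalone sketch of DefaultSimilarity's queryNorm (not Lucene source). */
public final class QueryNormSketch {

    /** 1 / sqrt(sumOfSquaredWeights), computed in double and narrowed to float. */
    static float queryNorm(float sumOfSquaredWeights) {
        return (float) (1.0 / Math.sqrt(sumOfSquaredWeights));
    }

    public static void main(String[] args) {
        // Invented raw weights for a three-term query.
        float[] weights = {1.0f, 2.0f, 2.0f};
        float sumOfSquares = 0f;
        for (float w : weights) {
            sumOfSquares += w * w;                // 1 + 4 + 4 = 9
        }
        float norm = queryNorm(sumOfSquares);     // 1 / sqrt(9)
        // Length of the scaled weight vector, expected to be close to 1.
        double scaledLength = 0;
        for (float w : weights) {
            scaledLength += (w * norm) * (w * norm);
        }
        System.out.println(Math.sqrt(scaledLength));
    }
}
```

Because every query term weight is scaled by the same factor, the normalization makes scores from different queries more comparable without changing the ranking within one query.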
  - encodeNormValue
```
public final long encodeNormValue(float f)
```
    Encodes a normalization factor for storage in an index.
    The encoding uses a three-bit mantissa, a five-bit exponent, and the zero-exponent point at 15, thus representing values from about 5.8x10^-10 up to about 7.5x10^9 with about one significant decimal digit of accuracy. Zero is also represented. Negative numbers are rounded up to zero. Values too large to represent are rounded down to the largest representable value. Positive values too small to represent are rounded up to the smallest positive representable value.
    
    Specified by:
    
    encodeNormValue in class TFIDFSimilarity
    
    See Also:
    Field.setBoost(float), SmallFloat
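To make the precision loss concrete, the sketch below re-implements a one-byte float encoding with the layout described above: the byte keeps the float's exponent, rebased around the zero-exponent point 15, plus its top two explicit mantissa bits, which together with the implicit leading bit give three significant bits. It is written to mirror `SmallFloat.floatToByte315` and `SmallFloat.byte315ToFloat`, but it is an illustrative reconstruction, not Lucene's code; in real code call `org.apache.lucene.util.SmallFloat` directly. `NormCodecSketch`, `encode` and `decode` are hypothetical names.

```java
/**
 * Hypothetical standalone sketch of a one-byte "3-bit mantissa, zero exponent 15"
 * norm encoding (not Lucene source; use org.apache.lucene.util.SmallFloat in real code).
 */
public final class NormCodecSketch {

    private static final int MANTISSA_BITS = 3;  // includes the implicit leading 1 bit
    private static final int ZERO_EXP = 15;      // zero-exponent point
    private static final int FZERO = (63 - ZERO_EXP) << MANTISSA_BITS;

    /** Lossy float -> byte: truncates low-order mantissa bits and rebases the exponent. */
    static byte encode(float f) {
        int bits = Float.floatToRawIntBits(f);
        int small = bits >> (24 - MANTISSA_BITS);   // keep sign, exponent, top 2 mantissa bits
        if (small <= FZERO) {
            // Zero and negatives map to 0; tiny positives round up to the smallest non-zero byte.
            return (bits <= 0) ? (byte) 0 : (byte) 1;
        }
        if (small >= FZERO + 0x100) {
            return (byte) -1;                        // too large: clamp to 0xFF, the largest value
        }
        return (byte) (small - FZERO);
    }

    /** byte -> float: undoes the exponent rebasing; the dropped mantissa bits stay zero. */
    static float decode(byte b) {
        if (b == 0) {
            return 0.0f;
        }
        int bits = (b & 0xff) << (24 - MANTISSA_BITS);
        bits += (63 - ZERO_EXP) << 24;
        return Float.intBitsToFloat(bits);
    }

    public static void main(String[] args) {
        float[] samples = {1.0f, 0.89f, 0.5f, 0.3f, -2.0f, 0.0f};
        for (float x : samples) {
            byte b = encode(x);
            System.out.printf("x=%.4f  byte=%3d  decode(encode(x))=%.4f%n", x, b & 0xff, decode(b));
        }
    }
}
```

Running it shows that values between two representable points collapse onto the lower one, which is why only large differences between norms survive indexing.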
  - decodeNormValue
```
public final float decodeNormValue(long norm)
```
    Decodes the norm value, assuming it is a single byte.
    
    Specified by:
    
    decodeNormValue in class TFIDFSimilarity
    
    See Also:
    encodeNormValue(float)
  - lengthNorm
```
public float lengthNorm(FieldInvertState state)
```
    Implemented as state.getBoost()*lengthNorm(numTerms), where numTerms is FieldInvertState.getLength() if setDiscountOverlaps(boolean) is false, else it's FieldInvertState.getLength() - FieldInvertState.getNumOverlap().
    
    Specified by:
    
    lengthNorm in class TFIDFSimilarity
    
    Parameters:
    state - statistics of the current field (such as length, boost, etc.)
    
    Returns:
    an index-time normalization value
    WARNING: This API is experimental and might change in incompatible ways in the next release.
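The sketch below mirrors the `lengthNorm` description using plain values instead of a `FieldInvertState`. It assumes the per-length factor `lengthNorm(numTerms)` is `1/sqrt(numTerms)`, the classic Lucene length normalization; treat that as an assumption, since the description above only names `lengthNorm(numTerms)`. The class `LengthNormSketch` and its parameters are hypothetical stand-ins for `state.getBoost()`, `state.getLength()`, `state.getNumOverlap()` and the `discountOverlaps` setting.

```java
/** Hypothetical standalone sketch of DefaultSimilarity.lengthNorm (not Lucene source). */
public final class LengthNormSketch {

    /**
     * boost * 1/sqrt(numTerms), where numTerms optionally excludes overlap tokens
     * (tokens with a position increment of zero, such as injected synonyms).
     */
    static float lengthNorm(float boost, int length, int numOverlap, boolean discountOverlaps) {
        int numTerms = discountOverlaps ? length - numOverlap : length;
        return boost * (float) (1.0 / Math.sqrt(numTerms));
    }

    public static void main(String[] args) {
        // Hypothetical field: 6 tokens, 2 of them stacked synonyms (position increment 0).
        System.out.println(lengthNorm(1.0f, 6, 2, true));   // counts 4 terms
        System.out.println(lengthNorm(1.0f, 6, 2, false));  // counts all 6 terms
        // A field boost of 2 doubles the norm before it is encoded into one byte.
        System.out.println(lengthNorm(2.0f, 6, 2, true));
    }
}
```

With overlaps discounted, adding synonyms at the same position does not make a field look longer, so it is not penalized by a smaller norm.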
  - tf
```
public float tf(float freq)
```
    Implemented as sqrt(freq).
    
    Specified by:
    
    tf in class TFIDFSimilarity
    
    Parameters:
    freq - the frequency of a term within a document
    
    Returns:
    a score factor based on a term's within-document frequency
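A short sketch of the `sqrt(freq)` factor shows its diminishing returns: each fourfold increase in term frequency only doubles the factor. `TfSketch` is a hypothetical name; this is not Lucene's source.

```java
/** Hypothetical standalone sketch of DefaultSimilarity.tf (not Lucene source). */
public final class TfSketch {

    /** sqrt(freq): grows with term frequency, but with diminishing returns. */
    static float tf(float freq) {
        return (float) Math.sqrt(freq);
    }

    public static void main(String[] args) {
        // Quadrupling the frequency only doubles the factor.
        for (float freq : new float[] {1f, 4f, 16f, 64f}) {
            System.out.println("freq=" + freq + " tf=" + tf(freq));
        }
    }
}
```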
  - sloppyFreq
```
public float sloppyFreq(int distance)
```
    Implemented as 1 / (distance + 1).
    
    Specified by:
    
    sloppyFreq in class TFIDFSimilarity
    
    Parameters:
    distance - the edit distance of this sloppy phrase match
    
    Returns:
    the frequency increment for this match
    See Also:
    PhraseQuery.setSlop(int)
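The sketch below tabulates `1 / (distance + 1)`, the frequency increment each sloppy phrase match contributes: an exact match (distance 0) counts fully, and matches that need more moves count for less. `SloppyFreqSketch` is a hypothetical name; this is not Lucene's source.

```java
/** Hypothetical standalone sketch of DefaultSimilarity.sloppyFreq (not Lucene source). */
public final class SloppyFreqSketch {

    /** 1 / (distance + 1): an exact phrase match (distance 0) contributes a full 1.0. */
    static float sloppyFreq(int distance) {
        return 1.0f / (distance + 1);
    }

    public static void main(String[] args) {
        // The farther a match is from the exact phrase layout, the smaller its increment.
        for (int distance = 0; distance <= 3; distance++) {
            System.out.println("distance=" + distance + " increment=" + sloppyFreq(distance));
        }
    }
}
```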
  - scorePayload
```
public float scorePayload(int doc,
                 int start,
                 int end,
                 BytesRef payload)
```
    The default implementation returns 1.
    
    Specified by:
    
    scorePayload in class TFIDFSimilarity
    
    Parameters:
    doc - The docId currently being scored.
    start - The start position of the payload
    end - The end position of the payload
    payload - The payload byte array to be scored
    
    Returns:
    An implementation dependent float to be used as a scoring factor
  - idf
```
public float idf(long docFreq,
        long numDocs)
```
    Implemented as log(numDocs/(docFreq+1)) + 1.
    
    Specified by:
    
    idf in class TFIDFSimilarity
    
    Parameters:
    docFreq - the number of documents which contain the term
    numDocs - the total number of documents in the collection
    
    Returns:
    a score factor based on the term's document frequency
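The sketch below evaluates `log(numDocs/(docFreq+1)) + 1` for a few document frequencies. It assumes the natural logarithm (`Math.log`) and floating-point division; with integer division the ratio would be truncated. The collection size and the class name `IdfSketch` are hypothetical; this is not Lucene's source.

```java
/** Hypothetical standalone sketch of DefaultSimilarity.idf (not Lucene source). */
public final class IdfSketch {

    /** log(numDocs / (docFreq + 1)) + 1, using the natural logarithm. */
    static float idf(long docFreq, long numDocs) {
        return (float) (Math.log(numDocs / (double) (docFreq + 1)) + 1.0);
    }

    public static void main(String[] args) {
        long numDocs = 1000;  // hypothetical collection size
        // Rare terms get a larger factor than common ones.
        for (long docFreq : new long[] {0, 9, 99, 999}) {
            System.out.printf("docFreq=%d idf=%.4f%n", docFreq, idf(docFreq, numDocs));
        }
    }
}
```

The `+1` in the denominator avoids division by zero for unseen terms, and the outer `+ 1` keeps the factor positive for terms that occur in nearly every document.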
  - setDiscountOverlaps
```
public void setDiscountOverlaps(boolean v)
```
    Determines whether overlap tokens (Tokens with 0 position increment) are ignored when computing norm. By default this is true, meaning overlap tokens do not count when computing norms.
    
    See Also:
    TFIDFSimilarity.computeNorm(org.apache.lucene.index.FieldInvertState)
    WARNING: This API is experimental and might change in incompatible ways in the next release.
  - getDiscountOverlaps
```
public boolean getDiscountOverlaps()
```
    Returns true if overlap tokens are discounted from the document's length.
    
    See Also:
    setDiscountOverlaps(boolean)
  - toString
```
public String toString()
```
    Overrides:
    
    toString in class Object
