Package org.apache.lucene.search.function

Programmatic control over document scores.


Class Summary
ByteFieldSource Expert: obtains single byte field values from the FieldCache using getBytes() and makes those values available as other numeric types, casting as needed.
CustomScoreProvider An instance of a subclass of this class should be returned by CustomScoreQuery.getCustomScoreProvider(org.apache.lucene.index.IndexReader) if you want to modify the custom score calculation of a CustomScoreQuery.
CustomScoreQuery Query that sets document score as a programmatic function of several (sub) scores: the score of its subQuery (any query) and, optionally, the score of its ValueSourceQuery (or queries).
DocValues Expert: represents field values as different types.
FieldCacheSource Expert: A base class for ValueSource implementations that retrieve values for a single field from the FieldCache.
FieldScoreQuery A query that scores each document as the value of the numeric input field.
FieldScoreQuery.Type Type of score field, indicating how field values are interpreted/parsed.
FloatFieldSource Expert: obtains float field values from the FieldCache using getFloats() and makes those values available as other numeric types, casting as needed.
IntFieldSource Expert: obtains int field values from the FieldCache using getInts() and makes those values available as other numeric types, casting as needed.
MultiValueSource Deprecated. This class is temporary, to ease the migration to segment-based searching.
OrdFieldSource Expert: obtains the ordinal of the field value from the default Lucene FieldCache using getStringIndex().
ReverseOrdFieldSource Expert: obtains the ordinal of the field value from the default Lucene FieldCache using getStringIndex() and reverses the order.
ShortFieldSource Expert: obtains short field values from the FieldCache using getShorts() and makes those values available as other numeric types, casting as needed.
ValueSource Expert: source of values for basic function queries.
ValueSourceQuery Expert: A Query that sets the scores of documents to the values obtained from a ValueSource.

Package org.apache.lucene.search.function Description

Programmatic control over document scores.
The function package provides tight control over document scores.
WARNING: The status of the search.function package is experimental. The APIs introduced here might change in the future, in which case the old APIs will no longer be supported.
Two types of queries are available in this package:
  1. Custom score queries - which set the score of a matching document as a mathematical expression over the scores that document receives from contained (sub) queries.
  2. Field score queries - which base the score of a document on numeric values of indexed fields.
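To make the difference between the two query types concrete, here is a minimal plain-Java model with no Lucene dependency. It assumes the default behaviour of CustomScoreQuery when customScore() is not overridden, namely multiplying the sub query score by the value source score(s). The class FunctionScoreModel, its methods, and the map of per-document field values are hypothetical stand-ins, not Lucene APIs; treating a document without a value as 0 is also an assumption of this sketch.

```java
import java.util.HashMap;
import java.util.Map;

/**
 * Plain-Java model (not Lucene code) of the two query types in this package.
 * Assumption: the custom combination multiplies the sub query score by every
 * value source score, as CustomScoreQuery does unless customScore() is overridden.
 */
public class FunctionScoreModel {

    // Field score query model: a document's score is the numeric value indexed
    // for it in the scoring field (hypothetical map: doc id -> value).
    // A document without a value scores 0 in this sketch.
    static float fieldScore(Map<Integer, Float> fieldValues, int doc) {
        Float value = fieldValues.get(doc);
        return value == null ? 0f : value;
    }

    // Custom score query model: start from the sub query score and multiply
    // by each value source score. With no value sources the score is unchanged.
    static float customScore(float subQueryScore, float... valSrcScores) {
        float score = subQueryScore;
        for (float v : valSrcScores) {
            score *= v;
        }
        return score;
    }

    public static void main(String[] args) {
        Map<Integer, Float> boosts = new HashMap<>();
        boosts.put(0, 2f);
        boosts.put(1, 0.5f);

        float subQueryScore = 3f; // pretend text-query score of doc 0
        System.out.println(customScore(subQueryScore, fieldScore(boosts, 0))); // 6.0
        System.out.println(customScore(subQueryScore));                        // 3.0
    }
}
```

The product is only the default; overriding customScore(), as in the code samples below, replaces it with any function of the two scores.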
Some possible uses of these queries:
  1. Normalizing the document scores by values indexed in a special field, for instance, experimenting with a different document length normalization.
  2. Adding a static scoring element to the score of a document, for instance some topological attribute of the links to/from a document.
  3. Computing the score of a matching document as an arbitrary (possibly unusual) function of its score by a certain query.
Performance and Quality Considerations:
  1. When scoring by values of indexed fields, these values are loaded into memory. Unlike regular scoring, where the required information is read from disk as necessary, here field values are loaded once and cached in memory by Lucene, anticipating reuse by later queries. While all this is carefully cached with performance in mind, it is recommended to use these features only when the default Lucene scoring does not match your "special" application needs.
  2. Use only with carefully selected fields, because in most cases search quality with regular Lucene scoring would outperform that of scoring by field values.
  3. Values of fields used for scoring should be uniform. Do not apply this to a field containing arbitrary (long) text, and do not mix different kinds of values in the same field if that field is used for scoring.
  4. Smaller (shorter) field tokens mean less RAM (something always desirable). When using FieldScoreQuery, select the shortest FieldScoreQuery.Type that is sufficient for the field values used.
  5. Reusing IndexReaders/IndexSearchers is essential, because the cache of field values is keyed on an IndexReader. Whenever a new IndexReader is used, values currently in the cache cannot be used and new values must be loaded from disk. So replace or refresh readers/searchers in a controlled manner.
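Consideration 4 can be made concrete with a little arithmetic. Assuming the cache holds one array entry per document (maxDoc entries), taking 1 byte for BYTE, 2 for SHORT and 4 for INT or FLOAT, the sketch below picks the narrowest integer type that covers a field's value range and estimates the resulting RAM, ignoring array overhead. ScoreTypeChooser and its local Type enum are hypothetical helpers; only the type names mirror FieldScoreQuery.Type.

```java
/**
 * Hypothetical helper (not part of Lucene): chooses the narrowest integer
 * type for a value range and estimates the RAM of a per-document array.
 */
public class ScoreTypeChooser {

    // Local enum whose names mirror FieldScoreQuery.Type; sizes are Java primitive sizes.
    enum Type {
        BYTE(1), SHORT(2), INT(4), FLOAT(4);

        final int bytesPerDoc;

        Type(int bytesPerDoc) {
            this.bytesPerDoc = bytesPerDoc;
        }
    }

    // Narrowest integer type whose range covers [min, max].
    static Type smallestIntType(long min, long max) {
        if (min >= Byte.MIN_VALUE && max <= Byte.MAX_VALUE) return Type.BYTE;
        if (min >= Short.MIN_VALUE && max <= Short.MAX_VALUE) return Type.SHORT;
        if (min >= Integer.MIN_VALUE && max <= Integer.MAX_VALUE) return Type.INT;
        throw new IllegalArgumentException("values do not fit in an int");
    }

    // One value per document, ignoring the fixed array header overhead.
    static long estimatedBytes(Type type, long maxDoc) {
        return type.bytesPerDoc * maxDoc;
    }

    public static void main(String[] args) {
        long maxDoc = 10_000_000L;
        Type chosen = smallestIntType(0, 100);
        System.out.println(chosen + ": " + estimatedBytes(chosen, maxDoc) + " bytes");     // BYTE: 10000000 bytes
        System.out.println(Type.INT + ": " + estimatedBytes(Type.INT, maxDoc) + " bytes"); // INT: 40000000 bytes
    }
}
```

For 10 million documents with values between 0 and 100, BYTE needs about 10 MB while INT needs about 40 MB, and that cost is paid again for every new IndexReader whose values get loaded.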
History and Credits:
Code sample:

Note: the code snippets here should work, but they were never actually compiled, so the test sources TestCustomScoreQuery, TestFieldScoreQuery and TestOrdValues may also be useful.

  1. Using field (byte) values as scores:

    Indexing:

          Document d1 = new Document();
          Field f = new Field("score", "7", Field.Store.NO, Field.Index.NOT_ANALYZED);
          f.setOmitNorms(true); // norms are not used when scoring by field value
          d1.add(f);
        

    Search:

          Query q = new FieldScoreQuery("score", FieldScoreQuery.Type.BYTE);
        
    Document d1 above would get a score of 7.
  2. Manipulating scores

    Dividing the original score of each document by the square root of its doc id (just to demonstrate what it takes to manipulate scores this way):

          Query q = queryParser.parse("my query text");
          CustomScoreQuery customQ = new CustomScoreQuery(q) {
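            public float customScore(int doc, float subQueryScore, float valSrcScore) {
              // use the doc parameter (there is no "docid" variable); Math.sqrt returns
              // double, so cast back to float. Note: doc 0 yields an infinite score.
              return (float) (subQueryScore / Math.sqrt(doc));
            }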
          };
        

    For more informative debug info on the custom query, also override the name() method:

          CustomScoreQuery customQ = new CustomScoreQuery(q) {
            public float customScore(int doc, float subQueryScore, float valSrcScore) {
              // doc, not docid; cast the double result back to float
              return (float) (subQueryScore / Math.sqrt(doc));
            }
            public String name() {
              return "1/sqrt(docid)";
            }
          };
        

    Taking the square root of the original score and multiplying it by a "short field driven score", i.e., the short value that was indexed for the scored doc in a certain field:

          Query q = queryParser.parse("my query text");
          FieldScoreQuery qf = new FieldScoreQuery("shortScore", FieldScoreQuery.Type.SHORT);
          CustomScoreQuery customQ = new CustomScoreQuery(q, qf) {
            public float customScore(int doc, float subQueryScore, float valSrcScore) {
              // Math.sqrt returns double, so cast it back to float
              return (float) Math.sqrt(subQueryScore) * valSrcScore;
            }
            public String name() {
              return "shortVal*sqrt(score)";
            }
          };
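The score formulas in these samples can be checked outside Lucene. The following plain-Java sketch reproduces them as static methods; SampleScoreFormulas and its method names are hypothetical, and the BYTE case assumes that Lucene's default byte parser uses Byte.parseByte, so values outside -128..127 would fail to parse.

```java
/**
 * Plain-Java sketch (no Lucene dependency) of the score formulas used in the
 * code samples, so the arithmetic can be checked in isolation.
 * Assumption: a BYTE field value is parsed with Byte.parseByte.
 */
public class SampleScoreFormulas {

    // Sample 1: FieldScoreQuery with Type.BYTE scores a document by its field value.
    static float byteFieldScore(String indexedValue) {
        return Byte.parseByte(indexedValue); // NumberFormatException outside -128..127
    }

    // Sample 2: sub query score divided by the square root of the doc id.
    // doc == 0 divides by zero and yields Infinity.
    static float divideBySqrtDoc(int doc, float subQueryScore) {
        return (float) (subQueryScore / Math.sqrt(doc));
    }

    // Sample 3: square root of the sub query score times the short field value.
    static float sqrtScoreTimesShortVal(float subQueryScore, float valSrcScore) {
        return (float) Math.sqrt(subQueryScore) * valSrcScore;
    }

    public static void main(String[] args) {
        System.out.println(byteFieldScore("7"));            // 7.0 (document d1)
        System.out.println(divideBySqrtDoc(4, 2f));         // 1.0
        System.out.println(sqrtScoreTimesShortVal(4f, 3f)); // 6.0
    }
}
```

Taking the square root compresses differences in the text-query score: sub scores of 1 and 16 (a ratio of 16) become 1 and 4 (a ratio of 4), so the short field value has relatively more influence on the final ranking.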
        



Copyright © 2000-2010 Apache Software Foundation. All Rights Reserved.