org.apache.lucene.search.postingshighlight
Class PostingsHighlighter

java.lang.Object
  extended by org.apache.lucene.search.postingshighlight.PostingsHighlighter

public final class PostingsHighlighter
extends Object

Simple highlighter that neither analyzes fields nor uses term vectors. Instead, it requires fields to be indexed with FieldInfo.IndexOptions.DOCS_AND_FREQS_AND_POSITIONS_AND_OFFSETS.

PostingsHighlighter treats the single original document as the whole corpus, and then scores individual passages as if they were documents in this corpus. It uses a BreakIterator to find passages in the text; by default it breaks using BreakIterator.getSentenceInstance(Locale.ROOT). It then iterates in parallel (merge sorting by offset) through the positions of all terms from the query, coalesces the hits that occur in a single passage into a Passage, and scores each Passage using a separate PassageScorer. Finally, passages are formatted into highlighted snippets with a PassageFormatter.
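The passage-finding step described above can be illustrated with the JDK alone. The following is a minimal sketch, not the highlighter's actual implementation: the class name PassageSketch and the method passagesWithHits are hypothetical, and the hit offsets are assumed to be already sorted character offsets rather than values read from postings. It only shows how a sentence BreakIterator can group several hits into one passage.

```java
import java.text.BreakIterator;
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

// Hypothetical helper for illustration only; not part of Lucene.
final class PassageSketch {

    /**
     * Returns the [start, end) character ranges of the sentences that contain
     * at least one hit. Hits that fall in the same sentence share one range.
     * sortedHitOffsets is assumed to be in ascending order.
     */
    static List<int[]> passagesWithHits(String text, int[] sortedHitOffsets) {
        BreakIterator bi = BreakIterator.getSentenceInstance(Locale.ROOT);
        bi.setText(text);

        List<int[]> passages = new ArrayList<>();
        int lastEnd = -1; // end of the most recently recorded passage
        for (int offset : sortedHitOffsets) {
            if (offset < 0 || offset >= text.length()) {
                continue; // ignore offsets outside the text
            }
            if (offset < lastEnd) {
                continue; // hit belongs to the passage already recorded
            }
            int end = bi.following(offset); // first boundary after the hit
            int start = bi.previous();      // boundary at or before the hit
            passages.add(new int[] { start, end });
            lastEnd = end;
        }
        return passages;
    }
}
```

Calling PassageSketch.passagesWithHits(text, offsets) yields one range per sentence that contains a hit; scoring and formatting those ranges is left out of this sketch.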

WARNING: The code is very new and probably still has some exciting bugs!

Example usage:

   // configure field with offsets at index time
   FieldType offsetsType = new FieldType(TextField.TYPE_STORED);
   offsetsType.setIndexOptions(IndexOptions.DOCS_AND_FREQS_AND_POSITIONS_AND_OFFSETS);
   Field body = new Field("body", "foobar", offsetsType);

   // retrieve highlights at query time (n is the number of top hits to fetch)
   PostingsHighlighter highlighter = new PostingsHighlighter();
   Query query = new TermQuery(new Term("body", "highlighting"));
   TopDocs topDocs = searcher.search(query, n);
   String[] highlights = highlighter.highlight("body", query, searcher, topDocs);
 

This is thread-safe, and can be used across different readers.

WARNING: This API is experimental and might change in incompatible ways in the next release.

Field Summary
static int DEFAULT_MAX_LENGTH
          Default maximum content size to process.
 
Constructor Summary
PostingsHighlighter()
          Creates a new highlighter with default parameters.
PostingsHighlighter(int maxLength)
          Creates a new highlighter, specifying maximum content length.
PostingsHighlighter(int maxLength, BreakIterator breakIterator, PassageScorer scorer, PassageFormatter formatter)
          Creates a new highlighter with custom parameters.
 
Method Summary
 String[] highlight(String field, Query query, IndexSearcher searcher, TopDocs topDocs)
          Highlights the top passages from a single field.
 String[] highlight(String field, Query query, IndexSearcher searcher, TopDocs topDocs, int maxPassages)
          Highlights the top-N passages from a single field.
 Map<String,String[]> highlightFields(String[] fields, Query query, IndexSearcher searcher, TopDocs topDocs)
          Highlights the top passages from multiple fields.
 Map<String,String[]> highlightFields(String[] fields, Query query, IndexSearcher searcher, TopDocs topDocs, int maxPassages)
          Highlights the top-N passages from multiple fields.
 
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
 

Field Detail

DEFAULT_MAX_LENGTH

public static final int DEFAULT_MAX_LENGTH
Default maximum content size to process. Typically, snippets closer to the beginning of the document better summarize its content.

See Also:
Constant Field Values
Constructor Detail

PostingsHighlighter

public PostingsHighlighter()
Creates a new highlighter with default parameters.


PostingsHighlighter

public PostingsHighlighter(int maxLength)
Creates a new highlighter, specifying maximum content length.

Parameters:
maxLength - maximum content size to process.
Throws:
IllegalArgumentException - if maxLength is negative or Integer.MAX_VALUE

PostingsHighlighter

public PostingsHighlighter(int maxLength,
                           BreakIterator breakIterator,
                           PassageScorer scorer,
                           PassageFormatter formatter)
Creates a new highlighter with custom parameters.

Parameters:
maxLength - maximum content size to process.
breakIterator - used for finding passage boundaries.
scorer - used for ranking passages.
formatter - used for formatting passages into highlighted snippets.
Throws:
IllegalArgumentException - if maxLength is negative or Integer.MAX_VALUE
Method Detail

highlight

public String[] highlight(String field,
                          Query query,
                          IndexSearcher searcher,
                          TopDocs topDocs)
                   throws IOException
Highlights the top passages from a single field.

Parameters:
field - field name to highlight. Must have a stored string value and also be indexed with offsets.
query - query to highlight.
searcher - searcher that was previously used to execute the query.
topDocs - TopDocs containing the summary result documents to highlight.
Returns:
Array of formatted snippets corresponding to the documents in topDocs. If no highlights were found for a document, its value is null.
Throws:
IOException - if an I/O error occurred during processing
IllegalArgumentException - if field was indexed without FieldInfo.IndexOptions.DOCS_AND_FREQS_AND_POSITIONS_AND_OFFSETS
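Because an element of the returned array is null when no highlights were found for that document, callers usually need a fallback before rendering. The following is a minimal sketch under that assumption; the class SnippetFallback and its method are hypothetical names and not part of this API.

```java
// Hypothetical helper for illustration only; not part of Lucene.
final class SnippetFallback {

    /**
     * Returns a copy of highlights in which every null entry (a document
     * with no highlights) is replaced by the given fallback text.
     */
    static String[] withFallback(String[] highlights, String fallback) {
        String[] out = new String[highlights.length];
        for (int i = 0; i < highlights.length; i++) {
            out[i] = (highlights[i] != null) ? highlights[i] : fallback;
        }
        return out;
    }
}
```

The result keeps the same order as the input, so index i still corresponds to the i-th document in topDocs.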

highlight

public String[] highlight(String field,
                          Query query,
                          IndexSearcher searcher,
                          TopDocs topDocs,
                          int maxPassages)
                   throws IOException
Highlights the top-N passages from a single field.

Parameters:
field - field name to highlight. Must have a stored string value and also be indexed with offsets.
query - query to highlight.
searcher - searcher that was previously used to execute the query.
topDocs - TopDocs containing the summary result documents to highlight.
maxPassages - the maximum number of top-ranked passages used to form the highlighted snippets.
Returns:
Array of formatted snippets corresponding to the documents in topDocs. If no highlights were found for a document, its value is null.
Throws:
IOException - if an I/O error occurred during processing
IllegalArgumentException - if field was indexed without FieldInfo.IndexOptions.DOCS_AND_FREQS_AND_POSITIONS_AND_OFFSETS
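One common way to keep only the top-N passages is a bounded min-heap on score, followed by a sort on start offset so the selected passages read in document order. The following sketch illustrates that general technique with the JDK only; it is not taken from the highlighter's source, and the Scored type, its fields, and the class TopPassages are hypothetical.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.PriorityQueue;

// Hypothetical types for illustration only; not part of Lucene.
final class TopPassages {

    /** A candidate passage: where it starts in the text and how it scored. */
    static final class Scored {
        final int startOffset;
        final float score;

        Scored(int startOffset, float score) {
            this.startOffset = startOffset;
            this.score = score;
        }
    }

    /** Keeps the maxPassages best-scoring candidates, ordered by start offset. */
    static List<Scored> topN(List<Scored> candidates, int maxPassages) {
        if (maxPassages <= 0) {
            return new ArrayList<>();
        }
        // Min-heap on score: the weakest retained passage sits at the head.
        Comparator<Scored> byScore = (a, b) -> Float.compare(a.score, b.score);
        PriorityQueue<Scored> heap = new PriorityQueue<>(maxPassages, byScore);
        for (Scored s : candidates) {
            heap.offer(s);
            if (heap.size() > maxPassages) {
                heap.poll(); // evict the lowest-scoring passage
            }
        }
        // Restore document order for display.
        List<Scored> top = new ArrayList<>(heap);
        top.sort((a, b) -> Integer.compare(a.startOffset, b.startOffset));
        return top;
    }
}
```

The heap keeps memory bounded by maxPassages regardless of how many candidate passages a long document produces.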

highlightFields

public Map<String,String[]> highlightFields(String[] fields,
                                            Query query,
                                            IndexSearcher searcher,
                                            TopDocs topDocs)
                                     throws IOException
Highlights the top passages from multiple fields.

Conceptually, this behaves as a more efficient form of:

 Map<String,String[]> m = new HashMap<String,String[]>();
 for (String field : fields) {
   m.put(field, highlight(field, query, searcher, topDocs));
 }
 return m;
 

Parameters:
fields - field names to highlight. Each must have a stored string value and also be indexed with offsets.
query - query to highlight.
searcher - searcher that was previously used to execute the query.
topDocs - TopDocs containing the summary result documents to highlight.
Returns:
Map keyed on field name, containing the array of formatted snippets corresponding to the documents in topDocs. If no highlights were found for a document, the corresponding array element is null.
Throws:
IOException - if an I/O error occurred during processing
IllegalArgumentException - if any field was indexed without FieldInfo.IndexOptions.DOCS_AND_FREQS_AND_POSITIONS_AND_OFFSETS
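The returned map is organized by field, while result pages are usually rendered per document. The following sketch shows one way to regroup it, assuming every array has one entry per document in topDocs; the class PerDocumentSnippets and its method are hypothetical names and not part of this API.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper for illustration only; not part of Lucene.
final class PerDocumentSnippets {

    /**
     * Turns a field-to-snippets map into one field-to-snippet map per document.
     * Null snippets (no highlights for that document and field) are omitted.
     * numDocs is assumed to equal the length of every array in byField.
     */
    static List<Map<String, String>> byDocument(Map<String, String[]> byField, int numDocs) {
        List<Map<String, String>> docs = new ArrayList<>();
        for (int i = 0; i < numDocs; i++) {
            Map<String, String> snippets = new LinkedHashMap<>();
            for (Map.Entry<String, String[]> e : byField.entrySet()) {
                String snippet = e.getValue()[i];
                if (snippet != null) {
                    snippets.put(e.getKey(), snippet);
                }
            }
            docs.add(snippets);
        }
        return docs;
    }
}
```

A document with no highlights in any field ends up with an empty map, which the caller can treat as a signal to show a default summary.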

highlightFields

public Map<String,String[]> highlightFields(String[] fields,
                                            Query query,
                                            IndexSearcher searcher,
                                            TopDocs topDocs,
                                            int maxPassages)
                                     throws IOException
Highlights the top-N passages from multiple fields.

Conceptually, this behaves as a more efficient form of:

 Map<String,String[]> m = new HashMap<String,String[]>();
 for (String field : fields) {
   m.put(field, highlight(field, query, searcher, topDocs, maxPassages));
 }
 return m;
 

Parameters:
fields - field names to highlight. Each must have a stored string value and also be indexed with offsets.
query - query to highlight.
searcher - searcher that was previously used to execute the query.
topDocs - TopDocs containing the summary result documents to highlight.
maxPassages - the maximum number of top-ranked passages per field used to form the highlighted snippets.
Returns:
Map keyed on field name, containing the array of formatted snippets corresponding to the documents in topDocs. If no highlights were found for a document, the corresponding array element is null.
Throws:
IOException - if an I/O error occurred during processing
IllegalArgumentException - if any field was indexed without FieldInfo.IndexOptions.DOCS_AND_FREQS_AND_POSITIONS_AND_OFFSETS


Copyright © 2000-2013 Apache Software Foundation. All Rights Reserved.