org.apache.lucene.search.postingshighlight
Class PostingsHighlighter

java.lang.Object
  extended by org.apache.lucene.search.postingshighlight.PostingsHighlighter

public class PostingsHighlighter
extends Object

Simple highlighter that does not analyze fields or use term vectors. Instead, it requires the field to be indexed with FieldInfo.IndexOptions.DOCS_AND_FREQS_AND_POSITIONS_AND_OFFSETS.

PostingsHighlighter treats the single original document as the whole corpus, and then scores individual passages as if they were documents in this corpus. It uses a BreakIterator to find passages in the text; by default it breaks using getSentenceInstance(Locale.ROOT). It then iterates in parallel (merge sorting by offset) through the positions of all terms from the query, coalescing those hits that occur in a single passage into a Passage, and then scores each Passage using a separate PassageScorer. Passages are finally formatted into highlighted snippets with a PassageFormatter.
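
The "single document as corpus" idea can be illustrated without Lucene. The sketch below is a simplified, self-contained approximation and not PostingsHighlighter's code: the class name PassageScoringSketch, the lowercase/non-letter tokenization, and the BM25-style formula with k1 = 1.2 and b = 0.75 are assumptions chosen for illustration, and Lucene's PassageScorer may weight terms differently. It splits the text with BreakIterator.getSentenceInstance(Locale.ROOT), then computes each term's frequency statistics over passages instead of over documents.

```java
import java.text.BreakIterator;
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import java.util.Set;

// Simplified illustration (not Lucene code): one document is the "corpus",
// and each sentence passage is scored as if it were a "document" in it.
public class PassageScoringSketch {
    static final double K1 = 1.2, B = 0.75; // assumed BM25-style constants

    /** Splits text into sentence passages with a ROOT-locale sentence BreakIterator. */
    static List<String> passages(String text) {
        BreakIterator bi = BreakIterator.getSentenceInstance(Locale.ROOT);
        bi.setText(text);
        List<String> out = new ArrayList<>();
        int start = bi.first();
        for (int end = bi.next(); end != BreakIterator.DONE; start = end, end = bi.next()) {
            out.add(text.substring(start, end));
        }
        return out;
    }

    /** Naive tokenizer (assumption): lowercase, split on runs of non-letters/digits. */
    static String[] tokens(String s) {
        return s.toLowerCase(Locale.ROOT).split("[^\\p{L}\\p{N}]+");
    }

    static int countOf(String[] toks, String term) {
        int n = 0;
        for (String t : toks) if (t.equals(term)) n++;
        return n;
    }

    /** Returns the highest-scoring passage, or null if the text has no passages. */
    static String bestPassage(String text, Set<String> queryTerms) {
        List<String> ps = passages(text);
        int n = ps.size();
        if (n == 0) return null;
        List<String[]> toks = new ArrayList<>();
        double avgLen = 0;
        for (String p : ps) {
            String[] t = tokens(p);
            toks.add(t);
            avgLen += t.length;
        }
        avgLen /= n;
        String best = null;
        double bestScore = -1;
        for (int i = 0; i < n; i++) {
            double score = 0;
            for (String term : queryTerms) {
                int tf = countOf(toks.get(i), term);
                if (tf == 0) continue;
                int df = 0; // number of passages (not documents) containing the term
                for (String[] t : toks) if (countOf(t, term) > 0) df++;
                double idf = Math.log(1 + (n - df + 0.5) / (df + 0.5));
                double norm = K1 * ((1 - B) + B * toks.get(i).length / avgLen);
                score += idf * tf * (K1 + 1) / (tf + norm);
            }
            if (score > bestScore) {
                bestScore = score;
                best = ps.get(i);
            }
        }
        return best;
    }

    public static void main(String[] args) {
        String text = "Lucene is a search library. Highlighting shows matches in context. "
                + "The postings highlighter uses offsets for highlighting.";
        System.out.println(bestPassage(text, Set.of("highlighting", "offsets")));
    }
}
```

Counting document frequency over passages means a query term confined to one sentence earns a higher weight than a term spread across many sentences of the same document.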

WARNING: The code is very new and probably still has some exciting bugs!

Example usage:

   // configure field with offsets at index time
   FieldType offsetsType = new FieldType(TextField.TYPE_STORED);
   offsetsType.setIndexOptions(FieldInfo.IndexOptions.DOCS_AND_FREQS_AND_POSITIONS_AND_OFFSETS);
   Field body = new Field("body", "foobar", offsetsType);

   // retrieve highlights at query time
   // (searcher is an IndexSearcher over the index; n is the number of top hits to highlight)
   PostingsHighlighter highlighter = new PostingsHighlighter();
   Query query = new TermQuery(new Term("body", "highlighting"));
   TopDocs topDocs = searcher.search(query, n);
   String[] highlights = highlighter.highlight("body", query, searcher, topDocs);

This class is thread-safe and can be used across different readers.

WARNING: This API is experimental and might change in incompatible ways in the next release.

Field Summary
static int DEFAULT_MAX_LENGTH
          Default maximum content size to process.
 
Constructor Summary
PostingsHighlighter()
          Creates a new highlighter with default parameters.
PostingsHighlighter(int maxLength)
          Creates a new highlighter, specifying maximum content length.
 
Method Summary
protected  BreakIterator getBreakIterator(String field)
          Returns the BreakIterator to use for dividing text into passages.
protected  Passage[] getEmptyHighlight(String fieldName, BreakIterator bi, int maxPassages)
          Called to summarize a document when no hits were found.
protected  PassageFormatter getFormatter(String field)
          Returns the PassageFormatter to use for formatting passages into highlighted snippets.
protected  char getMultiValuedSeparator(String field)
          Returns the logical separator between values for multi-valued fields.
protected  PassageScorer getScorer(String field)
          Returns the PassageScorer to use for ranking passages.
 String[] highlight(String field, Query query, IndexSearcher searcher, TopDocs topDocs)
          Highlights the top passages from a single field.
 String[] highlight(String field, Query query, IndexSearcher searcher, TopDocs topDocs, int maxPassages)
          Highlights the top-N passages from a single field.
 Map<String,String[]> highlightFields(String[] fieldsIn, Query query, IndexSearcher searcher, int[] docidsIn, int[] maxPassagesIn)
          Highlights the top-N passages from multiple fields, for the provided int[] docids.
 Map<String,String[]> highlightFields(String[] fields, Query query, IndexSearcher searcher, TopDocs topDocs)
          Highlights the top passages from multiple fields.
 Map<String,String[]> highlightFields(String[] fields, Query query, IndexSearcher searcher, TopDocs topDocs, int[] maxPassages)
          Highlights the top-N passages from multiple fields.
protected  String[][] loadFieldValues(IndexSearcher searcher, String[] fields, int[] docids, int maxLength)
          Loads the String values for each (field, docID) pair to be highlighted.
 
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
 

Field Detail

DEFAULT_MAX_LENGTH

public static final int DEFAULT_MAX_LENGTH
Default maximum content size to process. Typically, snippets closer to the beginning of the document better summarize its content.

See Also:
Constant Field Values

Constructor Detail

PostingsHighlighter

public PostingsHighlighter()
Creates a new highlighter with default parameters.


PostingsHighlighter

public PostingsHighlighter(int maxLength)
Creates a new highlighter, specifying maximum content length.

Parameters:
maxLength - maximum content size to process.
Throws:
IllegalArgumentException - if maxLength is negative or equal to Integer.MAX_VALUE
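
A caller that computes maxLength at runtime may want to check it before constructing the highlighter. This sketch only restates the documented contract (0 is accepted; negative values and Integer.MAX_VALUE are rejected); MaxLengthCheck and checkMaxLength are hypothetical names and not part of Lucene.

```java
// Hypothetical helper restating the documented PostingsHighlighter(int) contract;
// it is not part of Lucene.
public class MaxLengthCheck {
    /** Returns maxLength unchanged, or throws if the constructor would reject it. */
    static int checkMaxLength(int maxLength) {
        if (maxLength < 0 || maxLength == Integer.MAX_VALUE) {
            throw new IllegalArgumentException(
                "maxLength must be non-negative and less than Integer.MAX_VALUE: " + maxLength);
        }
        return maxLength;
    }

    public static void main(String[] args) {
        System.out.println(checkMaxLength(10_000));
        for (int bad : new int[] {-1, Integer.MAX_VALUE}) {
            try {
                checkMaxLength(bad);
            } catch (IllegalArgumentException e) {
                System.out.println("rejected " + bad);
            }
        }
    }
}
```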

Method Detail

getBreakIterator

protected BreakIterator getBreakIterator(String field)
Returns the BreakIterator to use for dividing text into passages. This returns BreakIterator.getSentenceInstance(Locale.ROOT) by default; subclasses can override to customize.


getFormatter

protected PassageFormatter getFormatter(String field)
Returns the PassageFormatter to use for formatting passages into highlighted snippets. This returns a new PassageFormatter by default; subclasses can override to customize.


getScorer

protected PassageScorer getScorer(String field)
Returns the PassageScorer to use for ranking passages. This returns a new PassageScorer by default; subclasses can override to customize.


highlight

public String[] highlight(String field,
                          Query query,
                          IndexSearcher searcher,
                          TopDocs topDocs)
                   throws IOException
Highlights the top passages from a single field.

Parameters:
field - field name to highlight. Must have a stored string value and also be indexed with offsets.
query - query to highlight.
searcher - searcher that was previously used to execute the query.
topDocs - TopDocs containing the summary result documents to highlight.
Returns:
Array of formatted snippets corresponding to the documents in topDocs. If no highlights were found for a document, the first sentence for the field will be returned.
Throws:
IOException - if an I/O error occurred during processing
IllegalArgumentException - if field was indexed without FieldInfo.IndexOptions.DOCS_AND_FREQS_AND_POSITIONS_AND_OFFSETS

highlight

public String[] highlight(String field,
                          Query query,
                          IndexSearcher searcher,
                          TopDocs topDocs,
                          int maxPassages)
                   throws IOException
Highlights the top-N passages from a single field.

Parameters:
field - field name to highlight. Must have a stored string value and also be indexed with offsets.
query - query to highlight.
searcher - searcher that was previously used to execute the query.
topDocs - TopDocs containing the summary result documents to highlight.
maxPassages - The maximum number of top-N ranked passages used to form the highlighted snippets.
Returns:
Array of formatted snippets corresponding to the documents in topDocs. If no highlights were found for a document, the first maxPassages sentences from the field will be returned.
Throws:
IOException - if an I/O error occurred during processing
IllegalArgumentException - if field was indexed without FieldInfo.IndexOptions.DOCS_AND_FREQS_AND_POSITIONS_AND_OFFSETS

highlightFields

public Map<String,String[]> highlightFields(String[] fields,
                                            Query query,
                                            IndexSearcher searcher,
                                            TopDocs topDocs)
                                     throws IOException
Highlights the top passages from multiple fields.

Conceptually, this behaves as a more efficient form of:

 Map<String,String[]> m = new HashMap<String,String[]>();
 for (String field : fields) {
   m.put(field, highlight(field, query, searcher, topDocs));
 }
 return m;
 

Parameters:
fields - field names to highlight. Must have a stored string value and also be indexed with offsets.
query - query to highlight.
searcher - searcher that was previously used to execute the query.
topDocs - TopDocs containing the summary result documents to highlight.
Returns:
Map keyed on field name, containing the array of formatted snippets corresponding to the documents in topDocs. If no highlights were found for a document, the first sentence from the field will be returned.
Throws:
IOException - if an I/O error occurred during processing
IllegalArgumentException - if field was indexed without FieldInfo.IndexOptions.DOCS_AND_FREQS_AND_POSITIONS_AND_OFFSETS

highlightFields

public Map<String,String[]> highlightFields(String[] fields,
                                            Query query,
                                            IndexSearcher searcher,
                                            TopDocs topDocs,
                                            int[] maxPassages)
                                     throws IOException
Highlights the top-N passages from multiple fields.

Conceptually, this behaves as a more efficient form of:

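 Map<String,String[]> m = new HashMap<String,String[]>();
 for (int i = 0; i < fields.length; i++) {
   // maxPassages is parallel to fields: maxPassages[i] applies to fields[i]
   m.put(fields[i], highlight(fields[i], query, searcher, topDocs, maxPassages[i]));
 }
 return m;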
 

Parameters:
fields - field names to highlight. Must have a stored string value and also be indexed with offsets.
query - query to highlight.
searcher - searcher that was previously used to execute the query.
topDocs - TopDocs containing the summary result documents to highlight.
maxPassages - The maximum number of top-N ranked passages used to form the highlighted snippets, given per field: maxPassages[i] applies to fields[i].
Returns:
Map keyed on field name, containing the array of formatted snippets corresponding to the documents in topDocs. If no highlights were found for a document, the first maxPassages[i] sentences from fields[i] will be returned.
Throws:
IOException - if an I/O error occurred during processing
IllegalArgumentException - if field was indexed without FieldInfo.IndexOptions.DOCS_AND_FREQS_AND_POSITIONS_AND_OFFSETS

highlightFields

public Map<String,String[]> highlightFields(String[] fieldsIn,
                                            Query query,
                                            IndexSearcher searcher,
                                            int[] docidsIn,
                                            int[] maxPassagesIn)
                                     throws IOException
Highlights the top-N passages from multiple fields, for the provided int[] docids.

Parameters:
fieldsIn - field names to highlight. Must have a stored string value and also be indexed with offsets.
query - query to highlight.
searcher - searcher that was previously used to execute the query.
docidsIn - the document IDs to highlight.
maxPassagesIn - The maximum number of top-N ranked passages used to form the highlighted snippets, given per field: maxPassagesIn[i] applies to fieldsIn[i].
Returns:
Map keyed on field name, containing the array of formatted snippets corresponding to the documents in docidsIn. If no highlights were found for a document, the first maxPassagesIn[i] sentences from fieldsIn[i] will be returned.
Throws:
IOException - if an I/O error occurred during processing
IllegalArgumentException - if field was indexed without FieldInfo.IndexOptions.DOCS_AND_FREQS_AND_POSITIONS_AND_OFFSETS

loadFieldValues

protected String[][] loadFieldValues(IndexSearcher searcher,
                                     String[] fields,
                                     int[] docids,
                                     int maxLength)
                              throws IOException
Loads the String values for each (field, docID) pair to be highlighted. By default this loads from stored fields, but a subclass can change the source. This method should allocate a String[fields.length][docids.length] array and fill in all values. The returned Strings must be identical to what was indexed, because the offsets recorded in the postings refer to character positions in that text.

Throws:
IOException
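
The shape of the return value (one row per field, one column per docID) can be shown without Lucene. In this sketch, a hypothetical in-memory map stands in for the stored fields, and each field's values are assumed to be joined with the multi-valued separator and cut off at maxLength. Those two behaviours are assumptions modelled on getMultiValuedSeparator and the maxLength parameter, not a description of the default implementation; LoadFieldValuesSketch is a hypothetical name.

```java
import java.util.List;
import java.util.Map;

// Hypothetical sketch of the String[fields.length][docids.length] contract (not Lucene code).
public class LoadFieldValuesSketch {
    /** store maps docid -> (field -> stored values); it stands in for the index's stored fields. */
    static String[][] loadFieldValues(Map<Integer, Map<String, List<String>>> store,
                                      String[] fields, int[] docids, int maxLength, char separator) {
        String[][] contents = new String[fields.length][docids.length];
        for (int i = 0; i < fields.length; i++) {
            for (int j = 0; j < docids.length; j++) {
                List<String> values = store.getOrDefault(docids[j], Map.of())
                                           .getOrDefault(fields[i], List.of());
                // assumed: multiple values are joined with the separator character
                String joined = String.join(String.valueOf(separator), values);
                // assumed: text beyond maxLength is not processed
                contents[i][j] = joined.length() > maxLength ? joined.substring(0, maxLength) : joined;
            }
        }
        return contents;
    }

    public static void main(String[] args) {
        Map<Integer, Map<String, List<String>>> store = Map.of(
            0, Map.of("title", List.of("Intro"), "body", List.of("First value", "Second value")),
            1, Map.of("title", List.of("Details"), "body", List.of("Only value")));
        String[][] v = loadFieldValues(store, new String[] {"title", "body"}, new int[] {0, 1}, 15, ' ');
        System.out.println(v[1][0]); // body of doc 0, truncated: prints "First value Sec"
    }
}
```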

getMultiValuedSeparator

protected char getMultiValuedSeparator(String field)
Returns the logical separator between values for multi-valued fields. The default value is a space character, which means passages can span across values, but a subclass can override, for example with U+2029 PARAGRAPH SEPARATOR (PS) if each value holds a discrete passage for highlighting.
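
The effect of the separator on passages can be observed with java.text.BreakIterator alone. This sketch (SeparatorSketch is a hypothetical name) joins two field values once with a space and once with U+2029 and counts the sentence passages in each. With the space, values that lack sentence punctuation run together into one passage; with U+2029, a sentence BreakIterator is expected to end a passage at the separator, as paragraph separators do in Unicode sentence segmentation, though the exact boundaries depend on the JDK's rules.

```java
import java.text.BreakIterator;
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

// Hypothetical sketch (not Lucene code): how the separator choice affects passage boundaries.
public class SeparatorSketch {
    /** Splits text into sentence passages with a ROOT-locale sentence BreakIterator. */
    static List<String> passages(String text) {
        BreakIterator bi = BreakIterator.getSentenceInstance(Locale.ROOT);
        bi.setText(text);
        List<String> out = new ArrayList<>();
        int start = bi.first();
        for (int end = bi.next(); end != BreakIterator.DONE; start = end, end = bi.next()) {
            out.add(text.substring(start, end));
        }
        return out;
    }

    public static void main(String[] args) {
        String[] values = {"red apples", "green pears"}; // two values, no sentence punctuation
        String spaced = String.join(" ", values);
        String separated = String.join("\u2029", values);
        System.out.println("space:     " + passages(spaced).size() + " passage(s)");
        System.out.println("U+2029:    " + passages(separated).size() + " passage(s)");
    }
}
```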


getEmptyHighlight

protected Passage[] getEmptyHighlight(String fieldName,
                                      BreakIterator bi,
                                      int maxPassages)
Called to summarize a document when no hits were found. By default this just returns the first maxPassages sentences; subclasses can override to customize.
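
The default "first maxPassages sentences" behaviour can be approximated with a sentence BreakIterator. This is a simplified sketch, not Lucene's implementation: EmptyHighlightSketch and firstSentences are hypothetical names, and the method returns the sentence strings instead of Passage objects.

```java
import java.text.BreakIterator;
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

// Simplified illustration (not Lucene code) of the default "no hits" summary:
// take the first maxPassages sentences of the field's text.
public class EmptyHighlightSketch {
    /** Returns up to maxPassages leading sentences of text, in document order. */
    static List<String> firstSentences(String text, int maxPassages) {
        BreakIterator bi = BreakIterator.getSentenceInstance(Locale.ROOT);
        bi.setText(text);
        List<String> out = new ArrayList<>();
        int start = bi.first();
        for (int end = bi.next();
             end != BreakIterator.DONE && out.size() < maxPassages;
             start = end, end = bi.next()) {
            out.add(text.substring(start, end));
        }
        return out;
    }

    public static void main(String[] args) {
        System.out.println(firstSentences("One. Two. Three.", 2));
    }
}
```

A subclass that prefers, say, an empty summary over leading sentences would override getEmptyHighlight instead.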



Copyright © 2000-2013 Apache Software Foundation. All Rights Reserved.