java.lang.Object org.apache.lucene.search.postingshighlight.PostingsHighlighter
public class PostingsHighlighter
A simple highlighter that neither analyzes fields nor uses term vectors. Instead, it requires fields to be indexed with FieldInfo.IndexOptions.DOCS_AND_FREQS_AND_POSITIONS_AND_OFFSETS.
PostingsHighlighter treats the single original document as the whole corpus and scores individual passages as if they were documents in this corpus. It uses a BreakIterator to find passages in the text; by default it breaks using getSentenceInstance(Locale.ROOT). It then iterates in parallel (merge sorting by offset) through the positions of all terms from the query, coalescing the hits that occur in a single passage into a Passage, and scores each Passage using a separate PassageScorer. Passages are finally formatted into highlighted snippets with a PassageFormatter.
WARNING: The code is very new and probably still has some exciting bugs!
Example usage:
    // configure field with offsets at index time
    FieldType offsetsType = new FieldType(TextField.TYPE_STORED);
    offsetsType.setIndexOptions(IndexOptions.DOCS_AND_FREQS_AND_POSITIONS_AND_OFFSETS);
    Field body = new Field("body", "foobar", offsetsType);

    // retrieve highlights at query time
    PostingsHighlighter highlighter = new PostingsHighlighter();
    Query query = new TermQuery(new Term("body", "highlighting"));
    TopDocs topDocs = searcher.search(query, n);
    String[] highlights = highlighter.highlight("body", query, searcher, topDocs);
This is thread-safe, and can be used across different readers.
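The parallel iteration over term positions described in the overview can be pictured with a JDK-only sketch. The class PassageMergeSketch, its Span and Cursor types, and the coalesce method are hypothetical names chosen for illustration and are not part of Lucene; the sketch takes offsets from plain int arrays, whereas the highlighter reads them from the index.

```java
import java.text.BreakIterator;
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import java.util.PriorityQueue;

final class PassageMergeSketch {
    /** Hypothetical stand-in for a passage: a text span plus the number of term hits inside it. */
    static final class Span {
        final int start;
        final int end;
        int hits;
        Span(int start, int end) { this.start = start; this.end = end; }
    }

    /** One cursor per query term, walking that term's sorted start offsets. */
    private static final class Cursor {
        final int[] offsets;
        int next;
        Cursor(int[] offsets) { this.offsets = offsets; }
        int current() { return offsets[next]; }
    }

    /**
     * Merges the per-term offset lists in offset order and groups hits by sentence.
     * Every offset must satisfy 0 <= offset < text.length().
     */
    static List<Span> coalesce(String text, int[][] offsetsPerTerm) {
        BreakIterator bi = BreakIterator.getSentenceInstance(Locale.ROOT);
        bi.setText(text);
        // The queue always yields the cursor whose current offset is smallest.
        PriorityQueue<Cursor> queue =
            new PriorityQueue<Cursor>((a, b) -> Integer.compare(a.current(), b.current()));
        for (int[] offsets : offsetsPerTerm) {
            if (offsets.length > 0) queue.add(new Cursor(offsets));
        }
        List<Span> spans = new ArrayList<>();
        Span current = null;
        while (!queue.isEmpty()) {
            Cursor c = queue.poll();
            int offset = c.current();
            if (current == null || offset >= current.end) {
                // The hit falls outside the current span: open the sentence that contains it.
                int start = bi.preceding(offset + 1);
                int end = bi.following(offset);
                current = new Span(start, end);
                spans.add(current);
            }
            current.hits++;
            c.next++;
            if (c.next < c.offsets.length) queue.add(c); // re-queue under its new current offset
        }
        return spans;
    }

    public static void main(String[] args) {
        String text = "Lucene is a search library. It supports highlighting. Highlighting uses offsets.";
        int[][] offsets = {
            { text.indexOf("search"), text.indexOf("offsets") },
            { text.indexOf("library") }
        };
        for (Span s : coalesce(text, offsets)) {
            System.out.println(s.hits + " hit(s): " + text.substring(s.start, s.end).trim());
        }
    }
}
```

Because each sentence is opened only when a hit lands beyond the current span, sentences without hits never produce a span in this sketch.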
Field Summary | |
---|---|
static int | DEFAULT_MAX_LENGTH - Default maximum content size to process. |
Constructor Summary | |
---|---|
PostingsHighlighter() | Creates a new highlighter with default parameters. |
PostingsHighlighter(int maxLength) | Creates a new highlighter, specifying maximum content length. |
Method Summary | |
---|---|
protected BreakIterator | getBreakIterator(String field) - Returns the BreakIterator to use for dividing text into passages. |
protected Passage[] | getEmptyHighlight(String fieldName, BreakIterator bi, int maxPassages) - Called to summarize a document when no hits were found. |
protected PassageFormatter | getFormatter(String field) - Returns the PassageFormatter to use for formatting passages into highlighted snippets. |
protected char | getMultiValuedSeparator(String field) - Returns the logical separator between values for multi-valued fields. |
protected PassageScorer | getScorer(String field) - Returns the PassageScorer to use for ranking passages. |
String[] | highlight(String field, Query query, IndexSearcher searcher, TopDocs topDocs) - Highlights the top passages from a single field. |
String[] | highlight(String field, Query query, IndexSearcher searcher, TopDocs topDocs, int maxPassages) - Highlights the top-N passages from a single field. |
Map<String,String[]> | highlightFields(String[] fieldsIn, Query query, IndexSearcher searcher, int[] docidsIn, int[] maxPassagesIn) - Highlights the top-N passages from multiple fields, for the provided int[] docids. |
Map<String,String[]> | highlightFields(String[] fields, Query query, IndexSearcher searcher, TopDocs topDocs) - Highlights the top passages from multiple fields. |
Map<String,String[]> | highlightFields(String[] fields, Query query, IndexSearcher searcher, TopDocs topDocs, int[] maxPassages) - Highlights the top-N passages from multiple fields. |
protected String[][] | loadFieldValues(IndexSearcher searcher, String[] fields, int[] docids, int maxLength) - Loads the String values for each field X docID to be highlighted. |
Methods inherited from class java.lang.Object |
---|
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait |
Field Detail |
---|
public static final int DEFAULT_MAX_LENGTH
Constructor Detail |
---|
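public PostingsHighlighter()
Creates a new highlighter with default parameters.
public PostingsHighlighter(int maxLength)
Creates a new highlighter, specifying maximum content length.
Parameters:
maxLength - maximum content size to process.
Throws:
IllegalArgumentException - if maxLength is negative or Integer.MAX_VALUE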
Method Detail |
---|
protected BreakIterator getBreakIterator(String field)
Returns the BreakIterator to use for dividing text into passages. This returns BreakIterator.getSentenceInstance(Locale) by default; subclasses can override to customize.
protected PassageFormatter getFormatter(String field)
Returns the PassageFormatter to use for formatting passages into highlighted snippets. This returns a new PassageFormatter by default; subclasses can override to customize.
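To make "formatting passages into highlighted snippets" concrete, here is a minimal JDK-only sketch. SnippetFormatSketch and its format method are hypothetical names, the `<b>` tags are an assumed choice of markup, and the sketch does not claim to reproduce the output of PassageFormatter.

```java
final class SnippetFormatSketch {
    /**
     * Renders text[start, end) and wraps each match in bold tags.
     * matchStarts and matchEnds are parallel arrays sorted by start offset;
     * matches may overlap or extend past the passage and are clipped accordingly.
     */
    static String format(String text, int start, int end, int[] matchStarts, int[] matchEnds) {
        StringBuilder sb = new StringBuilder();
        int pos = start; // first character not yet copied to the output
        for (int i = 0; i < matchStarts.length; i++) {
            int ms = Math.max(matchStarts[i], pos); // clip overlap with earlier output
            int me = Math.min(matchEnds[i], end);   // clip at the passage end
            if (ms >= me) continue;                 // nothing left of this match to show
            sb.append(text, pos, ms);               // plain text before the match
            sb.append("<b>").append(text, ms, me).append("</b>");
            pos = me;
        }
        sb.append(text, pos, end);                  // plain text after the last match
        return sb.toString();
    }

    public static void main(String[] args) {
        String text = "It supports highlighting. Highlighting uses offsets.";
        int hit = text.indexOf("highlighting");
        System.out.println(format(text, 0, text.indexOf("Highlighting"),
            new int[] { hit }, new int[] { hit + "highlighting".length() }));
    }
}
```

The sketch copies text verbatim; output destined for HTML would also need the passage text escaped.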
protected PassageScorer getScorer(String field)
Returns the PassageScorer to use for ranking passages. This returns a new PassageScorer by default; subclasses can override to customize.
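One way to picture ranking passages "as if they were documents" is a BM25-style score in which each passage plays the role of a document. The sketch below is an assumption for illustration only: this page does not state the formula PassageScorer uses, and PassageScoreSketch, its constants, and its method names are hypothetical.

```java
final class PassageScoreSketch {
    // Commonly used BM25 parameter values, assumed here for illustration.
    static final double K1 = 1.2;
    static final double B = 0.75;

    /** BM25-style saturated term frequency, normalized by passage length relative to the average. */
    static double tfNorm(int freq, int passageLen, double avgPassageLen) {
        double norm = K1 * (1 - B + B * (passageLen / avgPassageLen));
        return freq * (K1 + 1) / (freq + norm);
    }

    /**
     * Sums weight * tfNorm over the query terms.
     * termFreqs[i] is how often term i occurs in the passage; termWeights[i] is that term's weight.
     * avgPassageLen must be greater than zero.
     */
    static double score(int[] termFreqs, double[] termWeights, int passageLen, double avgPassageLen) {
        double sum = 0;
        for (int i = 0; i < termFreqs.length; i++) {
            sum += termWeights[i] * tfNorm(termFreqs[i], passageLen, avgPassageLen);
        }
        return sum;
    }

    public static void main(String[] args) {
        // Same passage length, one versus two occurrences of a single term.
        System.out.println(score(new int[] { 1 }, new double[] { 1.0 }, 100, 100.0));
        System.out.println(score(new int[] { 2 }, new double[] { 1.0 }, 100, 100.0));
    }
}
```

Under these assumptions a second occurrence of a term adds less than the first, and a longer passage with the same hits scores lower than a shorter one.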
public String[] highlight(String field, Query query, IndexSearcher searcher, TopDocs topDocs) throws IOException
Highlights the top passages from a single field.
Parameters:
field - field name to highlight. Must have a stored string value and also be indexed with offsets.
query - query to highlight.
searcher - searcher that was previously used to execute the query.
topDocs - TopDocs containing the summary result documents to highlight.
Returns:
Array of formatted snippets corresponding to the documents in topDocs. If no highlights were found for a document, the first sentence for the field will be returned.
Throws:
IOException - if an I/O error occurred during processing
IllegalArgumentException - if field was indexed without FieldInfo.IndexOptions.DOCS_AND_FREQS_AND_POSITIONS_AND_OFFSETS
public String[] highlight(String field, Query query, IndexSearcher searcher, TopDocs topDocs, int maxPassages) throws IOException
Highlights the top-N passages from a single field.
Parameters:
field - field name to highlight. Must have a stored string value and also be indexed with offsets.
query - query to highlight.
searcher - searcher that was previously used to execute the query.
topDocs - TopDocs containing the summary result documents to highlight.
maxPassages - The maximum number of top-N ranked passages used to form the highlighted snippets.
Returns:
Array of formatted snippets corresponding to the documents in topDocs. If no highlights were found for a document, the first maxPassages sentences from the field will be returned.
Throws:
IOException - if an I/O error occurred during processing
IllegalArgumentException - if field was indexed without FieldInfo.IndexOptions.DOCS_AND_FREQS_AND_POSITIONS_AND_OFFSETS
public Map<String,String[]> highlightFields(String[] fields, Query query, IndexSearcher searcher, TopDocs topDocs) throws IOException
Highlights the top passages from multiple fields.
Conceptually, this behaves as a more efficient form of:

    Map<String,String[]> m = new HashMap<String,String[]>();
    for (String field : fields) {
      m.put(field, highlight(field, query, searcher, topDocs));
    }
    return m;

Parameters:
fields - field names to highlight. Each must have a stored string value and also be indexed with offsets.
query - query to highlight.
searcher - searcher that was previously used to execute the query.
topDocs - TopDocs containing the summary result documents to highlight.
Returns:
Map keyed on field name, containing the array of formatted snippets corresponding to the documents in topDocs. If no highlights were found for a document, the first sentence from the field will be returned.
Throws:
IOException - if an I/O error occurred during processing
IllegalArgumentException - if a field was indexed without FieldInfo.IndexOptions.DOCS_AND_FREQS_AND_POSITIONS_AND_OFFSETS
public Map<String,String[]> highlightFields(String[] fields, Query query, IndexSearcher searcher, TopDocs topDocs, int[] maxPassages) throws IOException
Highlights the top-N passages from multiple fields.
Conceptually, this behaves as a more efficient form of:

    Map<String,String[]> m = new HashMap<String,String[]>();
    for (int i = 0; i < fields.length; i++) {
      // maxPassages[i] is the passage limit for fields[i]
      m.put(fields[i], highlight(fields[i], query, searcher, topDocs, maxPassages[i]));
    }
    return m;

Parameters:
fields - field names to highlight. Each must have a stored string value and also be indexed with offsets.
query - query to highlight.
searcher - searcher that was previously used to execute the query.
topDocs - TopDocs containing the summary result documents to highlight.
maxPassages - The maximum number of top-N ranked passages per field used to form the highlighted snippets.
Returns:
Map keyed on field name, containing the array of formatted snippets corresponding to the documents in topDocs. If no highlights were found for a document, the first maxPassages sentences from the field will be returned.
Throws:
IOException - if an I/O error occurred during processing
IllegalArgumentException - if a field was indexed without FieldInfo.IndexOptions.DOCS_AND_FREQS_AND_POSITIONS_AND_OFFSETS
public Map<String,String[]> highlightFields(String[] fieldsIn, Query query, IndexSearcher searcher, int[] docidsIn, int[] maxPassagesIn) throws IOException
Highlights the top-N passages from multiple fields, for the provided int[] docids.
Parameters:
fieldsIn - field names to highlight. Each must have a stored string value and also be indexed with offsets.
query - query to highlight.
searcher - searcher that was previously used to execute the query.
docidsIn - array containing the document IDs to highlight.
maxPassagesIn - The maximum number of top-N ranked passages per field used to form the highlighted snippets.
Returns:
Map keyed on field name, containing the array of formatted snippets corresponding to the documents in docidsIn. If no highlights were found for a document, the first maxPassages sentences from the field will be returned.
Throws:
IOException - if an I/O error occurred during processing
IllegalArgumentException - if a field was indexed without FieldInfo.IndexOptions.DOCS_AND_FREQS_AND_POSITIONS_AND_OFFSETS
protected String[][] loadFieldValues(IndexSearcher searcher, String[] fields, int[] docids, int maxLength) throws IOException
Loads the String values for each field X docID to be highlighted.
Throws:
IOException
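protected char getMultiValuedSeparator(String field)
Returns the logical separator between values for multi-valued fields. A subclass can override this, for example with U+2029 PARAGRAPH SEPARATOR (PS) if each value holds a discrete passage for highlighting.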
protected Passage[] getEmptyHighlight(String fieldName, BreakIterator bi, int maxPassages)
Called to summarize a document when no hits were found. By default this returns the first maxPassages sentences; subclasses can override to customize.
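The fallback of returning the first maxPassages sentences can be sketched with the JDK's BreakIterator alone. EmptyHighlightSketch and firstSentences are hypothetical names, and the sketch returns plain strings rather than Passage objects.

```java
import java.text.BreakIterator;
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

final class EmptyHighlightSketch {
    /** Returns up to maxPassages leading segments of text, as delimited by the given BreakIterator. */
    static List<String> firstSentences(String text, BreakIterator bi, int maxPassages) {
        bi.setText(text);
        List<String> out = new ArrayList<>();
        int start = bi.first();
        // Walk boundary to boundary until the text or the passage budget runs out.
        for (int end = bi.next();
             end != BreakIterator.DONE && out.size() < maxPassages;
             start = end, end = bi.next()) {
            out.add(text.substring(start, end));
        }
        return out;
    }

    public static void main(String[] args) {
        BreakIterator sentences = BreakIterator.getSentenceInstance(Locale.ROOT);
        for (String s : firstSentences("One. Two. Three.", sentences, 2)) {
            System.out.println("[" + s + "]");
        }
    }
}
```

Passing a different BreakIterator changes what counts as a passage here, which mirrors how the bi parameter lets the fallback follow whatever getBreakIterator returns.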