org.apache.lucene.search.highlight.WeightedSpanTermExtractor

public class WeightedSpanTermExtractor extends Object

Class used to extract WeightedSpanTerms from a Query based on whether Terms from the Query are contained in a supplied TokenStream.

To support query types that are not handled by default, subclasses can override extract(Query, float, Map) to extract wrapped or delegate queries, and extractUnknownQuery(Query, Map) to process custom leaf queries:

 
    WeightedSpanTermExtractor extractor = new WeightedSpanTermExtractor() {
        // Unwrap a custom wrapper query and extract from the query it delegates to.
        @Override
        protected void extract(Query query, float boost, Map<String, WeightedSpanTerm> terms) throws IOException {
          if (query instanceof QueryWrapper) {
            extract(((QueryWrapper) query).getQuery(), boost, terms);
          } else {
            super.extract(query, boost, terms);
          }
        }

        // Handle a custom leaf query that the base class does not recognize.
        @Override
        protected void extractUnknownQuery(Query query, Map<String, WeightedSpanTerm> terms) throws IOException {
          if (query instanceof CustomTermQuery) {
            Term term = ((CustomTermQuery) query).getTerm();
            terms.put(term.field(), new WeightedSpanTerm(1, term.text()));
          }
        }
    };

Nested Class Summary

Nested Classes

Modifier and Type

Class

Description

protected static class

WeightedSpanTermExtractor.PositionCheckingMap<K>

This class makes sure that if both position sensitive and insensitive versions of the same term are added, the position insensitive one wins.
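The "position insensitive one wins" rule can be illustrated with a small standalone model. This is a minimal sketch using only java.util, not the Lucene implementation: SketchTerm and SketchPositionCheckingMap are hypothetical stand-ins for WeightedSpanTerm and PositionCheckingMap, and the sketch assumes the rule is enforced when entries are put into the map.

```java
import java.util.HashMap;
import java.util.Map;

class PositionCheckingSketch {

    /** Hypothetical stand-in for WeightedSpanTerm: a term plus a position-sensitivity flag. */
    static final class SketchTerm {
        final String term;
        boolean positionSensitive;

        SketchTerm(String term, boolean positionSensitive) {
            this.term = term;
            this.positionSensitive = positionSensitive;
        }
    }

    /** Hypothetical map that keeps a key position-insensitive once an insensitive version was added. */
    static final class SketchPositionCheckingMap<K> extends HashMap<K, SketchTerm> {
        @Override
        public SketchTerm put(K key, SketchTerm value) {
            SketchTerm previous = super.put(key, value);
            // If the replaced entry was position-insensitive, the new entry becomes insensitive too.
            if (previous != null && !previous.positionSensitive) {
                value.positionSensitive = false;
            }
            return previous;
        }

        @Override
        public void putAll(Map<? extends K, ? extends SketchTerm> m) {
            // Route every entry through put() so the rule above also applies to bulk inserts.
            for (Map.Entry<? extends K, ? extends SketchTerm> e : m.entrySet()) {
                put(e.getKey(), e.getValue());
            }
        }
    }

    /** Adds two versions of the same term and reports whether the stored entry is position sensitive. */
    static boolean resultingSensitivity(boolean firstSensitive, boolean secondSensitive) {
        SketchPositionCheckingMap<String> terms = new SketchPositionCheckingMap<>();
        terms.put("lucene", new SketchTerm("lucene", firstSensitive));
        terms.put("lucene", new SketchTerm("lucene", secondSensitive));
        return terms.get("lucene").positionSensitive;
    }

    public static void main(String[] args) {
        System.out.println(resultingSensitivity(false, true));
        System.out.println(resultingSensitivity(true, true));
    }
}
```

In this model the order of insertion does not matter: as soon as one of the two versions is position insensitive, the stored entry is insensitive as well.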
Constructor Summary

Constructors

Constructor

Description

WeightedSpanTermExtractor()

WeightedSpanTermExtractor(String defaultField)
Method Summary

Modifier and Type

Method

Description

protected void

collectSpanQueryFields(SpanQuery spanQuery, Set<String> fieldNames)

protected void

extract(Query query, float boost, Map<String,WeightedSpanTerm> terms)

Fills a Map with WeightedSpanTerms using the terms from the supplied Query.

protected void

extractUnknownQuery(Query query, Map<String,WeightedSpanTerm> terms)

protected void

extractWeightedSpanTerms(Map<String,WeightedSpanTerm> terms, SpanQuery spanQuery, float boost)

Fills a Map with WeightedSpanTerms using the terms from the supplied SpanQuery.

protected void

extractWeightedTerms(Map<String,WeightedSpanTerm> terms, Query query, float boost)

Fills a Map with WeightedSpanTerms using the terms from the supplied Query.

protected boolean

fieldNameComparator(String fieldNameToCheck)

Necessary to implement matches for queries against defaultField

boolean

getExpandMultiTermQuery()

protected LeafReaderContext

getLeafContext()

TokenStream

getTokenStream()

Returns the tokenStream which may have been wrapped in a CachingTokenFilter.

Map<String,WeightedSpanTerm>

getWeightedSpanTerms(Query query, float boost, TokenStream tokenStream)

Creates a Map of WeightedSpanTerms from the given Query and TokenStream.

Map<String,WeightedSpanTerm>

getWeightedSpanTerms(Query query, float boost, TokenStream tokenStream, String fieldName)

Creates a Map of WeightedSpanTerms from the given Query and TokenStream.

Map<String,WeightedSpanTerm>

getWeightedSpanTermsWithScores(Query query, float boost, TokenStream tokenStream, String fieldName, IndexReader reader)

Creates a Map of WeightedSpanTerms from the given Query and TokenStream.

boolean

isCachedTokenStream()

protected boolean

isQueryUnsupported(Class<? extends Query> clazz)

boolean

isUsePayloads()

protected boolean

mustRewriteQuery(SpanQuery spanQuery)

void

setExpandMultiTermQuery(boolean expandMultiTermQuery)

protected final void

setMaxDocCharsToAnalyze(int maxDocCharsToAnalyze)

A threshold of number of characters to analyze.

void

setUsePayloads(boolean usePayloads)

void

setWrapIfNotCachingTokenFilter(boolean wrap)

By default, TokenStreams that are not of the type CachingTokenFilter are wrapped in a CachingTokenFilter to ensure an efficient reset. If you are already using a different caching TokenStream implementation and don't want it to be wrapped, set this to false.

Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait

Constructor Details
- WeightedSpanTermExtractor
  
  public WeightedSpanTermExtractor()
- WeightedSpanTermExtractor
  
  public WeightedSpanTermExtractor(String defaultField)
Method Details
- extract
  
  protected void extract(Query query, float boost, Map<String,WeightedSpanTerm> terms) throws IOException
  
  Fills a Map with WeightedSpanTerms using the terms from the supplied Query.
  
  Parameters:
  
  query - Query to extract Terms from
  
  terms - Map to place created WeightedSpanTerms in
  
  Throws:
  
  IOException - If there is a low-level I/O error
- isQueryUnsupported
  
  protected boolean isQueryUnsupported(Class<? extends Query> clazz)
- extractUnknownQuery
  
  protected void extractUnknownQuery(Query query, Map<String,WeightedSpanTerm> terms) throws IOException
  
  Throws:
  
  IOException
- extractWeightedSpanTerms
  
  protected void extractWeightedSpanTerms(Map<String,WeightedSpanTerm> terms, SpanQuery spanQuery, float boost) throws IOException
  
  Fills a Map with WeightedSpanTerms using the terms from the supplied SpanQuery.
  
  Parameters:
  
  terms - Map to place created WeightedSpanTerms in
  
  spanQuery - SpanQuery to extract Terms from
  
  Throws:
  
  IOException - If there is a low-level I/O error
- extractWeightedTerms
  
  protected void extractWeightedTerms(Map<String,WeightedSpanTerm> terms, Query query, float boost) throws IOException
  
  Fills a Map with WeightedSpanTerms using the terms from the supplied Query.
  
  Parameters:
  
  terms - Map to place created WeightedSpanTerms in
  
  query - Query to extract Terms from
  
  Throws:
  
  IOException - If there is a low-level I/O error
- fieldNameComparator
  
  protected boolean fieldNameComparator(String fieldNameToCheck)
  
  Necessary to implement matches for queries against defaultField
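  One plausible reading of this check, written as a standalone sketch rather than copied from the Lucene source: a term's field is accepted when no field restriction is in effect, when it equals the requested field, or when it equals the default field. The class name, method name, and parameters below are hypothetical.

```java
class FieldNameMatchSketch {

    /**
     * Hypothetical model of the field check.
     *
     * @param fieldName        field the caller restricted highlighting to, or null for no restriction
     * @param defaultField     default field passed to the constructor, or null if none was given
     * @param fieldNameToCheck field of the query term being considered
     */
    static boolean fieldMatches(String fieldName, String defaultField, String fieldNameToCheck) {
        return fieldName == null
            || fieldName.equals(fieldNameToCheck)
            || (defaultField != null && defaultField.equals(fieldNameToCheck));
    }

    public static void main(String[] args) {
        // A term on the default field is accepted even though highlighting targets "title".
        System.out.println(fieldMatches("title", "body", "body"));
        // Without a default field, the same term is rejected.
        System.out.println(fieldMatches("title", null, "body"));
    }
}
```

  Under this assumption, terms from queries against the default field still contribute to highlighting even when a different field name was requested.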
- getLeafContext
  
  protected LeafReaderContext getLeafContext() throws IOException
  
  Throws:
  
  IOException
- getWeightedSpanTerms
  
  public Map<String,WeightedSpanTerm> getWeightedSpanTerms(Query query, float boost, TokenStream tokenStream) throws IOException
  
  Creates a Map of WeightedSpanTerms from the given Query and TokenStream.
  
  Parameters:
  
  query - that caused hit
  
  tokenStream - of text to be highlighted
  
  Returns:
  
  Map containing WeightedSpanTerms
  
  Throws:
  
  IOException - If there is a low-level I/O error
- getWeightedSpanTerms
  
  public Map<String,WeightedSpanTerm> getWeightedSpanTerms(Query query, float boost, TokenStream tokenStream, String fieldName) throws IOException
  
  Creates a Map of WeightedSpanTerms from the given Query and TokenStream.
  
  Parameters:
  
  query - that caused hit
  
  tokenStream - of text to be highlighted
  
  fieldName - restricts the Terms used, based on field name
  
  Returns:
  
  Map containing WeightedSpanTerms
  
  Throws:
  
  IOException - If there is a low-level I/O error
- getWeightedSpanTermsWithScores
  
  public Map<String,WeightedSpanTerm> getWeightedSpanTermsWithScores(Query query, float boost, TokenStream tokenStream, String fieldName, IndexReader reader) throws IOException
  
  Creates a Map of WeightedSpanTerms from the given Query and TokenStream. Uses a supplied IndexReader to properly weight terms (for gradient highlighting).
  
  Parameters:
  
  query - that caused hit
  
  tokenStream - of text to be highlighted
  
  fieldName - restricts the Terms used, based on field name
  
  reader - to use for scoring
  
  Returns:
  
  Map of WeightedSpanTerms with quasi tf/idf scores
  
  Throws:
  
  IOException - If there is a low-level I/O error
- collectSpanQueryFields
  
  protected void collectSpanQueryFields(SpanQuery spanQuery, Set<String> fieldNames)
- mustRewriteQuery
  
  protected boolean mustRewriteQuery(SpanQuery spanQuery)
- getExpandMultiTermQuery
  
  public boolean getExpandMultiTermQuery()
- setExpandMultiTermQuery
  
  public void setExpandMultiTermQuery(boolean expandMultiTermQuery)
- isUsePayloads
  
  public boolean isUsePayloads()
- setUsePayloads
  
  public void setUsePayloads(boolean usePayloads)
- isCachedTokenStream
  
  public boolean isCachedTokenStream()
- getTokenStream
  
  public TokenStream getTokenStream()
  
  Returns the tokenStream, which may have been wrapped in a CachingTokenFilter. The getWeightedSpanTerms* methods set the tokenStream, so don't call this method before calling one of them.
- setWrapIfNotCachingTokenFilter
  
  public void setWrapIfNotCachingTokenFilter(boolean wrap)
  
  By default, TokenStreams that are not of the type CachingTokenFilter are wrapped in a CachingTokenFilter to ensure an efficient reset. If you are already using a different caching TokenStream implementation and don't want it to be wrapped, set this to false. This setting is ignored when a term vector based TokenStream is supplied, since it can be reset efficiently.
- setMaxDocCharsToAnalyze
  
  protected final void setMaxDocCharsToAnalyze(int maxDocCharsToAnalyze)
  
  A threshold for the number of characters to analyze. When a TokenStream based on term vectors with offsets and positions is supplied, this setting does not apply.
