Class WeightedSpanTermExtractor

java.lang.Object
org.apache.lucene.search.highlight.WeightedSpanTermExtractor

public class WeightedSpanTermExtractor extends Object
Class used to extract WeightedSpanTerms from a Query based on whether Terms from the Query are contained in a supplied TokenStream.

To support query types that are not handled by default, subclasses can override extract(Query, float, Map) to unwrap wrapper or delegating queries, and extractUnknownQuery(Query, Map) to process custom leaf queries:

 
    WeightedSpanTermExtractor extractor = new WeightedSpanTermExtractor() {
        protected void extract(Query query, float boost, Map<String, WeightedSpanTerm> terms) throws IOException {
          if (query instanceof QueryWrapper) {
            extract(((QueryWrapper)query).getQuery(), boost, terms);
          } else {
            super.extract(query, boost, terms);
          }
        }

        protected void extractUnknownQuery(Query query, Map<String, WeightedSpanTerm> terms) throws IOException {
          if (query instanceof CustomTermQuery) {
            Term term = ((CustomTermQuery) query).getTerm();
            // The map is keyed by term text, matching how the built-in extraction populates it.
            terms.put(term.text(), new WeightedSpanTerm(1, term.text()));
          }
        }
    };
 
 
  • Constructor Details

    • WeightedSpanTermExtractor

      public WeightedSpanTermExtractor()
    • WeightedSpanTermExtractor

      public WeightedSpanTermExtractor(String defaultField)
  • Method Details

    • extract

      protected void extract(Query query, float boost, Map<String,WeightedSpanTerm> terms) throws IOException
      Fills a Map with WeightedSpanTerms using the terms from the supplied Query.
      Parameters:
      query - Query to extract Terms from
      terms - Map to place created WeightedSpanTerms in
      Throws:
      IOException - If there is a low-level I/O error
    • isQueryUnsupported

      protected boolean isQueryUnsupported(Class<? extends Query> clazz)
    • extractUnknownQuery

      protected void extractUnknownQuery(Query query, Map<String,WeightedSpanTerm> terms) throws IOException
      Throws:
      IOException
    • extractWeightedSpanTerms

      protected void extractWeightedSpanTerms(Map<String,WeightedSpanTerm> terms, SpanQuery spanQuery, float boost) throws IOException
      Fills a Map with WeightedSpanTerms using the terms from the supplied SpanQuery.
      Parameters:
      terms - Map to place created WeightedSpanTerms in
      spanQuery - SpanQuery to extract Terms from
      Throws:
      IOException - If there is a low-level I/O error
    • extractWeightedTerms

      protected void extractWeightedTerms(Map<String,WeightedSpanTerm> terms, Query query, float boost) throws IOException
      Fills a Map with WeightedSpanTerms using the terms from the supplied Query.
      Parameters:
      terms - Map to place created WeightedSpanTerms in
      query - Query to extract Terms from
      Throws:
      IOException - If there is a low-level I/O error
    • fieldNameComparator

      protected boolean fieldNameComparator(String fieldNameToCheck)
      Necessary to implement matches for queries against defaultField
    • getLeafContext

      protected LeafReaderContext getLeafContext() throws IOException
      Throws:
      IOException
    • getWeightedSpanTerms

      public Map<String,WeightedSpanTerm> getWeightedSpanTerms(Query query, float boost, TokenStream tokenStream) throws IOException
      Creates a Map of WeightedSpanTerms from the given Query and TokenStream.
      Parameters:
      query - Query that caused the hit
      tokenStream - TokenStream of the text to be highlighted
      Returns:
      Map containing WeightedSpanTerms
      Throws:
      IOException - If there is a low-level I/O error
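
      As an illustration of the call above, here is a minimal sketch that extracts the weighted terms of a single TermQuery. It assumes lucene-core and lucene-highlighter (with its module dependencies) are on the classpath; the class name, the field name "body" and the sample text are hypothetical.

```java
import java.io.IOException;
import java.util.Map;
import org.apache.lucene.analysis.Analyzer;
import org.apache.lucene.analysis.TokenStream;
import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.index.Term;
import org.apache.lucene.search.Query;
import org.apache.lucene.search.TermQuery;
import org.apache.lucene.search.highlight.WeightedSpanTerm;
import org.apache.lucene.search.highlight.WeightedSpanTermExtractor;

public class BasicExtraction {

  // Builds a TermQuery for (field, word) and extracts its weighted terms
  // against a TokenStream produced from the given text.
  public static Map<String, WeightedSpanTerm> extract(String field, String word, String text)
      throws IOException {
    Query query = new TermQuery(new Term(field, word));
    try (Analyzer analyzer = new StandardAnalyzer()) {
      TokenStream tokenStream = analyzer.tokenStream(field, text);
      WeightedSpanTermExtractor extractor = new WeightedSpanTermExtractor();
      // The boost argument becomes the starting weight of each extracted term.
      return extractor.getWeightedSpanTerms(query, 2.0f, tokenStream);
    }
  }

  public static void main(String[] args) throws IOException {
    // "body" and the sample sentence are illustrative values only.
    Map<String, WeightedSpanTerm> terms =
        extract("body", "lucene", "Apache Lucene is a search library");
    for (Map.Entry<String, WeightedSpanTerm> entry : terms.entrySet()) {
      System.out.println(entry.getKey() + " weight=" + entry.getValue().getWeight());
    }
  }
}
```

      The returned map is normally handed to a scorer rather than inspected directly; printing it is only a way to see what was extracted.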
    • getWeightedSpanTerms

      public Map<String,WeightedSpanTerm> getWeightedSpanTerms(Query query, float boost, TokenStream tokenStream, String fieldName) throws IOException
      Creates a Map of WeightedSpanTerms from the given Query and TokenStream.
      Parameters:
      query - Query that caused the hit
      tokenStream - TokenStream of the text to be highlighted
      fieldName - restricts the Terms used, based on field name
      Returns:
      Map containing WeightedSpanTerms
      Throws:
      IOException - If there is a low-level I/O error
    • getWeightedSpanTermsWithScores

      public Map<String,WeightedSpanTerm> getWeightedSpanTermsWithScores(Query query, float boost, TokenStream tokenStream, String fieldName, IndexReader reader) throws IOException
      Creates a Map of WeightedSpanTerms from the given Query and TokenStream. Uses a supplied IndexReader to properly weight terms (for gradient highlighting).
      Parameters:
      query - Query that caused the hit
      tokenStream - TokenStream of the text to be highlighted
      fieldName - restricts the Terms used, based on field name
      reader - IndexReader to use for scoring
      Returns:
      Map of WeightedSpanTerms with quasi tf/idf scores
      Throws:
      IOException - If there is a low-level I/O error
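
      The following sketch shows one way the scored variant might be exercised. It builds a small in-memory index so that the two query terms have different document frequencies, then extracts their weights. The class name, field name and documents are hypothetical, and it assumes lucene-core and lucene-highlighter (with its module dependencies) are on the classpath.

```java
import java.io.IOException;
import java.util.Map;
import org.apache.lucene.analysis.Analyzer;
import org.apache.lucene.analysis.TokenStream;
import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.document.Document;
import org.apache.lucene.document.Field;
import org.apache.lucene.document.TextField;
import org.apache.lucene.index.DirectoryReader;
import org.apache.lucene.index.IndexReader;
import org.apache.lucene.index.IndexWriter;
import org.apache.lucene.index.IndexWriterConfig;
import org.apache.lucene.index.Term;
import org.apache.lucene.search.BooleanClause;
import org.apache.lucene.search.BooleanQuery;
import org.apache.lucene.search.Query;
import org.apache.lucene.search.TermQuery;
import org.apache.lucene.search.highlight.WeightedSpanTerm;
import org.apache.lucene.search.highlight.WeightedSpanTermExtractor;
import org.apache.lucene.store.ByteBuffersDirectory;
import org.apache.lucene.store.Directory;

public class ScoredExtraction {

  // Indexes the given texts into `field`, then extracts scored terms for a
  // two-term OR query, using the first text as the content to highlight.
  public static Map<String, WeightedSpanTerm> extract(
      String field, String commonWord, String rareWord, String... texts) throws IOException {
    try (Directory dir = new ByteBuffersDirectory();
        Analyzer analyzer = new StandardAnalyzer()) {
      try (IndexWriter writer = new IndexWriter(dir, new IndexWriterConfig(analyzer))) {
        for (String text : texts) {
          Document doc = new Document();
          doc.add(new TextField(field, text, Field.Store.NO));
          writer.addDocument(doc);
        }
      }
      Query query =
          new BooleanQuery.Builder()
              .add(new TermQuery(new Term(field, commonWord)), BooleanClause.Occur.SHOULD)
              .add(new TermQuery(new Term(field, rareWord)), BooleanClause.Occur.SHOULD)
              .build();
      try (IndexReader reader = DirectoryReader.open(dir)) {
        TokenStream tokenStream = analyzer.tokenStream(field, texts[0]);
        return new WeightedSpanTermExtractor()
            .getWeightedSpanTermsWithScores(query, 1.0f, tokenStream, field, reader);
      }
    }
  }

  public static void main(String[] args) throws IOException {
    // Illustrative corpus: "lucene" occurs in every document, "highlight" in one.
    Map<String, WeightedSpanTerm> terms =
        extract("body", "lucene", "highlight",
            "lucene highlight example", "lucene search", "lucene index");
    terms.forEach((text, term) -> System.out.println(text + " weight=" + term.getWeight()));
  }
}
```

      With a corpus like this, the term that occurs in fewer documents would be expected to receive the larger weight, which is what gradient highlighting relies on.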
    • collectSpanQueryFields

      protected void collectSpanQueryFields(SpanQuery spanQuery, Set<String> fieldNames)
    • mustRewriteQuery

      protected boolean mustRewriteQuery(SpanQuery spanQuery)
    • getExpandMultiTermQuery

      public boolean getExpandMultiTermQuery()
    • setExpandMultiTermQuery

      public void setExpandMultiTermQuery(boolean expandMultiTermQuery)
    • isUsePayloads

      public boolean isUsePayloads()
    • setUsePayloads

      public void setUsePayloads(boolean usePayloads)
    • isCachedTokenStream

      public boolean isCachedTokenStream()
    • getTokenStream

      public TokenStream getTokenStream()
      Returns the tokenStream, which may have been wrapped in a CachingTokenFilter. The getWeightedSpanTerms* methods set the tokenStream, so do not call this method before calling one of them.
    • setWrapIfNotCachingTokenFilter

      public void setWrapIfNotCachingTokenFilter(boolean wrap)
      By default, TokenStreams that are not of type CachingTokenFilter are wrapped in a CachingTokenFilter to ensure an efficient reset. If you are already using a different caching TokenStream implementation and do not want it to be wrapped, set this to false. This setting is ignored when a term vector based TokenStream is supplied, since such a stream can be reset efficiently.
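
      A minimal sketch of how this flag interacts with getTokenStream() follows. It runs a PhraseQuery through the extractor and then checks whether the stream it holds is a CachingTokenFilter. The class name, field name and sample text are hypothetical, and it assumes lucene-core and lucene-highlighter (with its module dependencies) are on the classpath.

```java
import java.io.IOException;
import org.apache.lucene.analysis.Analyzer;
import org.apache.lucene.analysis.CachingTokenFilter;
import org.apache.lucene.analysis.TokenStream;
import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.search.PhraseQuery;
import org.apache.lucene.search.Query;
import org.apache.lucene.search.highlight.WeightedSpanTermExtractor;

public class CachingBehavior {

  // Runs an extraction with the given wrap setting and reports whether the
  // extractor ended up holding a CachingTokenFilter.
  public static boolean wrapsInCachingFilter(boolean wrap) throws IOException {
    String field = "body"; // illustrative field name
    Query query = new PhraseQuery(field, "apache", "lucene");
    try (Analyzer analyzer = new StandardAnalyzer()) {
      TokenStream original = analyzer.tokenStream(field, "Apache Lucene is a search library");
      WeightedSpanTermExtractor extractor = new WeightedSpanTermExtractor();
      extractor.setWrapIfNotCachingTokenFilter(wrap);
      // getTokenStream() is only meaningful after a getWeightedSpanTerms* call.
      extractor.getWeightedSpanTerms(query, 1.0f, original);
      return extractor.getTokenStream() instanceof CachingTokenFilter;
    }
  }

  public static void main(String[] args) throws IOException {
    System.out.println("wrap=true  -> cached: " + wrapsInCachingFilter(true));
    System.out.println("wrap=false -> cached: " + wrapsInCachingFilter(false));
  }
}
```

      A caller that later scores tokens should read the stream back through getTokenStream() rather than reuse the original reference, since the original may already have been consumed during extraction.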
    • setMaxDocCharsToAnalyze

      protected final void setMaxDocCharsToAnalyze(int maxDocCharsToAnalyze)
      Sets a threshold on the number of characters to analyze. This setting does not apply when a TokenStream based on term vectors with offsets and positions is supplied.