Class WeightedSpanTermExtractor


  • public class WeightedSpanTermExtractor
    extends Object
    Extracts WeightedSpanTerms from a Query based on whether Terms from the Query are contained in a supplied TokenStream. To support query types that are unsupported by default, subclasses can override extract(Query, float, Map) to unwrap wrapper or delegating queries, and extractUnknownQuery(Query, Map) to process custom leaf queries:
     
        WeightedSpanTermExtractor extractor = new WeightedSpanTermExtractor() {
            @Override
            protected void extract(Query query, float boost, Map<String, WeightedSpanTerm> terms) throws IOException {
              if (query instanceof QueryWrapper) {
                // Unwrap the delegate and extract from the wrapped query.
                extract(((QueryWrapper) query).getQuery(), boost, terms);
              } else {
                super.extract(query, boost, terms);
              }
            }
    
            @Override
            protected void extractUnknownQuery(Query query, Map<String, WeightedSpanTerm> terms) throws IOException {
              if (query instanceof CustomTermQuery) {
                // Register the term of the custom leaf query with a weight of 1.
                Term term = ((CustomTermQuery) query).getTerm();
                terms.put(term.field(), new WeightedSpanTerm(1, term.text()));
              }
            }
        };
     
     
    • Constructor Detail

      • WeightedSpanTermExtractor

        public WeightedSpanTermExtractor()
      • WeightedSpanTermExtractor

        public WeightedSpanTermExtractor​(String defaultField)
    • Method Detail

      • extract

        protected void extract​(Query query,
                               float boost,
                               Map<String,​WeightedSpanTerm> terms)
                        throws IOException
        Fills a Map with WeightedSpanTerms using the terms from the supplied Query.
        Parameters:
        query - Query to extract Terms from
        terms - Map to place created WeightedSpanTerms in
        Throws:
        IOException - If there is a low-level I/O error
      • isQueryUnsupported

        protected boolean isQueryUnsupported​(Class<? extends Query> clazz)
      • extractWeightedSpanTerms

        protected void extractWeightedSpanTerms​(Map<String,​WeightedSpanTerm> terms,
                                                SpanQuery spanQuery,
                                                float boost)
                                         throws IOException
        Fills a Map with WeightedSpanTerms using the terms from the supplied SpanQuery.
        Parameters:
        terms - Map to place created WeightedSpanTerms in
        spanQuery - SpanQuery to extract Terms from
        Throws:
        IOException - If there is a low-level I/O error
      • extractWeightedTerms

        protected void extractWeightedTerms​(Map<String,​WeightedSpanTerm> terms,
                                            Query query,
                                            float boost)
                                     throws IOException
        Fills a Map with WeightedSpanTerms using the terms from the supplied Query.
        Parameters:
        terms - Map to place created WeightedSpanTerms in
        query - Query to extract Terms from
        Throws:
        IOException - If there is a low-level I/O error
      • fieldNameComparator

        protected boolean fieldNameComparator​(String fieldNameToCheck)
        Checks a field name during extraction. Necessary to implement matches for queries against defaultField.
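
        The field check described above can be sketched in plain Java. This is a minimal illustration, not the library's source: the class FieldNameMatchSketch and its matches method are hypothetical, and the rule itself is an assumption (a field is accepted when no field restriction is set, when it equals the requested field, or when it equals the default field).

```java
public class FieldNameMatchSketch {

    /**
     * Hypothetical stand-in for the extractor's field check.
     * Assumed rule: accept when no field restriction is set (fieldName == null),
     * when the candidate equals the requested field, or when it equals the default field.
     */
    static boolean matches(String fieldName, String defaultField, String fieldNameToCheck) {
        return fieldName == null
            || fieldName.equals(fieldNameToCheck)
            || (defaultField != null && defaultField.equals(fieldNameToCheck));
    }

    public static void main(String[] args) {
        // No field restriction: every field is accepted.
        System.out.println(matches(null, null, "title"));
        // Restricted to "body": the same field is accepted.
        System.out.println(matches("body", null, "body"));
        // Restricted to "body", default field "all": the default field is accepted too.
        System.out.println(matches("body", "all", "all"));
        // Restricted to "body", default field "all": any other field is rejected.
        System.out.println(matches("body", "all", "title"));
    }
}
```

        Under this assumed rule, a query against the default field still contributes terms when extraction is restricted to another field, which matches the stated purpose of the method.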
      • getWeightedSpanTerms

        public Map<String,​WeightedSpanTerm> getWeightedSpanTerms​(Query query,
                                                                       float boost,
                                                                       TokenStream tokenStream)
                                                                throws IOException
        Creates a Map of WeightedSpanTerms from the given Query and TokenStream.

        Parameters:
        query - Query that caused the hit
        tokenStream - TokenStream of the text to be highlighted
        Returns:
        Map containing WeightedSpanTerms
        Throws:
        IOException - If there is a low-level I/O error
      • getWeightedSpanTerms

        public Map<String,​WeightedSpanTerm> getWeightedSpanTerms​(Query query,
                                                                       float boost,
                                                                       TokenStream tokenStream,
                                                                       String fieldName)
                                                                throws IOException
        Creates a Map of WeightedSpanTerms from the given Query and TokenStream.

        Parameters:
        query - Query that caused the hit
        tokenStream - TokenStream of the text to be highlighted
        fieldName - restricts the Terms used, based on field name
        Returns:
        Map containing WeightedSpanTerms
        Throws:
        IOException - If there is a low-level I/O error
      • getWeightedSpanTermsWithScores

        public Map<String,​WeightedSpanTerm> getWeightedSpanTermsWithScores​(Query query,
                                                                                 float boost,
                                                                                 TokenStream tokenStream,
                                                                                 String fieldName,
                                                                                 IndexReader reader)
                                                                          throws IOException
        Creates a Map of WeightedSpanTerms from the given Query and TokenStream. Uses a supplied IndexReader to properly weight terms (for gradient highlighting).

        Parameters:
        query - Query that caused the hit
        tokenStream - TokenStream of the text to be highlighted
        fieldName - restricts the Terms used, based on field name
        reader - IndexReader to use for scoring
        Returns:
        Map of WeightedSpanTerms with quasi tf/idf scores
        Throws:
        IOException - If there is a low-level I/O error
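
        To make "quasi tf/idf scores" more concrete, the following sketch shows one way a term weight could be scaled by an inverse document frequency. It is an illustration under stated assumptions, not the library's implementation: the class IdfWeightSketch, its method names, and the formula ln(maxDoc / (docFreq + 1)) + 1 are all assumptions chosen for the example.

```java
public class IdfWeightSketch {

    /**
     * Assumed idf formula for illustration: ln(maxDoc / (docFreq + 1)) + 1.
     * maxDoc is the number of documents in the index; docFreq is the number
     * of documents that contain the term.
     */
    static float idf(int maxDoc, int docFreq) {
        return (float) (Math.log(maxDoc / (double) (docFreq + 1)) + 1.0);
    }

    /** Scales an existing term weight by the assumed idf factor. */
    static float scaledWeight(float weight, int maxDoc, int docFreq) {
        return weight * idf(maxDoc, docFreq);
    }

    public static void main(String[] args) {
        int maxDoc = 1000;
        // A rare term (in 9 documents) receives a larger weight than a common one (in 499).
        System.out.println("rare:   " + scaledWeight(1f, maxDoc, 9));
        System.out.println("common: " + scaledWeight(1f, maxDoc, 499));
    }
}
```

        With weights of this kind, rarer terms score higher than common ones, which is what gradient highlighting needs in order to emphasize the more distinctive matches.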
      • collectSpanQueryFields

        protected void collectSpanQueryFields​(SpanQuery spanQuery,
                                              Set<String> fieldNames)
      • mustRewriteQuery

        protected boolean mustRewriteQuery​(SpanQuery spanQuery)
      • getExpandMultiTermQuery

        public boolean getExpandMultiTermQuery()
      • setExpandMultiTermQuery

        public void setExpandMultiTermQuery​(boolean expandMultiTermQuery)
      • isUsePayloads

        public boolean isUsePayloads()
      • setUsePayloads

        public void setUsePayloads​(boolean usePayloads)
      • isCachedTokenStream

        public boolean isCachedTokenStream()
      • getTokenStream

        public TokenStream getTokenStream()
        Returns the tokenStream, which may have been wrapped in a CachingTokenFilter. The getWeightedSpanTerms* methods set the tokenStream, so do not call this method before calling one of them.
      • setWrapIfNotCachingTokenFilter

        public void setWrapIfNotCachingTokenFilter​(boolean wrap)
        By default, TokenStreams that are not of the type CachingTokenFilter are wrapped in a CachingTokenFilter to ensure an efficient reset. If you are already using a different caching TokenStream implementation and do not want it to be wrapped, set this to false. This setting is ignored when a term vector based TokenStream is supplied, since it can be reset efficiently.
      • setMaxDocCharsToAnalyze

        protected final void setMaxDocCharsToAnalyze​(int maxDocCharsToAnalyze)
        Sets a threshold on the number of characters to analyze. When a TokenStream based on term vectors with offsets and positions is supplied, this setting does not apply.