WeightedSpanTermExtractor (Lucene 8.1.1 API)

java.lang.Object
- org.apache.lucene.search.highlight.WeightedSpanTermExtractor

public class WeightedSpanTermExtractor
extends Object

Class used to extract WeightedSpanTerms from a Query based on whether Terms from the Query are contained in a supplied TokenStream. To support additional queries that are unsupported by default, subclasses can override extract(Query, float, Map) to unwrap wrapper or delegate queries, and extractUnknownQuery(Query, Map) to process custom leaf queries:

 
    WeightedSpanTermExtractor extractor = new WeightedSpanTermExtractor() {
        protected void extract(Query query, float boost, Map<String, WeightedSpanTerm> terms) throws IOException {
          if (query instanceof QueryWrapper) {
            extract(((QueryWrapper)query).getQuery(), boost, terms);
          } else {
            super.extract(query, boost, terms);
          }
        }

        protected void extractUnknownQuery(Query query, Map<String, WeightedSpanTerm> terms) throws IOException {
          if (query instanceof CustomTermQuery) {
            Term term = ((CustomTermQuery) query).getTerm();
            terms.put(term.text(), new WeightedSpanTerm(1, term.text()));
          }
        }
    };

Nested Class Summary

Nested Classes
Modifier and Type	Class and Description
`protected static class`	`WeightedSpanTermExtractor.PositionCheckingMap<K>` This class makes sure that if both position sensitive and insensitive versions of the same term are added, the position insensitive one wins.
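
The "position insensitive one wins" rule can be illustrated without Lucene on the classpath. The following is a minimal sketch, not the library's implementation: `SketchTerm` and `InsensitiveWinsMap` are hypothetical stand-ins for `WeightedSpanTerm` and `PositionCheckingMap`, and the sketch assumes the rule is enforced when an entry for an existing key is replaced.

```java
import java.util.HashMap;
import java.util.Map;

public class PositionCheckingSketch {

    /** Hypothetical stand-in for WeightedSpanTerm: only the fields this sketch needs. */
    static final class SketchTerm {
        final String text;
        boolean positionSensitive;

        SketchTerm(String text, boolean positionSensitive) {
            this.text = text;
            this.positionSensitive = positionSensitive;
        }
    }

    /** Hypothetical map in which a position-insensitive entry is never displaced by a sensitive one. */
    static final class InsensitiveWinsMap<K> extends HashMap<K, SketchTerm> {
        @Override
        public SketchTerm put(K key, SketchTerm value) {
            SketchTerm previous = super.put(key, value);
            // If the earlier entry was position insensitive, the new entry inherits that.
            if (previous != null && !previous.positionSensitive) {
                value.positionSensitive = false;
            }
            return previous;
        }

        @Override
        public void putAll(Map<? extends K, ? extends SketchTerm> other) {
            // Route every entry through put() so the rule above is applied.
            for (Map.Entry<? extends K, ? extends SketchTerm> e : other.entrySet()) {
                put(e.getKey(), e.getValue());
            }
        }
    }

    /** Adds two versions of the same term and reports the flag of the surviving entry. */
    public static boolean sensitiveAfterAdding(boolean first, boolean second) {
        InsensitiveWinsMap<String> terms = new InsensitiveWinsMap<>();
        terms.put("lucene", new SketchTerm("lucene", first));
        terms.put("lucene", new SketchTerm("lucene", second));
        return terms.get("lucene").positionSensitive;
    }

    public static void main(String[] args) {
        System.out.println(sensitiveAfterAdding(false, true)); // insensitive added first
        System.out.println(sensitiveAfterAdding(true, false)); // insensitive added second
        System.out.println(sensitiveAfterAdding(true, true));  // both sensitive
    }
}
```

In this sketch the surviving entry is position sensitive only when both versions were position sensitive, regardless of insertion order.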

Constructor Summary

Constructors
Constructor and Description
`WeightedSpanTermExtractor()`
`WeightedSpanTermExtractor(String defaultField)`

Method Summary

Modifier and Type	Method and Description
`protected void`	`collectSpanQueryFields(SpanQuery spanQuery, Set<String> fieldNames)`
`protected void`	`extract(Query query, float boost, Map<String,WeightedSpanTerm> terms)` Fills a `Map` with `WeightedSpanTerm`s using the terms from the supplied `Query`.
`protected void`	`extractUnknownQuery(Query query, Map<String,WeightedSpanTerm> terms)`
`protected void`	`extractWeightedSpanTerms(Map<String,WeightedSpanTerm> terms, SpanQuery spanQuery, float boost)` Fills a `Map` with `WeightedSpanTerm`s using the terms from the supplied `SpanQuery`.
`protected void`	`extractWeightedTerms(Map<String,WeightedSpanTerm> terms, Query query, float boost)` Fills a `Map` with `WeightedSpanTerm`s using the terms from the supplied `Query`.
`protected boolean`	`fieldNameComparator(String fieldNameToCheck)` Necessary to implement matches for queries against `defaultField`
`boolean`	`getExpandMultiTermQuery()`
`protected LeafReaderContext`	`getLeafContext()`
`TokenStream`	`getTokenStream()` Returns the tokenStream which may have been wrapped in a CachingTokenFilter.
`Map<String,WeightedSpanTerm>`	`getWeightedSpanTerms(Query query, float boost, TokenStream tokenStream)` Creates a Map of `WeightedSpanTerms` from the given `Query` and `TokenStream`.
`Map<String,WeightedSpanTerm>`	`getWeightedSpanTerms(Query query, float boost, TokenStream tokenStream, String fieldName)` Creates a Map of `WeightedSpanTerms` from the given `Query` and `TokenStream`.
`Map<String,WeightedSpanTerm>`	`getWeightedSpanTermsWithScores(Query query, float boost, TokenStream tokenStream, String fieldName, IndexReader reader)` Creates a Map of `WeightedSpanTerms` from the given `Query` and `TokenStream`.
`boolean`	`isCachedTokenStream()`
`protected boolean`	`isQueryUnsupported(Class<? extends Query> clazz)`
`boolean`	`isUsePayloads()`
`protected boolean`	`mustRewriteQuery(SpanQuery spanQuery)`
`void`	`setExpandMultiTermQuery(boolean expandMultiTermQuery)`
`protected void`	`setMaxDocCharsToAnalyze(int maxDocCharsToAnalyze)` A threshold of number of characters to analyze.
`void`	`setUsePayloads(boolean usePayloads)`
`void`	`setWrapIfNotCachingTokenFilter(boolean wrap)` By default, `TokenStream`s that are not of the type `CachingTokenFilter` are wrapped in a `CachingTokenFilter` to ensure an efficient reset - if you are already using a different caching `TokenStream` impl and you don't want it to be wrapped, set this to false.

Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait

Constructor Detail

WeightedSpanTermExtractor
```
public WeightedSpanTermExtractor()
```

WeightedSpanTermExtractor

public WeightedSpanTermExtractor(String defaultField)

Method Detail

extract
```
protected void extract(Query query,
                       float boost,
                       Map<String,WeightedSpanTerm> terms)
                throws IOException
```
Fills a Map with WeightedSpanTerms using the terms from the supplied Query.

Parameters:

query - Query to extract Terms from

terms - Map to place created WeightedSpanTerms in

Throws:

IOException - If there is a low-level I/O error

isQueryUnsupported

protected boolean isQueryUnsupported(Class<? extends Query> clazz)

extractUnknownQuery

protected void extractUnknownQuery(Query query,
                                   Map<String,WeightedSpanTerm> terms)
                            throws IOException

Throws: IOException

extractWeightedSpanTerms

protected void extractWeightedSpanTerms(Map<String,WeightedSpanTerm> terms,
                                        SpanQuery spanQuery,
                                        float boost)
                                 throws IOException

Fills a Map with WeightedSpanTerms using the terms from the supplied SpanQuery.

Parameters: terms - Map to place created WeightedSpanTerms in; spanQuery - SpanQuery to extract Terms from
Throws: IOException - If there is a low-level I/O error

extractWeightedTerms

protected void extractWeightedTerms(Map<String,WeightedSpanTerm> terms,
                                    Query query,
                                    float boost)
                             throws IOException

Fills a Map with WeightedSpanTerms using the terms from the supplied Query.

Parameters: terms - Map to place created WeightedSpanTerms in; query - Query to extract Terms from
Throws: IOException - If there is a low-level I/O error

fieldNameComparator
```
protected boolean fieldNameComparator(String fieldNameToCheck)
```
Necessary to implement matches for queries against defaultField
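
The page does not spell out the matching rule, so the following is only a sketch under stated assumptions: the extractor is assumed to hold the `fieldName` passed to `getWeightedSpanTerms` (where `null` means "no restriction") and the `defaultField` given to the constructor, and a field is assumed to match if it equals either one. `FieldNameMatchSketch` and `matches` are hypothetical names used for illustration.

```java
public class FieldNameMatchSketch {
    // Field that highlighting is restricted to; null is assumed to mean "any field".
    private final String fieldName;
    // Field that unqualified query terms are assumed to target; may be null.
    private final String defaultField;

    public FieldNameMatchSketch(String fieldName, String defaultField) {
        this.fieldName = fieldName;
        this.defaultField = defaultField;
    }

    /** Assumed rule: accept when unrestricted, or when the field equals fieldName or defaultField. */
    public boolean matches(String fieldNameToCheck) {
        return fieldName == null
            || fieldName.equals(fieldNameToCheck)
            || (defaultField != null && defaultField.equals(fieldNameToCheck));
    }

    public static void main(String[] args) {
        FieldNameMatchSketch restricted = new FieldNameMatchSketch("title", "body");
        System.out.println(restricted.matches("title"));   // the restricted field
        System.out.println(restricted.matches("body"));    // the default field
        System.out.println(restricted.matches("author"));  // neither
    }
}
```

Under these assumptions, a query term on `defaultField` still contributes to highlighting even when a different `fieldName` is requested, which is the case the description refers to.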

getLeafContext

protected LeafReaderContext getLeafContext()
                                    throws IOException

Throws: IOException

getWeightedSpanTerms

public Map<String,WeightedSpanTerm> getWeightedSpanTerms(Query query,
                                                         float boost,
                                                         TokenStream tokenStream)
                                                  throws IOException

Creates a Map of WeightedSpanTerms from the given Query and TokenStream.

Parameters: query - Query that caused the hit; tokenStream - TokenStream of the text to be highlighted
Returns: Map containing WeightedSpanTerms
Throws: IOException - If there is a low-level I/O error

getWeightedSpanTerms

public Map<String,WeightedSpanTerm> getWeightedSpanTerms(Query query,
                                                         float boost,
                                                         TokenStream tokenStream,
                                                         String fieldName)
                                                  throws IOException

Creates a Map of WeightedSpanTerms from the given Query and TokenStream.

Parameters: query - Query that caused the hit; tokenStream - TokenStream of the text to be highlighted; fieldName - restricts the Terms used based on field name
Returns: Map containing WeightedSpanTerms
Throws: IOException - If there is a low-level I/O error

getWeightedSpanTermsWithScores

public Map<String,WeightedSpanTerm> getWeightedSpanTermsWithScores(Query query,
                                                                   float boost,
                                                                   TokenStream tokenStream,
                                                                   String fieldName,
                                                                   IndexReader reader)
                                                            throws IOException

Creates a Map of WeightedSpanTerms from the given Query and TokenStream. Uses a supplied IndexReader to properly weight terms (for gradient highlighting).

Parameters: query - Query that caused the hit; tokenStream - TokenStream of the text to be highlighted; fieldName - restricts the Terms used based on field name; reader - IndexReader to use for scoring
Returns: Map of WeightedSpanTerms with quasi tf/idf scores
Throws: IOException - If there is a low-level I/O error
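
The "quasi tf/idf" weighting is not defined on this page. As a rough illustration only, the sketch below scales each term's weight by a classic inverse document frequency, `log(numDocs / (docFreq + 1)) + 1`. The formula choice is an assumption, and `IdfWeightSketch`, `idf`, and `scale` are hypothetical names; the `docFreqs` map stands in for lookups that a real `IndexReader` would provide.

```java
import java.util.LinkedHashMap;
import java.util.Map;

public class IdfWeightSketch {

    /** Assumed classic-style IDF: log(numDocs / (docFreq + 1)) + 1. */
    public static float idf(int totalNumDocs, int docFreq) {
        return (float) (Math.log(totalNumDocs / (double) (docFreq + 1)) + 1.0);
    }

    /**
     * Returns a copy of the weights with each one multiplied by the IDF of its term.
     * docFreqs is a hypothetical stand-in for per-term document frequencies from an index;
     * terms missing from it are treated as having a document frequency of 0.
     */
    public static Map<String, Float> scale(Map<String, Float> weights,
                                           Map<String, Integer> docFreqs,
                                           int totalNumDocs) {
        Map<String, Float> scaled = new LinkedHashMap<>();
        for (Map.Entry<String, Float> e : weights.entrySet()) {
            int docFreq = docFreqs.getOrDefault(e.getKey(), 0);
            scaled.put(e.getKey(), e.getValue() * idf(totalNumDocs, docFreq));
        }
        return scaled;
    }

    public static void main(String[] args) {
        Map<String, Float> weights = new LinkedHashMap<>();
        weights.put("the", 1.0f);
        weights.put("lucene", 1.0f);

        Map<String, Integer> docFreqs = new LinkedHashMap<>();
        docFreqs.put("the", 99);    // appears in nearly every document
        docFreqs.put("lucene", 9);  // appears in few documents

        // The rarer term ends up with the larger weight.
        System.out.println(scale(weights, docFreqs, 100));
    }
}
```

Weights produced this way let a gradient highlighter emphasize rare terms more strongly than common ones.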

collectSpanQueryFields

protected void collectSpanQueryFields(SpanQuery spanQuery,
                                      Set<String> fieldNames)

mustRewriteQuery

protected boolean mustRewriteQuery(SpanQuery spanQuery)

getExpandMultiTermQuery

public boolean getExpandMultiTermQuery()

setExpandMultiTermQuery

public void setExpandMultiTermQuery(boolean expandMultiTermQuery)

isUsePayloads
```
public boolean isUsePayloads()
```

setUsePayloads

public void setUsePayloads(boolean usePayloads)

isCachedTokenStream
```
public boolean isCachedTokenStream()
```

getTokenStream
```
public TokenStream getTokenStream()
```
Returns the tokenStream, which may have been wrapped in a CachingTokenFilter. The getWeightedSpanTerms* methods set the tokenStream, so do not call this method before calling one of them.

setWrapIfNotCachingTokenFilter
```
public void setWrapIfNotCachingTokenFilter(boolean wrap)
```
By default, TokenStreams that are not of the type CachingTokenFilter are wrapped in a CachingTokenFilter to ensure an efficient reset - if you are already using a different caching TokenStream impl and you don't want it to be wrapped, set this to false. This setting is ignored when a term vector based TokenStream is supplied, since it can be reset efficiently.

setMaxDocCharsToAnalyze
```
protected final void setMaxDocCharsToAnalyze(int maxDocCharsToAnalyze)
```
A threshold on the number of characters to analyze. When a TokenStream based on term vectors with offsets and positions is supplied, this setting does not apply.
