Class MultipassTermFilteredPresearcher

  • public class MultipassTermFilteredPresearcher
    extends TermFilteredPresearcher
    A TermFilteredPresearcher that indexes queries multiple times, with terms collected from different routes through a query tree. Each route produces a set of terms that is *sufficient* to select the query, and each set is indexed into a separate, suffixed field.

    Incoming documents are then converted to a set of disjunction queries, one over each suffixed field, and these queries are combined into a conjunction query, so that the document's set of terms must match a term from every route.

    This makes it possible to filter out documents that contain only one half of a two-term phrase query, for example. The query "hello world" will be indexed twice, once under 'hello' and once under 'world'. A document containing the terms "hello there" would match the first field, but not the second, and so would not be selected for matching.
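
    The "hello world" example can be sketched as a toy model in plain Java. This is an illustrative simplification and not the presearcher's actual implementation: the names MultipassSketch, IndexedQuery, indexPhrase, isSelected and tokenize are all hypothetical, and each pass field is modelled as a simple set of terms rather than an index field.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

/** Toy model of multipass term filtering (hypothetical, not the library's code). */
public class MultipassSketch {

    /** One indexed query: passTerms.get(i) models the terms stored in the field for pass i. */
    static final class IndexedQuery {
        final String id;
        final List<Set<String>> passTerms;

        IndexedQuery(String id, List<Set<String>> passTerms) {
            this.id = id;
            this.passTerms = passTerms;
        }
    }

    /** Index a two-term phrase: pass 0 stores the first term, pass 1 the second. */
    static IndexedQuery indexPhrase(String id, String first, String second) {
        List<Set<String>> passes = new ArrayList<>();
        passes.add(new HashSet<>(Arrays.asList(first)));
        passes.add(new HashSet<>(Arrays.asList(second)));
        return new IndexedQuery(id, passes);
    }

    /** A query is selected only if the document matches a term in every pass field. */
    static boolean isSelected(IndexedQuery query, Set<String> documentTerms) {
        for (Set<String> terms : query.passTerms) {
            boolean anyMatch = false;            // disjunction within one pass field
            for (String term : terms) {
                if (documentTerms.contains(term)) {
                    anyMatch = true;
                    break;
                }
            }
            if (!anyMatch) {
                return false;                    // conjunction across pass fields
            }
        }
        return true;
    }

    /** Naive whitespace tokenizer, standing in for real document analysis. */
    static Set<String> tokenize(String text) {
        return new HashSet<>(Arrays.asList(text.toLowerCase().split("\\s+")));
    }

    public static void main(String[] args) {
        IndexedQuery q = indexPhrase("q1", "hello", "world");
        System.out.println(isSelected(q, tokenize("hello there")));  // prints false
        System.out.println(isSelected(q, tokenize("hello world")));  // prints true
    }
}
```

    In this model a document containing "world hello" would still be selected, because selection only checks for the presence of terms; the phrase query itself would then reject it during matching.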

    The number of passes the presearcher makes is configurable. More passes bring the number of selected queries closer to the number that actually match, but take longer to index and use more RAM.

    A minimum weight can be set for terms to be chosen for the second and subsequent passes. This allows users to avoid indexing stopwords, for example.
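
    The effect of a minimum weight can be illustrated with another toy model. The sketch below assumes a conjunction whose terms carry numeric weights, picks the heaviest term for the first pass, and lets later passes move on to the next-heaviest term only if it reaches the minimum weight. The class MinWeightSketch, the method chooseTerms, the example weights, and the fallback of reusing the previous pass's term are all assumptions made for illustration, not the presearcher's actual algorithm.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Toy model of minimum-weight term selection across passes (hypothetical). */
public class MinWeightSketch {

    /**
     * Choose one term per pass from a conjunction's weighted terms.
     * Pass 0 takes the heaviest term. Each later pass takes the next-heaviest
     * unused term only if its weight is at least minWeight; otherwise it
     * repeats the previous pass's term (an assumed fallback).
     */
    static List<String> chooseTerms(Map<String, Float> weights, int passes, float minWeight) {
        List<String> chosen = new ArrayList<>();
        if (weights.isEmpty()) {
            return chosen;                       // nothing to index
        }
        List<Map.Entry<String, Float>> sorted = new ArrayList<>(weights.entrySet());
        sorted.sort((a, b) -> Float.compare(b.getValue(), a.getValue())); // heaviest first
        int next = 0;
        for (int pass = 0; pass < passes; pass++) {
            boolean canAdvance = next < sorted.size()
                    && (pass == 0 || sorted.get(next).getValue() >= minWeight);
            if (canAdvance) {
                chosen.add(sorted.get(next).getKey());
                next++;
            } else {
                chosen.add(chosen.get(chosen.size() - 1)); // reuse the previous term
            }
        }
        return chosen;
    }

    public static void main(String[] args) {
        Map<String, Float> weights = new HashMap<>();
        weights.put("the", 0.1f);    // stopword-like term with a low weight
        weights.put("quick", 0.8f);
        weights.put("fox", 0.9f);

        System.out.println(chooseTerms(weights, 3, 0.5f)); // prints [fox, quick, quick]
        System.out.println(chooseTerms(weights, 3, 0.0f)); // prints [fox, quick, the]
    }
}
```

    With a minimum weight of 0.5 the low-weight term "the" is never chosen for a later pass, which is the stopword-avoidance effect described above.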