Class IntersectBlockReader

All Implemented Interfaces:
Accountable, BytesRefIterator
Direct Known Subclasses:
STIntersectBlockReader

public class IntersectBlockReader extends BlockReader
The "intersect" TermsEnum response to UniformSplitTerms.intersect(CompiledAutomaton, BytesRef), intersecting the terms with an automaton.

By design of the UniformSplit block keys, it is less efficient than org.apache.lucene.backward_codecs.lucene40.blocktree.IntersectTermsEnum for FuzzyQuery (-37%). It is slightly slower for WildcardQuery (-5%) and slightly faster for PrefixQuery (+5%).

WARNING: This API is experimental and might change in incompatible ways in the next release.
  • Field Details

    • NUM_CONSECUTIVELY_REJECTED_TERMS_THRESHOLD

      protected final int NUM_CONSECUTIVELY_REJECTED_TERMS_THRESHOLD
      Threshold that controls when to attempt to jump to a block away.

      This counter is 0 when entering a block. It is incremented each time a term is rejected by the automaton. When the counter is greater than or equal to this threshold, then we compute the next term accepted by the automaton, with IntersectBlockReader.AutomatonNextTermCalculator, and we jump to a block away if the next term accepted is greater than the immediate next term in the block.

      A low value, for example 1, improves the performance of automatons that require many jumps, for example FuzzyQuery and most WildcardQuery instances. A higher value improves the performance of automatons with few or no jumps, for example PrefixQuery. A threshold of 4 seems to be a good balance.
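
The counter logic described above can be sketched in isolation. This is a simplified model, not the Lucene implementation: the class and method names are hypothetical, block terms are plain strings, and a `Predicate` stands in for the automaton. It assumes the counter resets to 0 when a term is accepted, as the word "consecutively" suggests, and it only reports where a jump attempt would be triggered, without computing the next accepted term.

```java
import java.util.List;
import java.util.function.Predicate;

// Hypothetical, simplified model of the rejection counter; not Lucene code.
final class RejectionThresholdSketch {

    /**
     * Scans the terms of one block in order and returns the index of the term
     * at which the number of consecutive rejections first reaches the
     * threshold, or -1 if the threshold is never reached in this block.
     */
    static int firstJumpAttemptIndex(
            List<String> blockTerms, Predicate<String> accepts, int threshold) {
        int numConsecutivelyRejectedTerms = 0; // the counter is 0 when entering a block
        for (int i = 0; i < blockTerms.size(); i++) {
            if (accepts.test(blockTerms.get(i))) {
                // Assumption: an accepted term resets the counter.
                numConsecutivelyRejectedTerms = 0;
            } else {
                numConsecutivelyRejectedTerms++;
                if (numConsecutivelyRejectedTerms >= threshold) {
                    // This is where the next accepted term would be computed
                    // and compared with the immediate next term in the block.
                    return i;
                }
            }
        }
        return -1;
    }
}
```

With the terms aa, ab, ba, bb, bc, bd, be and a predicate accepting terms that start with "a", a threshold of 4 triggers the attempt at index 5 (bd), while a threshold of 1 triggers it at index 2 (ba).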

    • automaton

      protected final Automaton automaton
    • runAutomaton

      protected final ByteRunAutomaton runAutomaton
    • finite

      protected final boolean finite
    • commonSuffix

      protected final BytesRef commonSuffix
    • minTermLength

      protected final int minTermLength
    • nextStringCalculator

      protected final IntersectBlockReader.AutomatonNextTermCalculator nextStringCalculator
    • seekTerm

      protected BytesRef seekTerm
      Set when the current mode is seeking to this term; reset to null afterwards.
    • numMatchedBytes

      protected int numMatchedBytes
      Number of bytes accepted by the automaton when validating the current term.
    • states

      protected int[] states
      Automaton states reached when validating the current term, from 0 to numMatchedBytes - 1.
    • blockIteration

      protected IntersectBlockReader.BlockIteration blockIteration
      Block iteration order determined when scanning the terms in the current block.
    • numConsecutivelyRejectedTerms

      protected int numConsecutivelyRejectedTerms
      Counter of the number of consecutively rejected terms. Depending on NUM_CONSECUTIVELY_REJECTED_TERMS_THRESHOLD, this may trigger a jump to a block away.
  • Constructor Details

  • Method Details

    • getMinTermLength

      protected int getMinTermLength()
      Computes the minimal length of the terms accepted by the automaton. This speeds up the term scanning for automatons accepting a finite language.
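
As an illustration of what a minimal accepted length means, the sketch below runs a breadth-first search over a tiny automaton and returns the number of transitions on the shortest path to an accept state. The representation (an `int[][]` of target states and a `boolean[]` of accept flags) and all names are assumptions made for this example; Lucene's `Automaton` API is different, and the actual method may compute the value another way.

```java
import java.util.ArrayDeque;
import java.util.Arrays;
import java.util.Deque;

// Hypothetical sketch; the automaton representation is an assumption.
final class MinTermLengthSketch {

    /**
     * Returns the length of the shortest input accepted by the automaton,
     * or -1 if no accept state is reachable from the initial state.
     *
     * @param targets targets[s] lists the states reachable from s in one step
     * @param accept  accept[s] is true if s is an accept state
     */
    static int minAcceptedLength(int[][] targets, boolean[] accept, int initialState) {
        int[] depth = new int[targets.length];
        Arrays.fill(depth, -1); // -1 marks states not visited yet
        Deque<Integer> queue = new ArrayDeque<>();
        depth[initialState] = 0;
        queue.add(initialState);
        while (!queue.isEmpty()) {
            int state = queue.remove();
            if (accept[state]) {
                // BFS visits states in order of increasing depth,
                // so the first accept state found is the closest one.
                return depth[state];
            }
            for (int target : targets[state]) {
                if (depth[target] == -1) {
                    depth[target] = depth[state] + 1;
                    queue.add(target);
                }
            }
        }
        return -1;
    }
}
```

For an automaton accepting only "ab" and "abcd" (a chain of five states where states 2 and 4 accept), the result is 2, so any term shorter than two bytes can be skipped without running the automaton.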
    • next

      public BytesRef next() throws IOException
      Specified by:
      next in interface BytesRefIterator
      Overrides:
      next in class BlockReader
      Throws:
      IOException
    • seekFirstBlock

      protected boolean seekFirstBlock() throws IOException
      Throws:
      IOException
    • nextTermInBlockMatching

      protected BytesRef nextTermInBlockMatching() throws IOException
      Finds the next line in the block whose term is accepted by the automaton, or returns null at the end of the block.
      Returns:
      The next term in the current block that is accepted by the automaton; or null if none.
      Throws:
      IOException
    • endsWithCommonSuffix

      protected boolean endsWithCommonSuffix(byte[] termBytes, int termLength)
      Indicates whether the given term ends with the automaton common suffix. This makes it possible to quickly skip terms that the automaton would eventually reject.
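
A suffix check of this kind can be sketched as a comparison of the trailing bytes. The version below is a standalone illustration with hypothetical names: unlike the method above, it takes the suffix as an explicit parameter instead of reading the `commonSuffix` field, and it is not the Lucene implementation.

```java
// Hypothetical standalone sketch; not Lucene code.
final class CommonSuffixSketch {

    /**
     * Returns true if the first termLength bytes of termBytes end with suffix.
     * termLength is passed separately because the array may be longer than
     * the term it holds.
     */
    static boolean endsWith(byte[] termBytes, int termLength, byte[] suffix) {
        if (suffix.length > termLength) {
            return false; // the term is too short to contain the suffix
        }
        int offset = termLength - suffix.length;
        for (int i = 0; i < suffix.length; i++) {
            if (termBytes[offset + i] != suffix[i]) {
                return false;
            }
        }
        return true;
    }
}
```

Because the check only touches the last few bytes, it costs far less than stepping the automaton through the whole term, which is why it is useful as an early filter.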
    • nextBlock

      protected boolean nextBlock() throws IOException
      Opens the next block. Depending on the blockIteration order, it may be the very next block, or a block away that may contain seekTerm.
      Returns:
      true if the next block is opened; false if there are no more blocks and the iteration is over.
      Throws:
      IOException
    • seekExact

      public boolean seekExact(BytesRef text)
      Overrides:
      seekExact in class BlockReader
    • seekExact

      public void seekExact(long ord)
      Description copied from class: BlockReader
      Not supported.
      Overrides:
      seekExact in class BlockReader
    • seekExact

      public void seekExact(BytesRef term, TermState state)
      Description copied from class: BlockReader
      Positions this BlockReader without re-seeking the term dictionary.

      The block containing the term is not read by this method. It will be read lazily only if needed, for example if BlockReader.next() is called. Calling BlockReader.postings(org.apache.lucene.index.PostingsEnum, int) after this method does not require the block to be read.

      Overrides:
      seekExact in class BlockReader
    • seekCeil

      public TermsEnum.SeekStatus seekCeil(BytesRef text)
      Overrides:
      seekCeil in class BlockReader