Class TieredMergePolicy

java.lang.Object
org.apache.lucene.index.MergePolicy
org.apache.lucene.index.TieredMergePolicy

public class TieredMergePolicy extends MergePolicy
Merges segments of approximately equal size, subject to an allowed number of segments per tier. This is similar to LogByteSizeMergePolicy, except this merge policy is able to merge non-adjacent segments, and separates how many segments are merged at once (setMaxMergeAtOnce(int)) from how many segments are allowed per tier (setSegmentsPerTier(double)). This merge policy also does not over-merge (i.e. cascade merges).

For normal merging, this policy first computes a "budget" of how many segments are allowed to be in the index. If the index is over-budget, then the policy sorts segments by decreasing size (pro-rating by percent deletes), and then finds the least-cost merge. Merge cost is measured by a combination of the "skew" of the merge (size of largest segment divided by smallest segment), total merge size and percent deletes reclaimed, so that merges with lower skew, smaller size and those reclaiming more deletes, are favored.
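The cost description above can be made concrete with a small sketch. It is not Lucene's implementation: the `Segment` class, the `skew` and `cost` helpers, and the two exponents are hypothetical and chosen only so that lower skew, smaller total size, and more reclaimed deletes all lower the cost, as the paragraph describes.

```java
import java.util.Arrays;
import java.util.List;

public class MergeCostSketch {
    /** Hypothetical segment: raw size in bytes and fraction of deleted docs (0.0 to 1.0). */
    public static final class Segment {
        final long sizeBytes;
        final double deletedFraction;

        public Segment(long sizeBytes, double deletedFraction) {
            this.sizeBytes = sizeBytes;
            this.deletedFraction = deletedFraction;
        }

        /** Size pro-rated by percent deletes: only live documents count. */
        double liveBytes() {
            return sizeBytes * (1.0 - deletedFraction);
        }
    }

    /** Skew as described in the text: largest segment divided by smallest segment. */
    public static double skew(List<Segment> candidate) {
        double largest = 0.0;
        double smallest = Double.MAX_VALUE;
        for (Segment s : candidate) {
            largest = Math.max(largest, s.liveBytes());
            smallest = Math.min(smallest, s.liveBytes());
        }
        return largest / smallest;
    }

    /** Illustrative cost, lower is better. The exponents are assumptions, not Lucene's values. */
    public static double cost(List<Segment> candidate) {
        double totalBefore = 0.0;
        double totalAfter = 0.0;
        for (Segment s : candidate) {
            totalBefore += s.sizeBytes;
            totalAfter += s.liveBytes();
        }
        // Ratio of live bytes to raw bytes: smaller when the merge reclaims more deletes.
        double liveRatio = totalAfter / totalBefore;
        return skew(candidate) * Math.pow(totalAfter, 0.05) * Math.pow(liveRatio, 2.0);
    }

    public static void main(String[] args) {
        List<Segment> balanced = Arrays.asList(new Segment(100, 0.0), new Segment(100, 0.0));
        List<Segment> skewed = Arrays.asList(new Segment(1000, 0.0), new Segment(10, 0.0));
        // The balanced candidate has the lower cost, so it would be preferred.
        System.out.println(cost(balanced) < cost(skewed));
    }
}
```

In this sketch a candidate made of equally sized segments scores better than one pairing a very large segment with a very small one, and a candidate that carries many deleted documents scores better than an otherwise identical one with none.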

If a merge will produce a segment that's larger than setMaxMergedSegmentMB(double), then the policy will merge fewer segments (down to 1 at once, if that one has deletions) to keep the segment size under budget.
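A minimal sketch of that size cap follows. It is a simplification rather than the policy's actual selection loop: the `pick` helper is hypothetical, and it assumes the sizes are already sorted in decreasing order and already pro-rated for deletes.

```java
import java.util.ArrayList;
import java.util.List;

public class MergePackingSketch {
    /**
     * Picks at most maxMergeAtOnce segments, scanning from largest to smallest and
     * skipping any segment that would push the estimated merged size over maxMergedMB.
     */
    public static List<Double> pick(double[] sizesMB, int maxMergeAtOnce, double maxMergedMB) {
        List<Double> picked = new ArrayList<>();
        double totalMB = 0.0;
        for (double sizeMB : sizesMB) {
            if (picked.size() == maxMergeAtOnce) {
                break; // reached the per-merge segment limit
            }
            if (totalMB + sizeMB > maxMergedMB) {
                continue; // would exceed the cap: try the next, smaller segment
            }
            picked.add(sizeMB);
            totalMB += sizeMB;
        }
        return picked;
    }

    public static void main(String[] args) {
        double[] sizesMB = {3000, 2500, 1500, 400, 100};
        // With a 5120 MB cap the 2500 MB segment is skipped.
        System.out.println(pick(sizesMB, 10, 5120.0)); // prints [3000.0, 1500.0, 400.0, 100.0]
    }
}
```

The merge ends up with four segments instead of five, which keeps the estimated result at 5000 MB, under the 5120 MB cap.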

NOTE: this policy freely merges non-adjacent segments; if this is a problem, use LogMergePolicy.

NOTE: This policy always merges by byte size of the segments and always pro-rates by percent deletes.

NOTE: Starting with Lucene 7.5, if you call IndexWriter.forceMerge(int) with this (default) merge policy and setMaxMergedSegmentMB(double) conflicts with the maxNumSegments passed to IndexWriter.forceMerge(int), then maxNumSegments wins. For example, if your index has 50 1 GB segments, and you have setMaxMergedSegmentMB(double) at 1024 (1 GB), and you call forceMerge(10), the two settings are clearly in conflict. TieredMergePolicy will choose to break the setMaxMergedSegmentMB(double) constraint and try to merge down to at most ten segments, each up to 5 * 1.25 GB in size (since an extra 25% buffer increase in the expected segment size is targeted).
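The arithmetic in this example can be checked with a short sketch. The `targetSegmentMB` helper is hypothetical, and taking the larger of the even split and the configured maximum is an assumption about how "in conflict" is decided; only the even split and the 25% buffer come from the note itself.

```java
public class ForceMergeSizeSketch {
    /** Per-segment size target in MB when maxNumSegments overrides the configured maximum. */
    public static double targetSegmentMB(double totalIndexMB, int maxNumSegments, double maxMergedSegmentMB) {
        // Size each segment would need if the index were split evenly.
        double evenSplitMB = totalIndexMB / maxNumSegments;
        // Assumption: the configured maximum is only exceeded when the even split requires it.
        double baseMB = Math.max(evenSplitMB, maxMergedSegmentMB);
        // Extra 25% buffer on the expected segment size, as stated in the note.
        return baseMB * 1.25;
    }

    public static void main(String[] args) {
        // 50 segments of 1 GB each, forceMerge(10), maxMergedSegmentMB = 1024.
        double targetMB = targetSegmentMB(50 * 1024.0, 10, 1024.0);
        System.out.println(targetMB / 1024.0 + " GB"); // prints 6.25 GB
    }
}
```

The result, 6.25 GB, is the "5 * 1.25 GB" figure quoted in the note.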

findForcedDeletesMerges should never produce segments greater than maxSegmentSize.

WARNING: This API is experimental and might change in incompatible ways in the next release.
  • Constructor Details

    • TieredMergePolicy

      public TieredMergePolicy()
      Sole constructor, setting all settings to their defaults.
  • Method Details

    • setMaxMergeAtOnce

      public TieredMergePolicy setMaxMergeAtOnce(int v)
      Maximum number of segments to be merged at a time during "normal" merging. Default is 10.
    • getMaxMergeAtOnce

      public int getMaxMergeAtOnce()
      Returns the current maxMergeAtOnce setting.
    • setMaxMergedSegmentMB

      public TieredMergePolicy setMaxMergedSegmentMB(double v)
      Maximum sized segment to produce during normal merging. This setting is approximate: the estimate of the merged segment size is made by summing sizes of to-be-merged segments (compensating for percent deleted docs). Default is 5 GB.
    • getMaxMergedSegmentMB

      public double getMaxMergedSegmentMB()
      Returns the current maxMergedSegmentMB setting.
    • setDeletesPctAllowed

      public TieredMergePolicy setDeletesPctAllowed(double v)
      Controls the maximum percentage of deleted documents that is tolerated in the index. Lower values make the index more space efficient at the expense of increased CPU and I/O activity. Values must be between 20 and 50. Default value is 33.
    • getDeletesPctAllowed

      public double getDeletesPctAllowed()
      Returns the current deletesPctAllowed setting.
    • setFloorSegmentMB

      public TieredMergePolicy setFloorSegmentMB(double v)
      Segments smaller than this are "rounded up" to this size, i.e. treated as equal (floor) size for merge selection. This prevents frequent flushing of tiny segments from creating a long tail of small segments in the index. Default is 2 MB.
    • getFloorSegmentMB

      public double getFloorSegmentMB()
      Returns the current floorSegmentMB setting.
    • setForceMergeDeletesPctAllowed

      public TieredMergePolicy setForceMergeDeletesPctAllowed(double v)
      When forceMergeDeletes is called, we only merge away a segment if its delete percentage is over this threshold. Default is 10%.
    • getForceMergeDeletesPctAllowed

      public double getForceMergeDeletesPctAllowed()
      Returns the current forceMergeDeletesPctAllowed setting.
    • setSegmentsPerTier

      public TieredMergePolicy setSegmentsPerTier(double v)
      Sets the allowed number of segments per tier. Smaller values mean more merging but fewer segments.

      Default is 10.0.

    • getSegmentsPerTier

      public double getSegmentsPerTier()
      Returns the current segmentsPerTier setting.
    • findMerges

      public MergePolicy.MergeSpecification findMerges(MergeTrigger mergeTrigger, SegmentInfos infos, MergePolicy.MergeContext mergeContext) throws IOException
      Description copied from class: MergePolicy
      Determine what set of merge operations are now necessary on the index. IndexWriter calls this whenever there is a change to the segments. This call is always synchronized on the IndexWriter instance so only one thread at a time will call this method.
      Specified by:
      findMerges in class MergePolicy
      Parameters:
      mergeTrigger - the event that triggered the merge
      infos - the total set of segments in the index
      mergeContext - the MergeContext to find the merges on
      Throws:
      IOException
    • score

      protected TieredMergePolicy.MergeScore score(List<SegmentCommitInfo> candidate, boolean hitTooLarge, Map<SegmentCommitInfo,org.apache.lucene.index.TieredMergePolicy.SegmentSizeAndDocs> segmentsSizes) throws IOException
      Expert: scores one merge; subclasses can override.
      Throws:
      IOException
    • findForcedMerges

      public MergePolicy.MergeSpecification findForcedMerges(SegmentInfos infos, int maxSegmentCount, Map<SegmentCommitInfo,Boolean> segmentsToMerge, MergePolicy.MergeContext mergeContext) throws IOException
      Description copied from class: MergePolicy
      Determine what set of merge operations is necessary in order to merge to <= the specified segment count. IndexWriter calls this when its IndexWriter.forceMerge(int) method is called. This call is always synchronized on the IndexWriter instance so only one thread at a time will call this method.
      Specified by:
      findForcedMerges in class MergePolicy
      Parameters:
      infos - the total set of segments in the index
      maxSegmentCount - requested maximum number of segments in the index
      segmentsToMerge - contains the specific SegmentInfo instances that must be merged away. This may be a subset of all SegmentInfos. If the value is True for a given SegmentInfo, that means this segment was an original segment present in the to-be-merged index; else, it was a segment produced by a cascaded merge.
      mergeContext - the MergeContext to find the merges on
      Throws:
      IOException
    • findForcedDeletesMerges

      public MergePolicy.MergeSpecification findForcedDeletesMerges(SegmentInfos infos, MergePolicy.MergeContext mergeContext) throws IOException
      Description copied from class: MergePolicy
      Determine what set of merge operations is necessary in order to expunge all deletes from the index.
      Specified by:
      findForcedDeletesMerges in class MergePolicy
      Parameters:
      infos - the total set of segments in the index
      mergeContext - the MergeContext to find the merges on
      Throws:
      IOException
    • toString

      public String toString()
      Overrides:
      toString in class Object
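As a usage sketch for the setters documented above: each setter returns the policy itself, so calls can be chained before the policy is handed to an IndexWriterConfig. The values below simply restate the defaults listed on this page. The class name is hypothetical, and the example assumes the Lucene version this page documents is on the classpath.

```java
import org.apache.lucene.index.IndexWriterConfig;
import org.apache.lucene.index.TieredMergePolicy;

public class TieredMergePolicyConfigSketch {
    public static void main(String[] args) {
        // Chain the setters; each value matches the default documented on this page.
        TieredMergePolicy policy = new TieredMergePolicy()
            .setMaxMergeAtOnce(10)                 // segments merged at once in normal merging
            .setSegmentsPerTier(10.0)              // allowed segments per tier
            .setMaxMergedSegmentMB(5 * 1024.0)     // 5 GB, expressed in MB
            .setFloorSegmentMB(2.0)                // smaller segments are treated as 2 MB
            .setDeletesPctAllowed(33.0)            // tolerated percentage of deleted docs
            .setForceMergeDeletesPctAllowed(10.0); // threshold used by forceMergeDeletes

        // Install the policy on a writer configuration.
        IndexWriterConfig config = new IndexWriterConfig().setMergePolicy(policy);
        System.out.println(config.getMergePolicy());
    }
}
```

Lowering setSegmentsPerTier trades more merge work for fewer segments, so it is usually worth changing one setting at a time and observing the effect on indexing throughput.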