Class MergePolicy

java.lang.Object
org.apache.lucene.index.MergePolicy
Direct Known Subclasses:
FilterMergePolicy, LogMergePolicy, NoMergePolicy, TieredMergePolicy

public abstract class MergePolicy extends Object
Expert: a MergePolicy determines the sequence of primitive merge operations.

Whenever IndexWriter alters the segments in an index, whether through the addition of a newly flushed segment, the addition of many segments from addIndexes* calls, or a previous merge that may now need to cascade, it invokes findMerges(org.apache.lucene.index.MergeTrigger, org.apache.lucene.index.SegmentInfos, org.apache.lucene.index.MergePolicy.MergeContext) to give the MergePolicy a chance to pick merges that are now required. This method returns a MergePolicy.MergeSpecification instance describing the set of merges that should be done, or null if no merges are necessary. When IndexWriter.forceMerge is called, IndexWriter calls findForcedMerges(SegmentInfos, int, Map, MergeContext) and the MergePolicy should then return the necessary merges.

Note that the policy can return more than one merge at a time. In this case, if the writer is using SerialMergeScheduler, the merges will be run sequentially but if it is using ConcurrentMergeScheduler they will be run concurrently.
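
The difference between the two scheduling modes can be pictured with a small stand-alone model. This is only a sketch under stated assumptions: it uses plain Runnable tasks in place of real merges, the class MergeRunSketch and both method names are hypothetical, and it does not reproduce how SerialMergeScheduler or ConcurrentMergeScheduler are actually implemented.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

// Hypothetical model: each Runnable stands in for one merge returned by the policy.
final class MergeRunSketch {

    /** Runs every task on the calling thread, one after another. */
    public static void runSerially(List<Runnable> merges) {
        for (Runnable merge : merges) {
            merge.run();
        }
    }

    /** Submits every task to a fixed-size thread pool and waits for all of them to finish. */
    public static void runConcurrently(List<Runnable> merges, int threads) {
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        try {
            List<Future<?>> pending = new ArrayList<>();
            for (Runnable merge : merges) {
                pending.add(pool.submit(merge));
            }
            for (Future<?> future : pending) {
                future.get(); // blocks until this task has completed
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while waiting for merges", e);
        } catch (ExecutionException e) {
            throw new IllegalStateException("a merge task failed", e.getCause());
        } finally {
            pool.shutdown();
        }
    }
}
```

In both modes every returned merge is eventually executed; only the degree of overlap differs.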

The default MergePolicy is TieredMergePolicy.

WARNING: This API is experimental and might change in incompatible ways in the next release.
  • Field Details

    • DEFAULT_NO_CFS_RATIO

      protected static final double DEFAULT_NO_CFS_RATIO
      Default ratio for compound file usage. Set to 1.0, meaning the compound file format is always used.
    • DEFAULT_MAX_CFS_SEGMENT_SIZE

      protected static final long DEFAULT_MAX_CFS_SEGMENT_SIZE
      Default max segment size in order to use compound file system. Set to Long.MAX_VALUE.
    • noCFSRatio

      protected double noCFSRatio
      If the size of the merged segment exceeds this ratio of the total index size, then it will remain in non-compound format.
    • maxCFSSegmentSize

      protected long maxCFSSegmentSize
      If the size of the merged segment exceeds this value then it will not use compound file format.
  • Constructor Details

    • MergePolicy

      protected MergePolicy()
      Creates a new merge policy instance.
    • MergePolicy

      protected MergePolicy(double defaultNoCFSRatio, long defaultMaxCFSSegmentSize)
      Creates a new merge policy instance with default settings for noCFSRatio and maxCFSSegmentSize. This constructor should be used by subclasses that need different defaults than those of MergePolicy.
  • Method Details

    • findMerges

      public abstract MergePolicy.MergeSpecification findMerges(MergeTrigger mergeTrigger, SegmentInfos segmentInfos, MergePolicy.MergeContext mergeContext) throws IOException
      Determine what set of merge operations are now necessary on the index. IndexWriter calls this whenever there is a change to the segments. This call is always synchronized on the IndexWriter instance so only one thread at a time will call this method.
      Parameters:
      mergeTrigger - the event that triggered the merge
      segmentInfos - the total set of segments in the index
      mergeContext - the MergeContext to find the merges on
      Throws:
      IOException
    • findForcedMerges

      public abstract MergePolicy.MergeSpecification findForcedMerges(SegmentInfos segmentInfos, int maxSegmentCount, Map<SegmentCommitInfo,Boolean> segmentsToMerge, MergePolicy.MergeContext mergeContext) throws IOException
      Determine what set of merge operations is necessary in order to merge to <= the specified segment count. IndexWriter calls this when its IndexWriter.forceMerge(int) method is called. This call is always synchronized on the IndexWriter instance so only one thread at a time will call this method.
      Parameters:
      segmentInfos - the total set of segments in the index
      maxSegmentCount - requested maximum number of segments in the index
      segmentsToMerge - contains the specific SegmentCommitInfo instances that must be merged away. This may be a subset of all SegmentInfos. If the value is true for a given SegmentCommitInfo, that segment was an original segment present in the to-be-merged index; otherwise, it was a segment produced by a cascaded merge.
      mergeContext - the MergeContext to find the merges on
      Throws:
      IOException
    • findForcedDeletesMerges

      public abstract MergePolicy.MergeSpecification findForcedDeletesMerges(SegmentInfos segmentInfos, MergePolicy.MergeContext mergeContext) throws IOException
      Determine what set of merge operations is necessary in order to expunge all deletes from the index.
      Parameters:
      segmentInfos - the total set of segments in the index
      mergeContext - the MergeContext to find the merges on
      Throws:
      IOException
    • findFullFlushMerges

      public MergePolicy.MergeSpecification findFullFlushMerges(MergeTrigger mergeTrigger, SegmentInfos segmentInfos, MergePolicy.MergeContext mergeContext) throws IOException
      Identifies merges that we want to execute (synchronously) on commit. By default, this will do no merging on commit. If you implement this method in your MergePolicy you must also set a non-zero timeout using IndexWriterConfig.setMaxFullFlushMergeWaitMillis(long).

      Any merges returned here will make IndexWriter.commit(), IndexWriter.prepareCommit() or IndexWriter.getReader(boolean, boolean) block until the merges complete or until LiveIndexWriterConfig.getMaxFullFlushMergeWaitMillis() has elapsed. This may be used to merge small segments that have just been flushed, reducing the number of segments in the point-in-time snapshot. If a merge does not complete in the allotted time, it will continue to execute and eventually finish and apply to a future point-in-time snapshot, but it will not be reflected in the current one.

      If a MergePolicy.OneMerge in the returned MergePolicy.MergeSpecification includes a segment already included in a registered merge, then IndexWriter.commit() or IndexWriter.prepareCommit() will throw an IllegalStateException. Use MergePolicy.MergeContext.getMergingSegments() to determine which segments are currently registered to merge.

      Parameters:
      mergeTrigger - the event that triggered the merge (COMMIT or GET_READER).
      segmentInfos - the total set of segments in the index (while preparing the commit)
      mergeContext - the MergeContext to find the merges on, which should be used to determine which segments are already in a registered merge (see MergePolicy.MergeContext.getMergingSegments()).
      Throws:
      IOException
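
      The requirement to skip segments that are already part of a registered merge can be sketched with plain collections. This is a minimal model, not Lucene code: segments are represented by their names as strings, and the class FullFlushCandidatesSketch and its method are hypothetical. In a real policy the set of merging segments would come from MergePolicy.MergeContext.getMergingSegments().

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;

// Hypothetical model: segment names stand in for SegmentCommitInfo instances.
final class FullFlushCandidatesSketch {

    /**
     * Returns the candidate segments that are not already registered in a merge,
     * preserving the order of the candidate list.
     */
    public static List<String> eligibleSegments(List<String> candidates, Set<String> mergingSegments) {
        List<String> eligible = new ArrayList<>();
        for (String name : candidates) {
            if (!mergingSegments.contains(name)) {
                eligible.add(name);
            }
        }
        return eligible;
    }
}
```

      Only segments from the filtered list would then be placed into a merge returned by findFullFlushMerges, which avoids the IllegalStateException described above.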
    • useCompoundFile

      public boolean useCompoundFile(SegmentInfos infos, SegmentCommitInfo mergedInfo, MergePolicy.MergeContext mergeContext) throws IOException
      Returns true if a new segment (regardless of its origin) should use the compound file format. The default implementation returns true if and only if the size of the given mergedInfo is less than or equal to getMaxCFSSegmentSizeMB() and also less than or equal to the total index size multiplied by getNoCFSRatio(); otherwise it returns false.
      Throws:
      IOException
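
      The default rule described above amounts to two comparisons. The following stand-alone sketch models only that documented rule, assuming all sizes are already available in bytes; the class CfsDecisionSketch and its parameter names are hypothetical, and the actual implementation may order or short-circuit the checks differently.

```java
// Hypothetical model of the documented default rule; all sizes are in bytes.
final class CfsDecisionSketch {

    /**
     * Returns true when the merged segment is small enough, both in absolute terms
     * and relative to the whole index, to be written in compound file format.
     */
    public static boolean useCompoundFile(long mergedSizeBytes,
                                          long totalIndexSizeBytes,
                                          double noCFSRatio,
                                          long maxCFSSegmentSizeBytes) {
        // Absolute cap: segments larger than the maximum never use the compound format.
        if (mergedSizeBytes > maxCFSSegmentSizeBytes) {
            return false;
        }
        // Relative cap: the segment must not exceed the configured share of the index.
        return mergedSizeBytes <= totalIndexSizeBytes * noCFSRatio;
    }
}
```

      With noCFSRatio at 1.0 and the maximum size at Long.MAX_VALUE, both checks pass for any segment that is no larger than the index itself, which matches the "always use compound files" defaults listed under Field Details.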
    • size

      protected long size(SegmentCommitInfo info, MergePolicy.MergeContext mergeContext) throws IOException
      Returns the byte size of the provided SegmentCommitInfo, prorated by the percentage of non-deleted documents.
      Throws:
      IOException
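
      Prorating by the share of non-deleted documents can be illustrated with a short calculation. This is a sketch under assumptions: the class ProratedSizeSketch, its parameters, and the handling of an empty segment (returning the raw size when maxDoc is not positive) are illustrative choices, not a statement of what size() does internally.

```java
// Hypothetical model: scales a segment's byte size by its fraction of live documents.
final class ProratedSizeSketch {

    /**
     * @param byteSize raw on-disk size of the segment in bytes
     * @param maxDoc   total number of documents in the segment, including deleted ones
     * @param delCount number of deleted documents in the segment
     */
    public static long proratedSize(long byteSize, int maxDoc, int delCount) {
        if (maxDoc <= 0) {
            return byteSize; // assumption: nothing to prorate for an empty segment
        }
        double delRatio = (double) delCount / maxDoc;
        return (long) (byteSize * (1.0 - delRatio));
    }
}
```

      For example, under this model a 1000-byte segment with 100 documents, 25 of them deleted, counts as 750 bytes.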
    • assertDelCount

      protected final boolean assertDelCount(int delCount, SegmentCommitInfo info)
      Asserts that the delCount for this SegmentCommitInfo is valid.
    • isMerged

      protected final boolean isMerged(SegmentInfos infos, SegmentCommitInfo info, MergePolicy.MergeContext mergeContext) throws IOException
      Returns true if this single info is already fully merged (has no pending deletes, is in the same directory as the writer, and matches the current compound file setting).
      Throws:
      IOException
    • getNoCFSRatio

      public double getNoCFSRatio()
      Returns current noCFSRatio.
    • setNoCFSRatio

      public void setNoCFSRatio(double noCFSRatio)
      If a merged segment will be more than this fraction of the total size of the index, leave the segment in non-compound format even if compound file is enabled. Set to 1.0 to always use CFS regardless of merge size.
    • getMaxCFSSegmentSizeMB

      public double getMaxCFSSegmentSizeMB()
      Returns the largest size, in megabytes, allowed for a compound file segment.
    • setMaxCFSSegmentSizeMB

      public void setMaxCFSSegmentSizeMB(double v)
      If a merged segment will be more than this value, leave the segment as non-compound file even if compound file is enabled. Set this to Double.POSITIVE_INFINITY (default) and noCFSRatio to 1.0 to always use CFS regardless of merge size.
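
      Because the setter takes megabytes as a double while the maxCFSSegmentSize field is a long, some unit conversion has to happen. The sketch below shows one way such a conversion could work, assuming 1 MB means 1024 * 1024 bytes and that oversized values saturate at Long.MAX_VALUE (consistent with the documented default of DEFAULT_MAX_CFS_SEGMENT_SIZE). The class CfsSizeUnitsSketch and its methods are hypothetical.

```java
// Hypothetical model of converting between megabytes (double) and bytes (long).
final class CfsSizeUnitsSketch {

    /** Converts megabytes to bytes, saturating at Long.MAX_VALUE for huge or infinite input. */
    public static long megabytesToBytes(double megabytes) {
        if (megabytes < 0.0) {
            throw new IllegalArgumentException("size must be >= 0 MB, got " + megabytes);
        }
        double bytes = megabytes * 1024 * 1024;
        return bytes > Long.MAX_VALUE ? Long.MAX_VALUE : (long) bytes;
    }

    /** Converts bytes back to megabytes. */
    public static double bytesToMegabytes(long bytes) {
        return bytes / 1024.0 / 1024.0;
    }
}
```

      Under these assumptions, passing Double.POSITIVE_INFINITY yields Long.MAX_VALUE bytes, so no finite segment size exceeds the cap.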
    • keepFullyDeletedSegment

      public boolean keepFullyDeletedSegment(IOSupplier<CodecReader> readerIOSupplier) throws IOException
      Returns true if the segment represented by the given CodecReader should be kept even if it is fully deleted. This is useful for testing or, for instance, if the merge policy implements retention policies for soft deletes.
      Throws:
      IOException
    • numDeletesToMerge

      public int numDeletesToMerge(SegmentCommitInfo info, int delCount, IOSupplier<CodecReader> readerSupplier) throws IOException
      Returns the number of deletes that a merge would claim on the given segment. By default, this method returns the sum of the delete count on disk and the pending delete count. However, subclasses that wrap merge readers might modify this to reflect deletes that are carried over to the target segment in the case of soft deletes.

      Soft deletes allow deletes to survive across merges in order to control when the soft-deleted data is reclaimed.

      Parameters:
      info - the segment info that identifies the segment
      delCount - the number of deleted documents for this segment
      readerSupplier - a supplier that provides a CodecReader for this segment
      Throws:
      IOException
    • segString

      protected final String segString(MergePolicy.MergeContext mergeContext, Iterable<SegmentCommitInfo> infos)
      Builds a String representation of the given SegmentCommitInfo instances.
    • message

      protected final void message(String message, MergePolicy.MergeContext mergeContext)
      Prints a debug message to the infoStream of the given MergePolicy.MergeContext.
    • verbose

      protected final boolean verbose(MergePolicy.MergeContext mergeContext)
      Returns true if the info-stream is in verbose mode.