Class LogMergePolicy

java.lang.Object
org.apache.lucene.index.MergePolicy
org.apache.lucene.index.LogMergePolicy
Direct Known Subclasses:
LogByteSizeMergePolicy, LogDocMergePolicy

public abstract class LogMergePolicy extends MergePolicy
This class implements a MergePolicy that tries to merge segments into levels of exponentially increasing size, where each level has fewer segments than the value of the merge factor. Whenever extra segments (beyond the merge factor upper bound) are encountered, all segments within the level are merged. You can get or set the merge factor using getMergeFactor() and setMergeFactor(int) respectively.

This class is abstract and requires a subclass to define the MergePolicy.size(org.apache.lucene.index.SegmentCommitInfo, org.apache.lucene.index.MergePolicy.MergeContext) method which specifies how a segment's size is determined. LogDocMergePolicy is one subclass that measures size by document count in the segment. LogByteSizeMergePolicy is another subclass that measures size as the total byte size of the file(s) for the segment.
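
The level-based selection described above can be modelled in plain Java. This is only a simplified sketch, not Lucene's implementation: the class name LogLevelSketch, the method findMerges(long[], int), and the 0.75 value used for LEVEL_LOG_SPAN are assumptions made for illustration, and the model ignores minMergeSize, maxMergeSize, maxMergeDocs, and merges that are already running.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical, simplified model of log-level merge selection (not Lucene code). */
final class LogLevelSketch {
    // Assumed value for illustration; the real constant is LogMergePolicy.LEVEL_LOG_SPAN.
    static final double LEVEL_LOG_SPAN = 0.75;

    /** Returns [start, end) index ranges of adjacent segments this model would merge. */
    static List<int[]> findMerges(long[] sizes, int mergeFactor) {
        double norm = Math.log(mergeFactor);
        double[] levels = new double[sizes.length];
        for (int i = 0; i < sizes.length; i++) {
            // A segment's level is log(size) in base mergeFactor.
            levels[i] = Math.log(Math.max(1, sizes[i])) / norm;
        }
        List<int[]> merges = new ArrayList<>();
        int start = 0;
        while (start < sizes.length) {
            // The level spans from the largest remaining log size down by LEVEL_LOG_SPAN.
            double maxLevel = levels[start];
            for (int i = start + 1; i < sizes.length; i++) {
                maxLevel = Math.max(maxLevel, levels[i]);
            }
            double levelBottom = maxLevel - LEVEL_LOG_SPAN;
            // Find the last segment that still falls within this level.
            int upto = sizes.length - 1;
            while (upto >= start && levels[upto] < levelBottom) {
                upto--;
            }
            // Merge mergeFactor adjacent segments at a time within the level.
            int end = start + mergeFactor;
            while (end <= upto + 1) {
                merges.add(new int[] {start, end});
                start = end;
                end = start + mergeFactor;
            }
            start = upto + 1;
        }
        return merges;
    }

    public static void main(String[] args) {
        // One large segment followed by three small ones, merge factor 3.
        for (int[] m : findMerges(new long[] {1000, 10, 10, 10}, 3)) {
            System.out.println("merge segments [" + m[0] + ", " + m[1] + ")");
        }
    }
}
```

In this model the large segment sits alone in its own level and is left untouched, while the three small segments fill a lower level and are merged together.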

  • Field Details

    • LEVEL_LOG_SPAN

      public static final double LEVEL_LOG_SPAN
      Defines the allowed range of log(size) for each level. A level is computed by taking the max segment log size, minus LEVEL_LOG_SPAN, and finding all segments falling within that range.
    • DEFAULT_MERGE_FACTOR

      public static final int DEFAULT_MERGE_FACTOR
      Default merge factor, which is how many segments are merged at a time
    • DEFAULT_MAX_MERGE_DOCS

      public static final int DEFAULT_MAX_MERGE_DOCS
      Default maximum segment size. A segment of this size or larger will never be merged. See setMaxMergeDocs(int).
    • DEFAULT_NO_CFS_RATIO

      public static final double DEFAULT_NO_CFS_RATIO
      Default noCFSRatio. If a merge's size is at least 10% of the index, the compound file format is disabled for that merge.
    • mergeFactor

      protected int mergeFactor
      How many segments to merge at a time.
    • minMergeSize

      protected long minMergeSize
      Any segments whose size is smaller than this value will be rounded up to this value. This ensures that tiny segments are aggressively merged.
    • maxMergeSize

      protected long maxMergeSize
      If the size of a segment exceeds this value then it will never be merged.
    • maxMergeSizeForForcedMerge

      protected long maxMergeSizeForForcedMerge
      If the size of a segment exceeds this value then it will never be merged during IndexWriter.forceMerge(int).
    • maxMergeDocs

      protected int maxMergeDocs
      If a segment has more than this many documents then it will never be merged.
    • calibrateSizeByDeletes

      protected boolean calibrateSizeByDeletes
      If true, we pro-rate a segment's size by the percentage of non-deleted documents.
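
As a rough illustration of how the size bounds above interact, the following sketch models the lower and upper limits in plain Java. The class SizeBoundsSketch and both method names are hypothetical, and the exact boundary handling (treating a segment at the limit as too large) is an assumption of this sketch rather than something this page specifies.

```java
/** Hypothetical model of the size bounds described by the fields above (not Lucene code). */
final class SizeBoundsSketch {
    /** Size used when assigning a level: tiny segments are rounded up to minMergeSize. */
    static long levelSize(long size, long minMergeSize) {
        return Math.max(size, minMergeSize);
    }

    /**
     * Whether a segment may take part in a natural merge.
     * Assumption: a segment exactly at a limit is treated as too large.
     */
    static boolean eligible(long size, int docCount, long maxMergeSize, int maxMergeDocs) {
        return size < maxMergeSize && docCount < maxMergeDocs;
    }

    public static void main(String[] args) {
        System.out.println(levelSize(5, 100));
        System.out.println(eligible(100, 10, 100, Integer.MAX_VALUE));
    }
}
```

Rounding small segments up to a common floor places them all in the same level, which is why they are merged aggressively.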
  • Constructor Details

    • LogMergePolicy

      public LogMergePolicy()
      Sole constructor. (For invocation by subclass constructors, typically implicit.)
  • Method Details

    • getMergeFactor

      public int getMergeFactor()
      Returns the merge factor: the number of segments that are merged at once. This value also controls the total number of segments allowed to accumulate in the index.
    • setMergeFactor

      public void setMergeFactor(int mergeFactor)
      Determines how often segment indices are merged by addDocument(). With smaller values, less RAM is used while indexing and searches are faster, but indexing is slower. With larger values, more RAM is used during indexing and searches are slower, but indexing is faster. Thus larger values (> 10) are best for batch index creation, and smaller values (< 10) for indices that are interactively maintained.
    • setCalibrateSizeByDeletes

      public void setCalibrateSizeByDeletes(boolean calibrateSizeByDeletes)
      Sets whether the segment size should be calibrated by the number of deletes when choosing segments for merge.
    • getCalibrateSizeByDeletes

      public boolean getCalibrateSizeByDeletes()
      Returns true if the segment size should be calibrated by the number of deletes when choosing segments for merge.
    • sizeDocs

      protected long sizeDocs(SegmentCommitInfo info, MergePolicy.MergeContext mergeContext) throws IOException
      Return the number of documents in the provided SegmentCommitInfo, pro-rated by percentage of non-deleted documents if setCalibrateSizeByDeletes(boolean) is set.
      Throws:
      IOException
    • sizeBytes

      protected long sizeBytes(SegmentCommitInfo info, MergePolicy.MergeContext mergeContext) throws IOException
      Return the byte size of the provided SegmentCommitInfo, pro-rated by percentage of non-deleted documents if setCalibrateSizeByDeletes(boolean) is set.
      Throws:
      IOException
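
The pro-rating used by sizeDocs and sizeBytes can be pictured with a small calculation. This is a minimal sketch under stated assumptions: the class ProRateSketch, the method proRate, and its parameters (rawSize, maxDoc, delCount) are hypothetical names, and the real methods read these values from the SegmentCommitInfo and MergeContext instead.

```java
/** Hypothetical model of pro-rating a segment's size by its live documents (not Lucene code). */
final class ProRateSketch {
    /**
     * Scales rawSize by the fraction of non-deleted documents when calibration is enabled.
     * maxDoc is the total document count of the segment, including deleted documents.
     */
    static long proRate(long rawSize, int maxDoc, int delCount, boolean calibrateSizeByDeletes) {
        if (!calibrateSizeByDeletes || maxDoc <= 0) {
            return rawSize; // no calibration requested, or nothing to scale by
        }
        double liveRatio = 1.0 - (double) delCount / maxDoc;
        return (long) (rawSize * liveRatio);
    }

    public static void main(String[] args) {
        // A segment of size 1000 with 25 of its 100 documents deleted.
        System.out.println(proRate(1000, 100, 25, true));
    }
}
```

With calibration enabled, a segment carrying many deletes looks smaller, so it lands in a lower level and becomes a merge candidate sooner.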
    • isMerged

      protected boolean isMerged(SegmentInfos infos, int maxNumSegments, Map<SegmentCommitInfo,Boolean> segmentsToMerge, MergePolicy.MergeContext mergeContext) throws IOException
      Returns true if the number of segments eligible for merging is less than or equal to the specified maxNumSegments.
      Throws:
      IOException
    • findForcedMerges

      public MergePolicy.MergeSpecification findForcedMerges(SegmentInfos infos, int maxNumSegments, Map<SegmentCommitInfo,Boolean> segmentsToMerge, MergePolicy.MergeContext mergeContext) throws IOException
      Returns the merges necessary to merge the index down to a specified number of segments. This respects the maxMergeSizeForForcedMerge setting. By default, and assuming maxNumSegments=1, only one segment will be left in the index. That segment has neither pending deletions nor separate norms, and it is in compound file format if the current useCompoundFile setting is true. This method returns multiple merges (mergeFactor at a time) so that the MergeScheduler in use can take advantage of concurrency.
      Specified by:
      findForcedMerges in class MergePolicy
      Parameters:
      infos - the total set of segments in the index
      maxNumSegments - requested maximum number of segments in the index
      segmentsToMerge - contains the specific SegmentCommitInfo instances that must be merged away. This may be a subset of all SegmentInfos. If the value is true for a given SegmentCommitInfo, that segment was an original segment present in the to-be-merged index; otherwise, it was a segment produced by a cascaded merge.
      mergeContext - the MergeContext to find the merges on
      Throws:
      IOException
    • findForcedDeletesMerges

      public MergePolicy.MergeSpecification findForcedDeletesMerges(SegmentInfos segmentInfos, MergePolicy.MergeContext mergeContext) throws IOException
      Finds merges necessary to force-merge all deletes from the index. We simply merge adjacent segments that have deletes, up to mergeFactor at a time.
      Specified by:
      findForcedDeletesMerges in class MergePolicy
      Parameters:
      segmentInfos - the total set of segments in the index
      mergeContext - the MergeContext to find the merges on
      Throws:
      IOException
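
The grouping rule for findForcedDeletesMerges (adjacent segments with deletes, up to mergeFactor at a time) might look roughly like the following. It is a sketch only: ForcedDeletesSketch and findRuns are hypothetical names, and a boolean per segment stands in for the real check of whether a segment has deletes.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical model of grouping adjacent segments that have deletes (not Lucene code). */
final class ForcedDeletesSketch {
    /** Returns [start, end) index ranges, each covering at most mergeFactor segments. */
    static List<int[]> findRuns(boolean[] hasDeletes, int mergeFactor) {
        List<int[]> merges = new ArrayList<>();
        int first = -1; // start of the current run of segments with deletes, or -1
        for (int i = 0; i < hasDeletes.length; i++) {
            if (hasDeletes[i]) {
                if (first == -1) {
                    first = i; // a new run begins here
                } else if (i - first == mergeFactor) {
                    merges.add(new int[] {first, i}); // run reached mergeFactor segments
                    first = i;
                }
            } else if (first != -1) {
                merges.add(new int[] {first, i}); // a segment without deletes ends the run
                first = -1;
            }
        }
        if (first != -1) {
            merges.add(new int[] {first, hasDeletes.length}); // flush the trailing run
        }
        return merges;
    }

    public static void main(String[] args) {
        boolean[] hasDeletes = {true, true, false, true, true, true, true};
        for (int[] m : findRuns(hasDeletes, 3)) {
            System.out.println("merge segments [" + m[0] + ", " + m[1] + ")");
        }
    }
}
```

In this model a segment with deletes that has no such neighbor still forms a one-segment range, which corresponds to rewriting that segment on its own to drop its deleted documents.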
    • findMerges

      public MergePolicy.MergeSpecification findMerges(MergeTrigger mergeTrigger, SegmentInfos infos, MergePolicy.MergeContext mergeContext) throws IOException
      Checks if any merges are now necessary and returns a MergePolicy.MergeSpecification if so. A merge is necessary when there are more than setMergeFactor(int) segments at a given level. When multiple levels have too many segments, this method will return multiple merges, allowing the MergeScheduler to use concurrency.
      Specified by:
      findMerges in class MergePolicy
      Parameters:
      mergeTrigger - the event that triggered the merge
      infos - the total set of segments in the index
      mergeContext - the MergeContext to find the merges on
      Throws:
      IOException
    • setMaxMergeDocs

      public void setMaxMergeDocs(int maxMergeDocs)
      Determines the largest segment (measured by document count) that may be merged with other segments. Small values (e.g., less than 10,000) are best for interactive indexing, as this limits the length of pauses while indexing to a few seconds. Larger values are best for batched indexing and speedier searches.

      The default value is Integer.MAX_VALUE.

      LogByteSizeMergePolicy also allows you to set this limit by net size (in MB) of the segment, using LogByteSizeMergePolicy.setMaxMergeMB(double).

    • getMaxMergeDocs

      public int getMaxMergeDocs()
      Returns the largest segment (measured by document count) that may be merged with other segments.
    • toString

      public String toString()
      Overrides:
      toString in class Object