Class MergePolicy
- Direct Known Subclasses: FilterMergePolicy, LogMergePolicy, NoMergePolicy, TieredMergePolicy
Whenever the segments in an index have been altered by IndexWriter, whether through the addition of a newly flushed segment, the addition of many segments from addIndexes* calls, or a previous merge that may now need to cascade, IndexWriter invokes findMerges(org.apache.lucene.index.MergeTrigger, org.apache.lucene.index.SegmentInfos, org.apache.lucene.index.MergePolicy.MergeContext) to give the MergePolicy a chance to pick merges that are now required. This method returns a MergePolicy.MergeSpecification instance describing the set of merges that should be done, or null if no merges are necessary. When IndexWriter.forceMerge is called, it calls findForcedMerges(SegmentInfos, int, Map, MergeContext) and the MergePolicy should then return the necessary merges.
Note that the policy can return more than one merge at a time. In this case, if the writer is using SerialMergeScheduler, the merges will be run sequentially, but if it is using ConcurrentMergeScheduler they will be run concurrently.
The default MergePolicy is TieredMergePolicy.
- WARNING: This API is experimental and might change in incompatible ways in the next release.
-
Nested Class Summary
Modifier and Type Class - Description
static class MergePolicy.MergeAbortedException - Thrown when a merge was explicitly aborted because IndexWriter.abortMerges() was called.
static interface MergePolicy.MergeContext - This interface represents the current context of the merge selection process.
static class MergePolicy.MergeException - Exception thrown if there are any problems while executing a merge.
static class MergePolicy.MergeSpecification - A MergeSpecification instance provides the information necessary to perform multiple merges.
static class MergePolicy.OneMerge - OneMerge provides the information necessary to perform an individual primitive merge operation, resulting in a single new segment.
static class MergePolicy.OneMergeProgress - Progress and state for an executing merge.
-
Field Summary
Modifier and Type Field - Description
protected static final long DEFAULT_MAX_CFS_SEGMENT_SIZE - Default max segment size in order to use compound file system.
protected static final double DEFAULT_NO_CFS_RATIO - Default ratio for compound file system usage.
protected long maxCFSSegmentSize - If the size of the merged segment exceeds this value then it will not use compound file format.
protected double noCFSRatio - If the size of the merged segment exceeds this ratio of the total index size then it will remain in non-compound format.
-
Constructor Summary
Modifier Constructor - Description
protected MergePolicy() - Creates a new merge policy instance.
protected MergePolicy(double defaultNoCFSRatio, long defaultMaxCFSSegmentSize) - Creates a new merge policy instance with default settings for noCFSRatio and maxCFSSegmentSize.
-
Method Summary
Modifier and Type Method - Description
protected final boolean assertDelCount(int delCount, SegmentCommitInfo info) - Asserts that the delCount for this SegmentCommitInfo is valid.
abstract MergePolicy.MergeSpecification findForcedDeletesMerges(SegmentInfos segmentInfos, MergePolicy.MergeContext mergeContext) - Determine what set of merge operations is necessary in order to expunge all deletes from the index.
abstract MergePolicy.MergeSpecification findForcedMerges(SegmentInfos segmentInfos, int maxSegmentCount, Map<SegmentCommitInfo, Boolean> segmentsToMerge, MergePolicy.MergeContext mergeContext) - Determine what set of merge operations is necessary in order to merge to <= the specified segment count.
MergePolicy.MergeSpecification findFullFlushMerges(MergeTrigger mergeTrigger, SegmentInfos segmentInfos, MergePolicy.MergeContext mergeContext) - Identifies merges that we want to execute (synchronously) on commit.
abstract MergePolicy.MergeSpecification findMerges(MergeTrigger mergeTrigger, SegmentInfos segmentInfos, MergePolicy.MergeContext mergeContext) - Determine what set of merge operations are now necessary on the index.
double getMaxCFSSegmentSizeMB() - Returns the largest size allowed for a compound file segment.
double getNoCFSRatio() - Returns current noCFSRatio.
protected final boolean isMerged(SegmentInfos infos, SegmentCommitInfo info, MergePolicy.MergeContext mergeContext) - Returns true if this single info is already fully merged (has no pending deletes, is in the same dir as the writer, and matches the current compound file setting).
boolean keepFullyDeletedSegment(IOSupplier<CodecReader> readerIOSupplier) - Returns true if the segment represented by the given CodecReader should be kept even if it's fully deleted.
protected final void message(String message, MergePolicy.MergeContext mergeContext) - Print a debug message to MergePolicy.MergeContext's infoStream.
int numDeletesToMerge(SegmentCommitInfo info, int delCount, IOSupplier<CodecReader> readerSupplier) - Returns the number of deletes that a merge would claim on the given segment.
protected final String segString(MergePolicy.MergeContext mergeContext, Iterable<SegmentCommitInfo> infos) - Builds a String representation of the given SegmentCommitInfo instances.
void setMaxCFSSegmentSizeMB(double v) - If a merged segment will be more than this value, leave the segment as non-compound file even if compound file is enabled.
void setNoCFSRatio(double noCFSRatio) - If a merged segment will be more than this percentage of the total size of the index, leave the segment as non-compound file even if compound file is enabled.
protected long size(SegmentCommitInfo info, MergePolicy.MergeContext mergeContext) - Return the byte size of the provided SegmentCommitInfo, pro-rated by the percentage of non-deleted documents.
boolean useCompoundFile(SegmentInfos infos, SegmentCommitInfo mergedInfo, MergePolicy.MergeContext mergeContext) - Returns true if a new segment (regardless of its origin) should use the compound file format.
protected final boolean verbose(MergePolicy.MergeContext mergeContext) - Returns true if the info-stream is in verbose mode.
-
Field Details
-
DEFAULT_NO_CFS_RATIO
protected static final double DEFAULT_NO_CFS_RATIO
Default ratio for compound file system usage. Set to 1.0, which means the compound file system is always used.
-
DEFAULT_MAX_CFS_SEGMENT_SIZE
protected static final long DEFAULT_MAX_CFS_SEGMENT_SIZE
Default max segment size in order to use compound file system. Set to Long.MAX_VALUE.
-
noCFSRatio
protected double noCFSRatio
If the size of the merged segment exceeds this ratio of the total index size then it will remain in non-compound format.
-
maxCFSSegmentSize
protected long maxCFSSegmentSize
If the size of the merged segment exceeds this value then it will not use compound file format.
-
-
Constructor Details
-
MergePolicy
protected MergePolicy()
Creates a new merge policy instance.
-
MergePolicy
protected MergePolicy(double defaultNoCFSRatio, long defaultMaxCFSSegmentSize)
Creates a new merge policy instance with default settings for noCFSRatio and maxCFSSegmentSize. This constructor should be used by subclasses whose defaults differ from those of MergePolicy.
-
-
Method Details
-
findMerges
public abstract MergePolicy.MergeSpecification findMerges(MergeTrigger mergeTrigger, SegmentInfos segmentInfos, MergePolicy.MergeContext mergeContext) throws IOException
Determine what set of merge operations are now necessary on the index. IndexWriter calls this whenever there is a change to the segments. This call is always synchronized on the IndexWriter instance so only one thread at a time will call this method.
- Parameters:
mergeTrigger - the event that triggered the merge
segmentInfos - the total set of segments in the index
mergeContext - the MergeContext to find the merges on
- Throws:
IOException
-
findForcedMerges
public abstract MergePolicy.MergeSpecification findForcedMerges(SegmentInfos segmentInfos, int maxSegmentCount, Map<SegmentCommitInfo, Boolean> segmentsToMerge, MergePolicy.MergeContext mergeContext) throws IOException
Determine what set of merge operations is necessary in order to merge to <= the specified segment count. IndexWriter calls this when its IndexWriter.forceMerge(int) method is called. This call is always synchronized on the IndexWriter instance so only one thread at a time will call this method.
- Parameters:
segmentInfos - the total set of segments in the index
maxSegmentCount - requested maximum number of segments in the index
segmentsToMerge - contains the specific SegmentInfo instances that must be merged away. This may be a subset of all SegmentInfos. If the value is true for a given SegmentInfo, that means this segment was an original segment present in the to-be-merged index; else, it was a segment produced by a cascaded merge.
mergeContext - the MergeContext to find the merges on
- Throws:
IOException
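The segment-count target can be illustrated with a little arithmetic: merging k segments into one lowers the count by k - 1. The standalone sketch below is not part of Lucene; `ForcedMergeArithmeticSketch` and `segmentsToCombine` are hypothetical names, and real policies may split the work across several smaller merges instead of a single one.

```java
/** Standalone arithmetic model; not Lucene code. */
class ForcedMergeArithmeticSketch {

  /**
   * Hypothetical helper: the minimum number of segments that a single merge
   * must combine so that at most maxSegmentCount segments remain.
   */
  static int segmentsToCombine(int segmentCount, int maxSegmentCount) {
    if (maxSegmentCount < 1) {
      throw new IllegalArgumentException("maxSegmentCount must be >= 1");
    }
    if (segmentCount <= maxSegmentCount) {
      return 0; // the target is already met, so nothing needs merging
    }
    // Combining k segments removes k - 1 of them from the total.
    return segmentCount - maxSegmentCount + 1;
  }

  public static void main(String[] args) {
    System.out.println(segmentsToCombine(10, 1)); // 10: everything goes into one segment
    System.out.println(segmentsToCombine(10, 3)); // 8: two untouched plus one merged
    System.out.println(segmentsToCombine(3, 5));  // 0: already within the limit
  }
}
```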
-
findForcedDeletesMerges
public abstract MergePolicy.MergeSpecification findForcedDeletesMerges(SegmentInfos segmentInfos, MergePolicy.MergeContext mergeContext) throws IOException
Determine what set of merge operations is necessary in order to expunge all deletes from the index.
- Parameters:
segmentInfos - the total set of segments in the index
mergeContext - the MergeContext to find the merges on
- Throws:
IOException
-
findFullFlushMerges
public MergePolicy.MergeSpecification findFullFlushMerges(MergeTrigger mergeTrigger, SegmentInfos segmentInfos, MergePolicy.MergeContext mergeContext) throws IOException
Identifies merges that we want to execute (synchronously) on commit. By default, this will do no merging on commit. If you implement this method in your MergePolicy you must also set a non-zero timeout using IndexWriterConfig.setMaxFullFlushMergeWaitMillis(long).
Any merges returned here will make IndexWriter.commit(), IndexWriter.prepareCommit() or IndexWriter.getReader(boolean, boolean) block until the merges complete or until LiveIndexWriterConfig.getMaxFullFlushMergeWaitMillis() has elapsed. This may be used to merge small segments that have just been flushed, reducing the number of segments in the point-in-time snapshot. If a merge does not complete in the allotted time, it will continue to execute, eventually finish, and apply to future point-in-time snapshots, but it will not be reflected in the current one.
If a MergePolicy.OneMerge in the returned MergePolicy.MergeSpecification includes a segment already included in a registered merge, then IndexWriter.commit() or IndexWriter.prepareCommit() will throw an IllegalStateException. Use MergePolicy.MergeContext.getMergingSegments() to determine which segments are currently registered to merge.
- Parameters:
mergeTrigger - the event that triggered the merge (COMMIT or GET_READER).
segmentInfos - the total set of segments in the index (while preparing the commit)
mergeContext - the MergeContext to find the merges on, which should be used to determine which segments are already in a registered merge (see MergePolicy.MergeContext.getMergingSegments()).
- Throws:
IOException
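The requirement to avoid segments that are already registered in a merge can be sketched without any Lucene types. In the standalone model below, plain segment names stand in for SegmentCommitInfo instances and a Set stands in for the result of getMergingSegments(); `FullFlushCandidateSketch` and `selectCandidates` are hypothetical names, and the rule of proposing a merge only when at least two candidates remain is an assumption made for illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;

/** Standalone model; segment names stand in for SegmentCommitInfo instances. */
class FullFlushCandidateSketch {

  /** Hypothetical helper: picks the segments that are safe to propose for a merge on commit. */
  static List<String> selectCandidates(List<String> flushedSegments, Set<String> mergingSegments) {
    List<String> candidates = new ArrayList<>();
    for (String segment : flushedSegments) {
      // Skip anything already registered in a merge; proposing it again is
      // the situation the documentation says ends in IllegalStateException.
      if (!mergingSegments.contains(segment)) {
        candidates.add(segment);
      }
    }
    // Assumption: a merge is only worth proposing with two or more segments.
    return candidates.size() >= 2 ? candidates : List.of();
  }

  public static void main(String[] args) {
    List<String> flushed = List.of("_a", "_b", "_c");
    Set<String> merging = Set.of("_b");
    System.out.println(selectCandidates(flushed, merging)); // prints [_a, _c]
  }
}
```

A real implementation would wrap the surviving segments in a OneMerge and add it to a MergeSpecification instead of returning names.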
-
useCompoundFile
public boolean useCompoundFile(SegmentInfos infos, SegmentCommitInfo mergedInfo, MergePolicy.MergeContext mergeContext) throws IOException
Returns true if a new segment (regardless of its origin) should use the compound file format. The default implementation returns true iff the size of the given mergedInfo is less than or equal to getMaxCFSSegmentSizeMB() and less than or equal to the total index size multiplied by getNoCFSRatio(); otherwise it returns false.
- Throws:
IOException
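The two conditions in the default rule can be expressed as a small standalone function. This is a sketch of the rule as documented here, not the Lucene source: `CompoundFileRuleSketch` and its parameter names are hypothetical, and all sizes are assumed to be plain byte counts so that the two comparisons share a unit.

```java
/** Standalone model of the documented default rule; sizes are plain byte counts. */
class CompoundFileRuleSketch {

  /**
   * Returns true iff the merged segment fits under both documented limits:
   * the absolute size cap and the fraction of the total index size.
   */
  static boolean useCompoundFile(long mergedSizeBytes, long totalIndexSizeBytes,
                                 long maxCfsSegmentSizeBytes, double noCfsRatio) {
    boolean underAbsoluteCap = mergedSizeBytes <= maxCfsSegmentSizeBytes;
    boolean underRatioCap = mergedSizeBytes <= totalIndexSizeBytes * noCfsRatio;
    return underAbsoluteCap && underRatioCap;
  }

  public static void main(String[] args) {
    long mb = 1024L * 1024L;
    // 50 MB segment, 1000 MB index, ratio 0.1: the ratio cap is 100 MB.
    System.out.println(useCompoundFile(50 * mb, 1000 * mb, Long.MAX_VALUE, 0.1));  // true
    // 200 MB segment exceeds the 100 MB ratio cap.
    System.out.println(useCompoundFile(200 * mb, 1000 * mb, Long.MAX_VALUE, 0.1)); // false
  }
}
```

With the defaults described in this class (ratio 1.0 and a cap of Long.MAX_VALUE), both comparisons pass for any segment that is part of the index, which matches the statement that the compound file system is always used.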
-
size
protected long size(SegmentCommitInfo info, MergePolicy.MergeContext mergeContext) throws IOException
Return the byte size of the provided SegmentCommitInfo, pro-rated by the percentage of non-deleted documents.
- Throws:
IOException
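As a worked example of the pro-rating, the standalone sketch below scales a byte size by the fraction of documents that are not deleted. It assumes a simple linear scale and an unscaled result for a segment without documents; `ProRatedSizeSketch`, `proRatedSize` and the parameter names are hypothetical and not taken from Lucene.

```java
/** Standalone model of size pro-rating; not Lucene code. */
class ProRatedSizeSketch {

  /**
   * Hypothetical helper: scales byteSize by the share of live documents.
   * maxDoc is the total document count of the segment, delCount the deleted ones.
   */
  static long proRatedSize(long byteSize, int maxDoc, int delCount) {
    if (maxDoc <= 0) {
      return byteSize; // no documents to base a ratio on, so leave the size as is
    }
    double delRatio = (double) delCount / maxDoc;
    return (long) (byteSize * (1.0 - delRatio));
  }

  public static void main(String[] args) {
    // 1000 bytes with 25 of 100 documents deleted counts as 750 bytes.
    System.out.println(proRatedSize(1000L, 100, 25)); // 750
  }
}
```

Pro-rating this way makes a segment with many deletes look smaller to the policy, so it can be treated like the segment it would become after its deletes are merged away.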
-
assertDelCount
protected final boolean assertDelCount(int delCount, SegmentCommitInfo info)
Asserts that the delCount for this SegmentCommitInfo is valid.
-
isMerged
protected final boolean isMerged(SegmentInfos infos, SegmentCommitInfo info, MergePolicy.MergeContext mergeContext) throws IOException
Returns true if this single info is already fully merged (has no pending deletes, is in the same dir as the writer, and matches the current compound file setting).
- Throws:
IOException
-
getNoCFSRatio
public double getNoCFSRatio()
Returns current noCFSRatio.
-
setNoCFSRatio
public void setNoCFSRatio(double noCFSRatio)
If a merged segment will be more than this percentage of the total size of the index, leave the segment as non-compound file even if compound file is enabled. Set to 1.0 to always use CFS regardless of merge size.
-
getMaxCFSSegmentSizeMB
public double getMaxCFSSegmentSizeMB()
Returns the largest size allowed for a compound file segment.
-
setMaxCFSSegmentSizeMB
public void setMaxCFSSegmentSizeMB(double v)
If a merged segment will be more than this value, leave the segment as non-compound file even if compound file is enabled. Set this to Double.POSITIVE_INFINITY (default) and noCFSRatio to 1.0 to always use CFS regardless of merge size.
-
keepFullyDeletedSegment
public boolean keepFullyDeletedSegment(IOSupplier<CodecReader> readerIOSupplier) throws IOException
Returns true if the segment represented by the given CodecReader should be kept even if it's fully deleted. This is useful for testing or, for instance, if the merge policy implements retention policies for soft deletes.
- Throws:
IOException
-
numDeletesToMerge
public int numDeletesToMerge(SegmentCommitInfo info, int delCount, IOSupplier<CodecReader> readerSupplier) throws IOException
Returns the number of deletes that a merge would claim on the given segment. By default, this method returns the sum of the del count on disk and the pending delete count. Subclasses that wrap merge readers might modify this to reflect deletes that are carried over to the target segment in the case of soft deletes.
Soft deletes allow deletes to survive across merges in order to control when the soft-deleted data is claimed.
- Parameters:
info - the segment info that identifies the segment
delCount - the number of deleted documents for this segment
readerSupplier - a supplier that provides a CodecReader for this segment
- Throws:
IOException
-
segString
protected final String segString(MergePolicy.MergeContext mergeContext, Iterable<SegmentCommitInfo> infos)
Builds a String representation of the given SegmentCommitInfo instances.
-
message
protected final void message(String message, MergePolicy.MergeContext mergeContext)
Print a debug message to MergePolicy.MergeContext's infoStream.
-
verbose
protected final boolean verbose(MergePolicy.MergeContext mergeContext)
Returns true if the info-stream is in verbose mode.