Class MergePolicy
- java.lang.Object
-
- org.apache.lucene.index.MergePolicy
-
- Direct Known Subclasses: FilterMergePolicy, LogMergePolicy, NoMergePolicy, TieredMergePolicy
public abstract class MergePolicy extends Object
Expert: a MergePolicy determines the sequence of primitive merge operations.
Whenever the segments in an index have been altered by IndexWriter, whether by the addition of a newly flushed segment, the addition of many segments from addIndexes* calls, or a previous merge that may now need to cascade, IndexWriter invokes findMerges(org.apache.lucene.index.MergeTrigger, org.apache.lucene.index.SegmentInfos, org.apache.lucene.index.MergePolicy.MergeContext) to give the MergePolicy a chance to pick merges that are now required. This method returns a MergePolicy.MergeSpecification instance describing the set of merges that should be done, or null if no merges are necessary. When IndexWriter.forceMerge is called, it calls findForcedMerges(SegmentInfos, int, Map, MergeContext) and the MergePolicy should then return the necessary merges.
Note that the policy can return more than one merge at a time. In this case, if the writer is using SerialMergeScheduler, the merges will be run sequentially, but if it is using ConcurrentMergeScheduler they will be run concurrently.
The default MergePolicy is TieredMergePolicy.
- WARNING: This API is experimental and might change in incompatible ways in the next release.
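The contract described above (return a set of merges, possibly more than one, or null when nothing needs merging) can be illustrated with a small stand-alone sketch. It deliberately does not use Lucene: the class ToyMergeSelection, its findMerges method, and the mergeFactor parameter are hypothetical names chosen for this illustration, and grouping adjacent segments is only an assumed selection rule, not how any Lucene policy chooses merges.

```java
import java.util.ArrayList;
import java.util.List;

/** Toy stand-in for a merge policy; hypothetical, not part of Lucene. */
final class ToyMergeSelection {

  /**
   * Groups adjacent segment names into merges of exactly mergeFactor segments.
   * Returns null when no merge is necessary, mirroring the findMerges contract.
   */
  static List<List<String>> findMerges(List<String> segments, int mergeFactor) {
    if (mergeFactor < 2) {
      throw new IllegalArgumentException("mergeFactor must be at least 2");
    }
    List<List<String>> merges = new ArrayList<>();
    // Each full group of mergeFactor adjacent segments becomes one merge.
    for (int start = 0; start + mergeFactor <= segments.size(); start += mergeFactor) {
      merges.add(new ArrayList<>(segments.subList(start, start + mergeFactor)));
    }
    // Leftover segments that do not fill a group are left alone.
    return merges.isEmpty() ? null : merges;
  }
}
```

With five segments and a mergeFactor of 2, this sketch returns two merges at once and leaves the fifth segment untouched; a scheduler would then decide whether those two merges run one after the other or concurrently.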
-
-
Nested Class Summary
Nested Classes (modifier and type, class, description):
static class MergePolicy.MergeAbortedException - Thrown when a merge was explicitly aborted because IndexWriter.abortMerges() was called.
static interface MergePolicy.MergeContext - This interface represents the current context of the merge selection process.
static class MergePolicy.MergeException - Exception thrown if there are any problems while executing a merge.
static class MergePolicy.MergeSpecification - A MergeSpecification instance provides the information necessary to perform multiple merges.
static class MergePolicy.OneMerge - OneMerge provides the information necessary to perform an individual primitive merge operation, resulting in a single new segment.
static class MergePolicy.OneMergeProgress - Progress and state for an executing merge.
-
Field Summary
Fields (modifier and type, field, description):
protected static long DEFAULT_MAX_CFS_SEGMENT_SIZE - Default max segment size in order to use compound file system.
protected static double DEFAULT_NO_CFS_RATIO - Default ratio for compound file system usage.
protected long maxCFSSegmentSize - If the size of the merged segment exceeds this value then it will not use compound file format.
protected double noCFSRatio - If the size of the merged segment exceeds this ratio of the total index size then it will remain in non-compound format.
-
Constructor Summary
Constructors (modifier, constructor, description):
protected MergePolicy() - Creates a new merge policy instance.
protected MergePolicy(double defaultNoCFSRatio, long defaultMaxCFSSegmentSize) - Creates a new merge policy instance with default settings for noCFSRatio and maxCFSSegmentSize.
-
Method Summary
Methods (modifier and type, method, description):
protected boolean assertDelCount(int delCount, SegmentCommitInfo info) - Asserts that the delCount for this SegmentCommitInfo is valid.
abstract MergePolicy.MergeSpecification findForcedDeletesMerges(SegmentInfos segmentInfos, MergePolicy.MergeContext mergeContext) - Determine what set of merge operations is necessary in order to expunge all deletes from the index.
abstract MergePolicy.MergeSpecification findForcedMerges(SegmentInfos segmentInfos, int maxSegmentCount, Map<SegmentCommitInfo,Boolean> segmentsToMerge, MergePolicy.MergeContext mergeContext) - Determine what set of merge operations is necessary in order to merge to <= the specified segment count.
MergePolicy.MergeSpecification findFullFlushMerges(MergeTrigger mergeTrigger, SegmentInfos segmentInfos, MergePolicy.MergeContext mergeContext) - Identifies merges that we want to execute (synchronously) on commit.
abstract MergePolicy.MergeSpecification findMerges(MergeTrigger mergeTrigger, SegmentInfos segmentInfos, MergePolicy.MergeContext mergeContext) - Determine what set of merge operations are now necessary on the index.
double getMaxCFSSegmentSizeMB() - Returns the largest size allowed for a compound file segment.
double getNoCFSRatio() - Returns current noCFSRatio.
protected boolean isMerged(SegmentInfos infos, SegmentCommitInfo info, MergePolicy.MergeContext mergeContext) - Returns true if this single info is already fully merged (has no pending deletes, is in the same dir as the writer, and matches the current compound file setting).
boolean keepFullyDeletedSegment(IOSupplier<CodecReader> readerIOSupplier) - Returns true if the segment represented by the given CodecReader should be kept even if it's fully deleted.
protected void message(String message, MergePolicy.MergeContext mergeContext) - Print a debug message to MergePolicy.MergeContext's infoStream.
int numDeletesToMerge(SegmentCommitInfo info, int delCount, IOSupplier<CodecReader> readerSupplier) - Returns the number of deletes that a merge would claim on the given segment.
protected String segString(MergePolicy.MergeContext mergeContext, Iterable<SegmentCommitInfo> infos) - Builds a String representation of the given SegmentCommitInfo instances.
void setMaxCFSSegmentSizeMB(double v) - If a merged segment will be more than this value, leave the segment as non-compound file even if compound file is enabled.
void setNoCFSRatio(double noCFSRatio) - If a merged segment will be more than this percentage of the total size of the index, leave the segment as non-compound file even if compound file is enabled.
protected long size(SegmentCommitInfo info, MergePolicy.MergeContext mergeContext) - Return the byte size of the provided SegmentCommitInfo, pro-rated by the percentage of non-deleted documents.
boolean useCompoundFile(SegmentInfos infos, SegmentCommitInfo mergedInfo, MergePolicy.MergeContext mergeContext) - Returns true if a new segment (regardless of its origin) should use the compound file format.
protected boolean verbose(MergePolicy.MergeContext mergeContext) - Returns true if the info-stream is in verbose mode.
-
-
-
Field Detail
-
DEFAULT_NO_CFS_RATIO
protected static final double DEFAULT_NO_CFS_RATIO
Default ratio for compound file system usage. Set to 1.0, always use compound file system.
- See Also: Constant Field Values
-
DEFAULT_MAX_CFS_SEGMENT_SIZE
protected static final long DEFAULT_MAX_CFS_SEGMENT_SIZE
Default max segment size in order to use compound file system. Set to Long.MAX_VALUE.
- See Also: Constant Field Values
-
noCFSRatio
protected double noCFSRatio
If the size of the merged segment exceeds this ratio of the total index size then it will remain in non-compound format.
-
maxCFSSegmentSize
protected long maxCFSSegmentSize
If the size of the merged segment exceeds this value then it will not use compound file format.
-
-
Constructor Detail
-
MergePolicy
protected MergePolicy()
Creates a new merge policy instance.
-
MergePolicy
protected MergePolicy(double defaultNoCFSRatio, long defaultMaxCFSSegmentSize)
Creates a new merge policy instance with default settings for noCFSRatio and maxCFSSegmentSize. This constructor should be used by subclasses that use different defaults than MergePolicy.
-
-
Method Detail
-
findMerges
public abstract MergePolicy.MergeSpecification findMerges(MergeTrigger mergeTrigger, SegmentInfos segmentInfos, MergePolicy.MergeContext mergeContext) throws IOException
Determine what set of merge operations are now necessary on the index. IndexWriter calls this whenever there is a change to the segments. This call is always synchronized on the IndexWriter instance so only one thread at a time will call this method.
- Parameters:
mergeTrigger - the event that triggered the merge
segmentInfos - the total set of segments in the index
mergeContext - the IndexWriter to find the merges on
- Throws:
IOException
-
findForcedMerges
public abstract MergePolicy.MergeSpecification findForcedMerges(SegmentInfos segmentInfos, int maxSegmentCount, Map<SegmentCommitInfo,Boolean> segmentsToMerge, MergePolicy.MergeContext mergeContext) throws IOException
Determine what set of merge operations is necessary in order to merge to <= the specified segment count. IndexWriter calls this when its IndexWriter.forceMerge(int) method is called. This call is always synchronized on the IndexWriter instance so only one thread at a time will call this method.
- Parameters:
segmentInfos - the total set of segments in the index
maxSegmentCount - requested maximum number of segments in the index
segmentsToMerge - contains the specific SegmentInfo instances that must be merged away. This may be a subset of all SegmentInfos. If the value is True for a given SegmentInfo, that means this segment was an original segment present in the to-be-merged index; else, it was a segment produced by a cascaded merge.
mergeContext - the MergeContext to find the merges on
- Throws:
IOException
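To make the "merge to <= the specified segment count" goal concrete, here is a minimal plain-Java sketch. It is not Lucene code: ToyForcedMerge and findForcedMerge are hypothetical names, segments are represented only by their sizes, the segmentsToMerge map is ignored, and choosing the smallest segments for a single merge is an assumption made for illustration.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

/** Toy model of a forced merge selection; hypothetical, not part of Lucene. */
final class ToyForcedMerge {

  /**
   * Returns the indexes of the segments to combine in one merge so that at most
   * maxSegmentCount segments remain, or null if the index already complies.
   */
  static List<Integer> findForcedMerge(long[] segmentSizes, int maxSegmentCount) {
    if (maxSegmentCount < 1) {
      throw new IllegalArgumentException("maxSegmentCount must be at least 1");
    }
    int n = segmentSizes.length;
    if (n <= maxSegmentCount) {
      return null; // nothing to do
    }
    List<Integer> bySize = new ArrayList<>();
    for (int i = 0; i < n; i++) {
      bySize.add(i);
    }
    // Smallest segments first (assumed heuristic: they are the cheapest to rewrite).
    bySize.sort(Comparator.comparingLong((Integer i) -> segmentSizes[i]));
    // Merging k segments into one lowers the segment count by k - 1.
    int toMerge = n - maxSegmentCount + 1;
    return new ArrayList<>(bySize.subList(0, toMerge));
  }
}
```

For four segments and a requested maximum of two, the sketch merges the three smallest segments, leaving the merged result plus the largest original segment.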
-
findForcedDeletesMerges
public abstract MergePolicy.MergeSpecification findForcedDeletesMerges(SegmentInfos segmentInfos, MergePolicy.MergeContext mergeContext) throws IOException
Determine what set of merge operations is necessary in order to expunge all deletes from the index.
- Parameters:
segmentInfos - the total set of segments in the index
mergeContext - the MergeContext to find the merges on
- Throws:
IOException
-
findFullFlushMerges
public MergePolicy.MergeSpecification findFullFlushMerges(MergeTrigger mergeTrigger, SegmentInfos segmentInfos, MergePolicy.MergeContext mergeContext) throws IOException
Identifies merges that we want to execute (synchronously) on commit. By default, this will do no merging on commit. If you implement this method in your MergePolicy you must also set a non-zero timeout using IndexWriterConfig.setMaxFullFlushMergeWaitMillis(long).
Any merges returned here will make IndexWriter.commit(), IndexWriter.prepareCommit() or IndexWriter.getReader(boolean, boolean) block until the merges complete or until LiveIndexWriterConfig.getMaxFullFlushMergeWaitMillis() has elapsed. This may be used to merge small segments that have just been flushed, reducing the number of segments in the point-in-time snapshot. If a merge does not complete in the allotted time, it will continue to execute, eventually finish, and apply to a future point-in-time snapshot, but it will not be reflected in the current one.
If a MergePolicy.OneMerge in the returned MergePolicy.MergeSpecification includes a segment already included in a registered merge, then IndexWriter.commit() or IndexWriter.prepareCommit() will throw an IllegalStateException. Use MergePolicy.MergeContext.getMergingSegments() to determine which segments are currently registered to merge.
- Parameters:
mergeTrigger - the event that triggered the merge (COMMIT or GET_READER).
segmentInfos - the total set of segments in the index (while preparing the commit)
mergeContext - the MergeContext to find the merges on, which should be used to determine which segments are already in a registered merge (see MergePolicy.MergeContext.getMergingSegments()).
- Throws:
IOException
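The requirement to skip segments that are already part of a registered merge can be sketched without Lucene as follows. Everything here is hypothetical: ToyFullFlushSelection, selectSmallSegments, and the maxSmallSegmentBytes threshold are illustration-only names, and the mergingSegments set merely plays the role that MergeContext.getMergingSegments() plays in a real policy.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;

/** Toy model of picking small segments to merge on commit; not part of Lucene. */
final class ToyFullFlushSelection {

  /**
   * Returns the names of small segments that are not already merging,
   * or null when fewer than two qualify (a merge needs at least two inputs here).
   */
  static List<String> selectSmallSegments(
      String[] names, long[] sizesInBytes, Set<String> mergingSegments, long maxSmallSegmentBytes) {
    List<String> candidates = new ArrayList<>();
    for (int i = 0; i < names.length; i++) {
      // Including an already-merging segment is what triggers IllegalStateException in the real API.
      boolean alreadyMerging = mergingSegments.contains(names[i]);
      if (!alreadyMerging && sizesInBytes[i] <= maxSmallSegmentBytes) {
        candidates.add(names[i]);
      }
    }
    return candidates.size() < 2 ? null : candidates;
  }
}
```

Filtering against the merging set before building the result is the design point: the selection stays valid regardless of which background merges happen to be in flight at commit time.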
-
useCompoundFile
public boolean useCompoundFile(SegmentInfos infos, SegmentCommitInfo mergedInfo, MergePolicy.MergeContext mergeContext) throws IOException
Returns true if a new segment (regardless of its origin) should use the compound file format. The default implementation returns true iff the size of the given mergedInfo is less than or equal to getMaxCFSSegmentSizeMB() and the size is less than or equal to the TotalIndexSize * getNoCFSRatio(), otherwise false.
- Throws:
IOException
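The two conditions stated above can be written down as a small plain-Java sketch. It is a model of the documented rule rather than the Lucene implementation: ToyCompoundFileDecision and its parameter names are hypothetical, and all sizes are assumed to be expressed in bytes so that they can be compared directly.

```java
/** Toy model of the documented compound-file rule; hypothetical, not part of Lucene. */
final class ToyCompoundFileDecision {

  /**
   * Returns true iff the merged segment is no larger than the maximum CFS segment size
   * and no larger than noCFSRatio times the total index size.
   */
  static boolean useCompoundFile(
      long mergedSizeBytes, long totalIndexSizeBytes, double noCFSRatio, long maxCFSSegmentSizeBytes) {
    if (mergedSizeBytes > maxCFSSegmentSizeBytes) {
      return false; // too large in absolute terms
    }
    // Too large relative to the whole index means non-compound format.
    return mergedSizeBytes <= totalIndexSizeBytes * noCFSRatio;
  }
}
```

With a ratio of 1.0 and a maximum of Long.MAX_VALUE, both checks pass for any segment that is part of the index, which matches the "always use compound file system" reading of the defaults.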
-
size
protected long size(SegmentCommitInfo info, MergePolicy.MergeContext mergeContext) throws IOException
Return the byte size of the provided SegmentCommitInfo, pro-rated by the percentage of non-deleted documents.
- Throws:
IOException
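As a worked illustration of pro-rating, the sketch below scales a segment's byte size by the fraction of documents that are not deleted. The class ToyProRatedSize and its parameters are hypothetical, and treating a segment with no documents as having no deletes is an assumption of this sketch.

```java
/** Toy model of a delete-aware segment size; hypothetical, not part of Lucene. */
final class ToyProRatedSize {

  /** Scales byteSize by the share of live (non-deleted) documents in the segment. */
  static long proRatedSize(long byteSize, int maxDoc, int delCount) {
    if (maxDoc <= 0) {
      return byteSize; // no documents, so nothing to pro-rate
    }
    double delRatio = (double) delCount / (double) maxDoc;
    return (long) (byteSize * (1.0 - delRatio));
  }
}
```

For example, a 1000-byte segment with 100 documents, 25 of them deleted, counts as 750 bytes, so heavily deleted segments look smaller to size-based merge selection.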
-
assertDelCount
protected final boolean assertDelCount(int delCount, SegmentCommitInfo info)
Asserts that the delCount for this SegmentCommitInfo is valid
-
isMerged
protected final boolean isMerged(SegmentInfos infos, SegmentCommitInfo info, MergePolicy.MergeContext mergeContext) throws IOException
Returns true if this single info is already fully merged (has no pending deletes, is in the same dir as the writer, and matches the current compound file setting).
- Throws:
IOException
-
getNoCFSRatio
public double getNoCFSRatio()
Returns current noCFSRatio.
- See Also: setNoCFSRatio(double)
-
setNoCFSRatio
public void setNoCFSRatio(double noCFSRatio)
If a merged segment will be more than this percentage of the total size of the index, leave the segment as non-compound file even if compound file is enabled. Set to 1.0 to always use CFS regardless of merge size.
-
getMaxCFSSegmentSizeMB
public double getMaxCFSSegmentSizeMB()
Returns the largest size allowed for a compound file segment
-
setMaxCFSSegmentSizeMB
public void setMaxCFSSegmentSizeMB(double v)
If a merged segment will be more than this value, leave the segment as non-compound file even if compound file is enabled. Set this to Double.POSITIVE_INFINITY (default) and noCFSRatio to 1.0 to always use CFS regardless of merge size.
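Because the setter takes megabytes while the maxCFSSegmentSize field is a long, a conversion with clamping is implied. The sketch below shows one way such a conversion could look; ToyMaxCfsSize, megabytesToBytes, and bytesToMegabytes are hypothetical helper names, and the use of 1024 * 1024 bytes per megabyte and the rejection of negative values are assumptions of this illustration.

```java
/** Toy model of the MB <-> bytes conversion; hypothetical, not part of Lucene. */
final class ToyMaxCfsSize {

  /** Converts megabytes to bytes, clamping values too large for a long to Long.MAX_VALUE. */
  static long megabytesToBytes(double megabytes) {
    if (megabytes < 0.0) {
      throw new IllegalArgumentException("size must be non-negative, got " + megabytes);
    }
    double bytes = megabytes * 1024 * 1024;
    // Double.POSITIVE_INFINITY ends up here and maps to "no limit".
    return bytes > Long.MAX_VALUE ? Long.MAX_VALUE : (long) bytes;
  }

  /** Converts a byte count back to megabytes. */
  static double bytesToMegabytes(long bytes) {
    return bytes / 1024.0 / 1024.0;
  }
}
```

Under these assumptions, passing Double.POSITIVE_INFINITY yields Long.MAX_VALUE, which is the same value as DEFAULT_MAX_CFS_SEGMENT_SIZE.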
-
keepFullyDeletedSegment
public boolean keepFullyDeletedSegment(IOSupplier<CodecReader> readerIOSupplier) throws IOException
Returns true if the segment represented by the given CodecReader should be kept even if it's fully deleted. This is useful for testing or, for instance, if the merge policy implements retention policies for soft deletes.
- Throws:
IOException
-
numDeletesToMerge
public int numDeletesToMerge(SegmentCommitInfo info, int delCount, IOSupplier<CodecReader> readerSupplier) throws IOException
Returns the number of deletes that a merge would claim on the given segment. This method will by default return the sum of the del count on disk and the pending delete count. Yet, subclasses that wrap merge readers might modify this to reflect deletes that are carried over to the target segment in the case of soft deletes.
Soft deletes allow deletes to survive across merges in order to control when the soft-deleted data is claimed.
- Parameters:
info - the segment info that identifies the segment
delCount - the number of deleted documents for this segment
readerSupplier - a supplier that allows obtaining a CodecReader for this segment
- Throws:
IOException
- See Also: IndexWriter.softUpdateDocument(Term, Iterable, Field...), IndexWriterConfig.setSoftDeletesField(String)
-
segString
protected final String segString(MergePolicy.MergeContext mergeContext, Iterable<SegmentCommitInfo> infos)
Builds a String representation of the given SegmentCommitInfo instances
-
message
protected final void message(String message, MergePolicy.MergeContext mergeContext)
Print a debug message to MergePolicy.MergeContext's infoStream.
-
verbose
protected final boolean verbose(MergePolicy.MergeContext mergeContext)
Returns true if the info-stream is in verbose mode.
- See Also: message(String, MergeContext)
-
-