public abstract class LogMergePolicy extends MergePolicy
This class implements a MergePolicy that tries to merge segments into levels of exponentially increasing size, where each level has fewer segments than the value of the merge factor. Whenever extra segments (beyond the merge factor upper bound) are encountered, all segments within the level are merged. You can get or set the merge factor using getMergeFactor() and setMergeFactor(int) respectively.
This class is abstract and requires a subclass to define the MergePolicy.size(org.apache.lucene.index.SegmentCommitInfo, org.apache.lucene.index.IndexWriter) method, which specifies how a segment's size is determined. LogDocMergePolicy is one subclass that measures size by document count in the segment. LogByteSizeMergePolicy is another subclass that measures size as the total byte size of the file(s) for the segment.
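As a rough illustration of the level structure described above, the following self-contained sketch places each segment on a logarithmic scale, computing its level as log(size) / log(mergeFactor), so that segments one level apart differ in size by about a factor of mergeFactor. The class LevelSketch and its method are hypothetical names introduced only for this example and are not part of Lucene; the exact formula is an assumption made for illustration.

```java
// Hypothetical illustration only; not Lucene's implementation.
final class LevelSketch {

  /**
   * Log-scale level of a segment: log(size) / log(mergeFactor).
   * Assumes mergeFactor >= 2.
   */
  static double levelOf(long size, long minMergeSize, int mergeFactor) {
    // Segments smaller than minMergeSize are rounded up to it.
    long effective = Math.max(size, minMergeSize);
    // Guard against log(0) for empty segments.
    effective = Math.max(effective, 1);
    return Math.log(effective) / Math.log(mergeFactor);
  }

  public static void main(String[] args) {
    int mergeFactor = 10;
    long minMergeSize = 100;
    long[] sizes = {50, 100, 1_000, 10_000, 100_000};
    for (long size : sizes) {
      System.out.printf("size=%d level=%.2f%n",
          size, levelOf(size, minMergeSize, mergeFactor));
    }
  }
}
```

With a merge factor of 10, a segment ten times larger than another sits roughly one level higher, and any segment below minMergeSize shares the level of a segment of exactly minMergeSize.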
Nested classes inherited from class MergePolicy: MergePolicy.DocMap, MergePolicy.MergeAbortedException, MergePolicy.MergeException, MergePolicy.MergeSpecification, MergePolicy.OneMerge
Modifier and Type | Field and Description |
---|---|
protected boolean | calibrateSizeByDeletes: If true, we pro-rate a segment's size by the percentage of non-deleted documents. |
static int | DEFAULT_MAX_MERGE_DOCS: Default maximum segment size. |
static int | DEFAULT_MERGE_FACTOR: Default merge factor, which is how many segments are merged at a time. |
static double | DEFAULT_NO_CFS_RATIO: Default noCFSRatio. |
static double | LEVEL_LOG_SPAN: Defines the allowed range of log(size) for each level. |
protected int | maxMergeDocs: If a segment has more than this many documents then it will never be merged. |
protected long | maxMergeSize: If the size of a segment exceeds this value then it will never be merged. |
protected long | maxMergeSizeForForcedMerge: If the size of a segment exceeds this value then it will never be merged during IndexWriter.forceMerge(int). |
protected int | mergeFactor: How many segments to merge at a time. |
protected long | minMergeSize: Any segments whose size is smaller than this value will be rounded up to this value. |
Fields inherited from class MergePolicy: DEFAULT_MAX_CFS_SEGMENT_SIZE, maxCFSSegmentSize, noCFSRatio
Constructor and Description |
---|
LogMergePolicy(): Sole constructor. |
Modifier and Type | Method and Description |
---|---|
MergePolicy.MergeSpecification | findForcedDeletesMerges(SegmentInfos segmentInfos, IndexWriter writer): Finds merges necessary to force-merge all deletes from the index. |
MergePolicy.MergeSpecification | findForcedMerges(SegmentInfos infos, int maxNumSegments, Map<SegmentCommitInfo,Boolean> segmentsToMerge, IndexWriter writer): Returns the merges necessary to merge the index down to a specified number of segments. |
MergePolicy.MergeSpecification | findMerges(MergeTrigger mergeTrigger, SegmentInfos infos, IndexWriter writer): Checks if any merges are now necessary and returns a MergePolicy.MergeSpecification if so. |
boolean | getCalibrateSizeByDeletes(): Returns true if the segment size should be calibrated by the number of deletes when choosing segments for merge. |
int | getMaxMergeDocs(): Returns the largest segment (measured by document count) that may be merged with other segments. |
int | getMergeFactor(): Returns the number of segments that are merged at once and also controls the total number of segments allowed to accumulate in the index. |
protected boolean | isMerged(SegmentInfos infos, int maxNumSegments, Map<SegmentCommitInfo,Boolean> segmentsToMerge, IndexWriter writer): Returns true if the number of segments eligible for merging is less than or equal to the specified maxNumSegments. |
protected void | message(String message, IndexWriter writer): Print a debug message to IndexWriter's infoStream. |
void | setCalibrateSizeByDeletes(boolean calibrateSizeByDeletes): Sets whether the segment size should be calibrated by the number of deletes when choosing segments for merge. |
void | setMaxMergeDocs(int maxMergeDocs): Determines the largest segment (measured by document count) that may be merged with other segments. |
void | setMergeFactor(int mergeFactor): Determines how often segment indices are merged by addDocument(). |
protected long | sizeBytes(SegmentCommitInfo info, IndexWriter writer): Return the byte size of the provided SegmentCommitInfo, pro-rated by percentage of non-deleted documents if setCalibrateSizeByDeletes(boolean) is set. |
protected long | sizeDocs(SegmentCommitInfo info, IndexWriter writer): Return the number of documents in the provided SegmentCommitInfo, pro-rated by percentage of non-deleted documents if setCalibrateSizeByDeletes(boolean) is set. |
String | toString() |
protected boolean | verbose(IndexWriter writer) |
Methods inherited from class MergePolicy: getMaxCFSSegmentSizeMB, getNoCFSRatio, isMerged, setMaxCFSSegmentSizeMB, setNoCFSRatio, size, useCompoundFile
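public static final double LEVEL_LOG_SPAN
Defines the allowed range of log(size) for each level.
public static final int DEFAULT_MERGE_FACTOR
Default merge factor, which is how many segments are merged at a time.
public static final int DEFAULT_MAX_MERGE_DOCS
Default maximum segment size.
public static final double DEFAULT_NO_CFS_RATIO
Default noCFSRatio. If a merge's size is >= 10% of the index, then we disable compound file for it.
protected int mergeFactor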
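How many segments to merge at a time.
protected long minMergeSize
Any segments whose size is smaller than this value will be rounded up to this value.
protected long maxMergeSize
If the size of a segment exceeds this value then it will never be merged.
protected long maxMergeSizeForForcedMerge
If the size of a segment exceeds this value then it will never be merged during IndexWriter.forceMerge(int).
protected int maxMergeDocs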
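If a segment has more than this many documents then it will never be merged.
protected boolean calibrateSizeByDeletes
If true, we pro-rate a segment's size by the percentage of non-deleted documents.
public LogMergePolicy()
Sole constructor.
protected boolean verbose(IndexWriter writer)
protected void message(String message, IndexWriter writer)
Print a debug message to IndexWriter's infoStream.
public int getMergeFactor()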
Returns the number of segments that are merged at once and also controls the total number of segments allowed to accumulate in the index.
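public void setMergeFactor(int mergeFactor)
Determines how often segment indices are merged by addDocument(). Larger values (> 10) are best for batch index creation, and smaller values (< 10) for indices that are interactively maintained.
public void setCalibrateSizeByDeletes(boolean calibrateSizeByDeletes)
Sets whether the segment size should be calibrated by the number of deletes when choosing segments for merge.
public boolean getCalibrateSizeByDeletes()
Returns true if the segment size should be calibrated by the number of deletes when choosing segments for merge.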
protected long sizeDocs(SegmentCommitInfo info, IndexWriter writer) throws IOException
Return the number of documents in the provided SegmentCommitInfo, pro-rated by percentage of non-deleted documents if setCalibrateSizeByDeletes(boolean) is set.
Throws:
IOException
protected long sizeBytes(SegmentCommitInfo info, IndexWriter writer) throws IOException
Return the byte size of the provided SegmentCommitInfo, pro-rated by percentage of non-deleted documents if setCalibrateSizeByDeletes(boolean) is set.
Throws:
IOException
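The pro-rating performed by sizeDocs and sizeBytes can be pictured with the small sketch below. It assumes that the raw size is scaled by the fraction of documents in the segment that are not deleted; ProRatedSizeSketch, proRate, and the parameter names are hypothetical and chosen only for this example.

```java
// Hypothetical illustration only; not Lucene's implementation.
final class ProRatedSizeSketch {

  /**
   * Scales rawSize (a document count or a byte count) by the share of
   * non-deleted documents when calibrate is true.
   *
   * @param rawSize   unadjusted size of the segment
   * @param maxDoc    total documents in the segment, including deleted ones
   * @param delCount  number of deleted documents in the segment
   * @param calibrate stands in for the calibrateSizeByDeletes setting
   */
  static long proRate(long rawSize, int maxDoc, int delCount, boolean calibrate) {
    if (!calibrate || maxDoc <= 0) {
      return rawSize; // nothing to adjust, or no documents to take a ratio of
    }
    double delRatio = (double) delCount / maxDoc;
    return (long) (rawSize * (1.0 - delRatio));
  }

  public static void main(String[] args) {
    // A 1000-byte segment where 25 of 100 documents are deleted.
    System.out.println(proRate(1000, 100, 25, true));
    System.out.println(proRate(1000, 100, 25, false));
  }
}
```

Under this assumption, a segment with many deletes looks smaller to the policy, so it lands in a lower level and tends to be merged (and its deletes reclaimed) sooner.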
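protected boolean isMerged(SegmentInfos infos, int maxNumSegments, Map<SegmentCommitInfo,Boolean> segmentsToMerge, IndexWriter writer) throws IOException
Returns true if the number of segments eligible for merging is less than or equal to the specified maxNumSegments.
Throws:
IOException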
public MergePolicy.MergeSpecification findForcedMerges(SegmentInfos infos, int maxNumSegments, Map<SegmentCommitInfo,Boolean> segmentsToMerge, IndexWriter writer) throws IOException
Returns the merges necessary to merge the index down to a specified number of segments. This respects the maxMergeSizeForForcedMerge setting. By default, and assuming maxNumSegments=1, only one segment will be left in the index, where that segment has no deletions pending nor separate norms, and it is in compound file format if the current useCompoundFile setting is true. This method returns multiple merges (mergeFactor at a time) so the MergeScheduler in use may make use of concurrency.
Specified by:
findForcedMerges in class MergePolicy
Parameters:
infos - the total set of segments in the index
maxNumSegments - requested maximum number of segments in the index (currently this is always 1)
segmentsToMerge - contains the specific SegmentInfo instances that must be merged away. This may be a subset of all SegmentInfos. If the value is True for a given SegmentInfo, that means this segment was an original segment present in the to-be-merged index; else, it was a segment produced by a cascaded merge.
writer - the IndexWriter to find the merges on
Throws:
IOException
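To make "multiple merges (mergeFactor at a time)" concrete, the sketch below plans batches of adjacent segments, working from the end of the segment list, and falls back to a single partial merge only when no full batch exists. This is a simplified model written for illustration, not Lucene's code: ForcedMergeSketch and planBatches are hypothetical names, segments are represented only by their positions, and size limits such as maxMergeSizeForForcedMerge are ignored.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration only; a simplified model, not Lucene's code.
final class ForcedMergeSketch {

  /**
   * Returns [start, end) index ranges, each covering mergeFactor adjacent
   * segments, that could run concurrently. Assumes mergeFactor >= 2.
   */
  static List<int[]> planBatches(int segmentCount, int mergeFactor, int maxNumSegments) {
    List<int[]> batches = new ArrayList<>();
    int last = segmentCount;
    // Enroll every full batch of mergeFactor segments, starting from the end.
    while (last - maxNumSegments + 1 >= mergeFactor) {
      batches.add(new int[] {last - mergeFactor, last});
      last -= mergeFactor;
    }
    // With no full batch available, merge the remaining segments in one step.
    if (batches.isEmpty() && maxNumSegments == 1 && last > 1) {
      batches.add(new int[] {0, last});
    }
    return batches;
  }

  public static void main(String[] args) {
    for (int[] b : planBatches(25, 10, 1)) {
      System.out.println("merge segments [" + b[0] + ", " + b[1] + ")");
    }
  }
}
```

In this model, 25 segments with a merge factor of 10 yield two independent batches; the segments produced by those batches, together with the five left over, would be picked up by a later, cascaded pass.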
public MergePolicy.MergeSpecification findForcedDeletesMerges(SegmentInfos segmentInfos, IndexWriter writer) throws IOException
Finds merges necessary to force-merge all deletes from the index.
Specified by:
findForcedDeletesMerges in class MergePolicy
Parameters:
segmentInfos - the total set of segments in the index
writer - the IndexWriter to find the merges on
Throws:
IOException
public MergePolicy.MergeSpecification findMerges(MergeTrigger mergeTrigger, SegmentInfos infos, IndexWriter writer) throws IOException
Checks if any merges are now necessary and returns a MergePolicy.MergeSpecification if so. A merge is necessary when there are more than setMergeFactor(int) segments at a given level. When multiple levels have too many segments, this method will return multiple merges, allowing the MergeScheduler to use concurrency.
Specified by:
findMerges in class MergePolicy
Parameters:
mergeTrigger - the event that triggered the merge
infos - the total set of segments in the index
writer - the IndexWriter to find the merges on
Throws:
IOException
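The level-by-level selection can be sketched as follows. This is a simplified model, not Lucene's code: it walks the segments in index order, treats every segment within LEVEL_LOG_SPAN of the largest remaining log-size as one level, and emits a merge for each full group of mergeFactor adjacent segments in that level. FindMergesSketch is a hypothetical name, the span value of 0.75 and the "full group of mergeFactor" threshold are assumptions of the sketch, and limits such as maxMergeDocs, minMergeSize, and already-running merges are ignored.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration only; a simplified model, not Lucene's code.
final class FindMergesSketch {
  // Assumed width of one level on the log scale (cf. LEVEL_LOG_SPAN).
  static final double LEVEL_LOG_SPAN = 0.75;

  /**
   * Returns [start, end) index ranges of adjacent segments to merge.
   * sizes must be positive and listed in index order; mergeFactor >= 2.
   */
  static List<int[]> findMerges(long[] sizes, int mergeFactor) {
    double norm = Math.log(mergeFactor);
    double[] levels = new double[sizes.length];
    for (int i = 0; i < sizes.length; i++) {
      levels[i] = Math.log(sizes[i]) / norm;
    }
    List<int[]> merges = new ArrayList<>();
    int start = 0;
    while (start < sizes.length) {
      // The largest remaining segment defines the top of the current level.
      double maxLevel = levels[start];
      for (int i = start + 1; i < sizes.length; i++) {
        maxLevel = Math.max(maxLevel, levels[i]);
      }
      double levelBottom = maxLevel - LEVEL_LOG_SPAN;
      // Find the last segment that still belongs to this level.
      int upto = sizes.length - 1;
      while (upto > start && levels[upto] < levelBottom) {
        upto--;
      }
      // Emit one merge per full group of mergeFactor adjacent segments.
      int end = start + mergeFactor;
      while (end <= upto + 1) {
        merges.add(new int[] {start, end});
        start = end;
        end = start + mergeFactor;
      }
      start = upto + 1;
    }
    return merges;
  }

  public static void main(String[] args) {
    long[] sizes = {1000, 1000, 1000, 10, 10, 10};
    for (int[] m : findMerges(sizes, 3)) {
      System.out.println("merge segments [" + m[0] + ", " + m[1] + ")");
    }
  }
}
```

Because each returned range covers a different level, the ranges do not overlap, which is what lets a scheduler run them at the same time.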
public void setMaxMergeDocs(int maxMergeDocs)
Determines the largest segment (measured by document count) that may be merged with other segments. Small values (e.g., less than 10,000) are best for interactive indexing, as this limits the length of pauses while indexing to a few seconds. Larger values are best for batched indexing and speedier searches.
The default value is Integer.MAX_VALUE.
LogByteSizeMergePolicy also allows you to set this limit by net size (in MB) of the segment, using LogByteSizeMergePolicy.setMaxMergeMB(double).
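public int getMaxMergeDocs()
Returns the largest segment (measured by document count) that may be merged with other segments.
See Also:
setMaxMergeDocs(int)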
Copyright © 2000-2016 Apache Software Foundation. All Rights Reserved.