org.apache.lucene.index
Class LogMergePolicy

java.lang.Object
  extended by org.apache.lucene.index.MergePolicy
      extended by org.apache.lucene.index.LogMergePolicy
All Implemented Interfaces:
Closeable, Cloneable
Direct Known Subclasses:
LogByteSizeMergePolicy, LogDocMergePolicy

public abstract class LogMergePolicy
extends MergePolicy

This class implements a MergePolicy that tries to merge segments into levels of exponentially increasing size, where each level has fewer segments than the value of the merge factor. Whenever extra segments (beyond the merge factor upper bound) are encountered, all segments within the level are merged. You can get or set the merge factor using getMergeFactor() and setMergeFactor(int) respectively.

This class is abstract and requires a subclass to define the MergePolicy.size(org.apache.lucene.index.SegmentCommitInfo) method which specifies how a segment's size is determined. LogDocMergePolicy is one subclass that measures size by document count in the segment. LogByteSizeMergePolicy is another subclass that measures size as the total byte size of the file(s) for the segment.
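
The leveling behaviour described above can be illustrated with a small simulation that does not depend on Lucene. This is a minimal sketch under stated assumptions, not the policy's actual implementation: size is a plain document count, every flush yields a one-document segment, and a merge fires as soon as mergeFactor equally sized segments sit at the tail of the index. The class LogMergeSimulation and its methods are hypothetical names introduced for illustration.

```java
import java.util.ArrayList;
import java.util.List;

// Idealized model only: not Lucene code. Segment "size" is a document count
// and every flush produces a one-document segment.
class LogMergeSimulation {

    // Returns the segment sizes (oldest first) after the given number of flushes.
    static List<Long> simulate(int flushes, int mergeFactor) {
        if (mergeFactor < 2) {
            throw new IllegalArgumentException("mergeFactor must be at least 2");
        }
        List<Long> segments = new ArrayList<>();
        for (int i = 0; i < flushes; i++) {
            segments.add(1L);
            // Cascade: while the newest mergeFactor segments have equal size,
            // replace them with one segment of the combined size.
            while (tailIsFullLevel(segments, mergeFactor)) {
                long merged = 0;
                for (int j = 0; j < mergeFactor; j++) {
                    merged += segments.remove(segments.size() - 1);
                }
                segments.add(merged);
            }
        }
        return segments;
    }

    // True if the last mergeFactor segments all have the same size.
    private static boolean tailIsFullLevel(List<Long> segments, int mergeFactor) {
        int n = segments.size();
        if (n < mergeFactor) {
            return false;
        }
        long size = segments.get(n - 1);
        for (int j = n - mergeFactor; j < n; j++) {
            if (segments.get(j) != size) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        System.out.println(simulate(25, 3));
    }
}
```

Under these assumptions, simulate(25, 3) leaves segments of sizes 9, 9, 3, 3 and 1: each level holds fewer than three segments, and sizes grow by a factor of three from one level to the next.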


Nested Class Summary
 
Nested classes/interfaces inherited from class org.apache.lucene.index.MergePolicy
MergePolicy.DocMap, MergePolicy.MergeAbortedException, MergePolicy.MergeException, MergePolicy.MergeSpecification, MergePolicy.MergeTrigger, MergePolicy.OneMerge
 
Field Summary
protected  boolean calibrateSizeByDeletes
          If true, we pro-rate a segment's size by the percentage of non-deleted documents.
static int DEFAULT_MAX_MERGE_DOCS
          Default maximum segment size.
static int DEFAULT_MERGE_FACTOR
          Default merge factor, which is how many segments are merged at a time.
static double DEFAULT_NO_CFS_RATIO
          Default noCFSRatio.
static double LEVEL_LOG_SPAN
          Defines the allowed range of log(size) for each level.
protected  int maxMergeDocs
          If a segment has more than this many documents then it will never be merged.
protected  long maxMergeSize
          If the size of a segment exceeds this value then it will never be merged.
protected  long maxMergeSizeForForcedMerge
          If the size of a segment exceeds this value then it will never be merged during IndexWriter.forceMerge(int).
protected  int mergeFactor
          How many segments to merge at a time.
protected  long minMergeSize
          Any segments whose size is smaller than this value will be rounded up to this value.
 
Fields inherited from class org.apache.lucene.index.MergePolicy
DEFAULT_MAX_CFS_SEGMENT_SIZE, maxCFSSegmentSize, noCFSRatio, writer
 
Constructor Summary
LogMergePolicy()
          Sole constructor.
 
Method Summary
 void close()
          Release all resources for the policy.
 MergePolicy.MergeSpecification findForcedDeletesMerges(SegmentInfos segmentInfos)
          Finds merges necessary to force-merge all deletes from the index.
 MergePolicy.MergeSpecification findForcedMerges(SegmentInfos infos, int maxNumSegments, Map<SegmentCommitInfo,Boolean> segmentsToMerge)
          Returns the merges necessary to merge the index down to a specified number of segments.
 MergePolicy.MergeSpecification findMerges(MergePolicy.MergeTrigger mergeTrigger, SegmentInfos infos)
          Checks if any merges are now necessary and returns a MergePolicy.MergeSpecification if so.
 boolean getCalibrateSizeByDeletes()
          Returns true if the segment size should be calibrated by the number of deletes when choosing segments for merge.
 int getMaxMergeDocs()
          Returns the largest segment (measured by document count) that may be merged with other segments.
 int getMergeFactor()
          Returns the number of segments that are merged at once and also controls the total number of segments allowed to accumulate in the index.
protected  boolean isMerged(SegmentInfos infos, int maxNumSegments, Map<SegmentCommitInfo,Boolean> segmentsToMerge)
          Returns true if the number of segments eligible for merging is less than or equal to the specified maxNumSegments.
protected  void message(String message)
          Print a debug message to IndexWriter's infoStream.
 void setCalibrateSizeByDeletes(boolean calibrateSizeByDeletes)
          Sets whether the segment size should be calibrated by the number of deletes when choosing segments for merge.
 void setMaxMergeDocs(int maxMergeDocs)
          Determines the largest segment (measured by document count) that may be merged with other segments.
 void setMergeFactor(int mergeFactor)
          Determines how often segment indices are merged by addDocument().
protected  long sizeBytes(SegmentCommitInfo info)
          Return the byte size of the provided SegmentCommitInfo, pro-rated by percentage of non-deleted documents if setCalibrateSizeByDeletes(boolean) is set.
protected  long sizeDocs(SegmentCommitInfo info)
          Return the number of documents in the provided SegmentCommitInfo, pro-rated by percentage of non-deleted documents if setCalibrateSizeByDeletes(boolean) is set.
 String toString()
           
protected  boolean verbose()
          Returns true if LMP is enabled in IndexWriter's infoStream.
 
Methods inherited from class org.apache.lucene.index.MergePolicy
clone, getMaxCFSSegmentSizeMB, getNoCFSRatio, isMerged, setIndexWriter, setMaxCFSSegmentSizeMB, setNoCFSRatio, size, useCompoundFile
 
Methods inherited from class java.lang.Object
equals, finalize, getClass, hashCode, notify, notifyAll, wait, wait, wait
 

Field Detail

LEVEL_LOG_SPAN

public static final double LEVEL_LOG_SPAN
Defines the allowed range of log(size) for each level. A level is computed by taking the max segment log size, minus LEVEL_LOG_SPAN, and finding all segments falling within that range.

See Also:
Constant Field Values
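
The level computation described for LEVEL_LOG_SPAN can be sketched roughly as follows. This is a simplified illustration, not the policy's code: it assumes the logarithm is taken in base mergeFactor, it takes the span as a parameter rather than reading the constant, and it ignores segment adjacency. LevelSpanSketch and topLevel are hypothetical names.

```java
import java.util.ArrayList;
import java.util.List;

// Simplified illustration; not Lucene code.
class LevelSpanSketch {

    // Returns the indices of the segments whose log-size lies within `span`
    // of the largest segment's log-size. All sizes must be positive.
    static List<Integer> topLevel(long[] sizes, int mergeFactor, double span) {
        double norm = Math.log(mergeFactor); // assumption: log base is mergeFactor
        double maxLevel = Double.NEGATIVE_INFINITY;
        for (long size : sizes) {
            maxLevel = Math.max(maxLevel, Math.log(size) / norm);
        }
        double levelBottom = maxLevel - span;
        List<Integer> members = new ArrayList<>();
        for (int i = 0; i < sizes.length; i++) {
            if (Math.log(sizes[i]) / norm >= levelBottom) {
                members.add(i);
            }
        }
        return members;
    }

    public static void main(String[] args) {
        long[] sizes = {1000, 900, 120, 100, 10};
        System.out.println(topLevel(sizes, 10, 0.75));
    }
}
```

For sizes 1000, 900, 120, 100 and 10 with a merge factor of 10 and an illustrative span of 0.75, the largest log-size is about 3 and the cutoff is about 2.25, so only the first two segments fall into the top level.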

DEFAULT_MERGE_FACTOR

public static final int DEFAULT_MERGE_FACTOR
Default merge factor, which is how many segments are merged at a time.

See Also:
Constant Field Values

DEFAULT_MAX_MERGE_DOCS

public static final int DEFAULT_MAX_MERGE_DOCS
Default maximum segment size. A segment of this size or larger will never be merged. See setMaxMergeDocs(int).

See Also:
Constant Field Values

DEFAULT_NO_CFS_RATIO

public static final double DEFAULT_NO_CFS_RATIO
Default noCFSRatio. If a merge's size is >= 10% of the index, then we disable compound file for it.

See Also:
MergePolicy.setNoCFSRatio(double), Constant Field Values

mergeFactor

protected int mergeFactor
How many segments to merge at a time.


minMergeSize

protected long minMergeSize
Any segments whose size is smaller than this value will be rounded up to this value. This ensures that tiny segments are aggressively merged.
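
The effect of rounding small sizes up can be sketched as follows. This is a hypothetical illustration (MinMergeSizeSketch and its methods are not part of the API) that assumes a level is the logarithm of the floored size in base mergeFactor.

```java
// Illustration only; not Lucene code.
class MinMergeSizeSketch {

    // Sizes below minMergeSize are treated as if they were minMergeSize.
    static long flooredSize(long size, long minMergeSize) {
        return Math.max(size, minMergeSize);
    }

    // Assumption: the level is log(flooredSize) in base mergeFactor.
    static double level(long size, long minMergeSize, int mergeFactor) {
        return Math.log(flooredSize(size, minMergeSize)) / Math.log(mergeFactor);
    }

    public static void main(String[] args) {
        System.out.println(level(3, 1000, 10));
        System.out.println(level(250, 1000, 10));
        System.out.println(level(5000, 1000, 10));
    }
}
```

With minMergeSize set to 1000, segments of size 3 and 250 land on exactly the same level, so they count toward the same merge instead of being spread over several tiny levels.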


maxMergeSize

protected long maxMergeSize
If the size of a segment exceeds this value then it will never be merged.


maxMergeSizeForForcedMerge

protected long maxMergeSizeForForcedMerge
If the size of a segment exceeds this value then it will never be merged during IndexWriter.forceMerge(int).


maxMergeDocs

protected int maxMergeDocs
If a segment has more than this many documents then it will never be merged.


calibrateSizeByDeletes

protected boolean calibrateSizeByDeletes
If true, we pro-rate a segment's size by the percentage of non-deleted documents.

Constructor Detail

LogMergePolicy

public LogMergePolicy()
Sole constructor. (For invocation by subclass constructors, typically implicit.)

Method Detail

verbose

protected boolean verbose()
Returns true if LMP is enabled in IndexWriter's infoStream.


message

protected void message(String message)
Print a debug message to IndexWriter's infoStream.


getMergeFactor

public int getMergeFactor()

Returns the number of segments that are merged at once and also controls the total number of segments allowed to accumulate in the index.


setMergeFactor

public void setMergeFactor(int mergeFactor)
Determines how often segment indices are merged by addDocument(). With smaller values, less RAM is used while indexing, and searches are faster, but indexing is slower. With larger values, more RAM is used during indexing, and while searches are slower, indexing is faster. Thus larger values (> 10) are best for batch index creation, and smaller values (< 10) for indices that are interactively maintained.
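
The trade-off can be made concrete with a back-of-the-envelope calculation. The sketch below assumes an idealized model, which is not how the policy is implemented: every flush yields an equally sized segment, and exactly mergeFactor equal segments merge into one segment on the next level. Under that model the number of levels equals the number of digits of the flush count written in base mergeFactor. MergeFactorTradeoff and its methods are hypothetical names.

```java
// Idealized estimate only; not Lucene code.
class MergeFactorTradeoff {

    // Number of base-mergeFactor digits of `flushes`, i.e. the number of levels.
    static int levels(long flushes, int mergeFactor) {
        if (mergeFactor < 2) {
            throw new IllegalArgumentException("mergeFactor must be at least 2");
        }
        int digits = 0;
        for (long n = flushes; n > 0; n /= mergeFactor) {
            digits++;
        }
        return digits;
    }

    // Upper bound on live segments: at most mergeFactor - 1 per level.
    static int maxSegments(long flushes, int mergeFactor) {
        return (mergeFactor - 1) * levels(flushes, mergeFactor);
    }

    // Upper bound on how often one document is rewritten by merges.
    static int maxRewritesPerDoc(long flushes, int mergeFactor) {
        return Math.max(0, levels(flushes, mergeFactor) - 1);
    }

    public static void main(String[] args) {
        for (int mf : new int[] {2, 10, 30}) {
            System.out.println(mf + ": segments <= " + maxSegments(1_000_000L, mf)
                + ", rewrites <= " + maxRewritesPerDoc(1_000_000L, mf));
        }
    }
}
```

For one million flushes, a merge factor of 2 bounds the index at 20 segments but may rewrite a document up to 19 times, while a merge factor of 30 allows up to 145 segments but rewrites a document at most 4 times.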


setCalibrateSizeByDeletes

public void setCalibrateSizeByDeletes(boolean calibrateSizeByDeletes)
Sets whether the segment size should be calibrated by the number of deletes when choosing segments for merge.


getCalibrateSizeByDeletes

public boolean getCalibrateSizeByDeletes()
Returns true if the segment size should be calibrated by the number of deletes when choosing segments for merge.


close

public void close()
Description copied from class: MergePolicy
Release all resources for the policy.

Specified by:
close in interface Closeable
Specified by:
close in class MergePolicy

sizeDocs

protected long sizeDocs(SegmentCommitInfo info)
                 throws IOException
Return the number of documents in the provided SegmentCommitInfo, pro-rated by percentage of non-deleted documents if setCalibrateSizeByDeletes(boolean) is set.

Throws:
IOException

sizeBytes

protected long sizeBytes(SegmentCommitInfo info)
                  throws IOException
Return the byte size of the provided SegmentCommitInfo, pro-rated by percentage of non-deleted documents if setCalibrateSizeByDeletes(boolean) is set.

Throws:
IOException
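
The pro-rating performed by sizeDocs and sizeBytes can be sketched as follows. This is an illustration under assumptions, not the actual method bodies: maxDoc (documents in the segment, including deleted ones) and delCount (deleted documents) are assumed inputs, and CalibratedSizeSketch is a hypothetical class.

```java
// Illustration only; not Lucene code.
class CalibratedSizeSketch {

    // Document count, reduced to live documents when calibration is on.
    static long sizeDocs(int maxDoc, int delCount, boolean calibrateSizeByDeletes) {
        return calibrateSizeByDeletes ? (long) maxDoc - delCount : maxDoc;
    }

    // Byte size, scaled by the fraction of non-deleted documents when calibration is on.
    static long sizeBytes(long byteSize, int maxDoc, int delCount, boolean calibrateSizeByDeletes) {
        if (!calibrateSizeByDeletes || maxDoc <= 0) {
            return byteSize;
        }
        double liveRatio = 1.0 - (double) delCount / maxDoc;
        return (long) (byteSize * liveRatio);
    }

    public static void main(String[] args) {
        System.out.println(sizeDocs(1000, 250, true));
        System.out.println(sizeBytes(4000L, 1000, 250, true));
    }
}
```

A segment of 4000 bytes holding 1000 documents, 250 of them deleted, is treated as 750 documents or 3000 bytes when calibration is enabled, which makes segments with many deletes look smaller and therefore cheaper to merge.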

isMerged

protected boolean isMerged(SegmentInfos infos,
                           int maxNumSegments,
                           Map<SegmentCommitInfo,Boolean> segmentsToMerge)
                    throws IOException
Returns true if the number of segments eligible for merging is less than or equal to the specified maxNumSegments.

Throws:
IOException

findForcedMerges

public MergePolicy.MergeSpecification findForcedMerges(SegmentInfos infos,
                                                       int maxNumSegments,
                                                       Map<SegmentCommitInfo,Boolean> segmentsToMerge)
                                                throws IOException
Returns the merges necessary to merge the index down to a specified number of segments. This respects the maxMergeSizeForForcedMerge setting. By default, and assuming maxNumSegments=1, only one segment will be left in the index; that segment has neither pending deletions nor separate norms, and it is in compound file format if the current useCompoundFile setting is true. This method returns multiple merges (mergeFactor at a time) so that the MergeScheduler in use can take advantage of concurrency.

Specified by:
findForcedMerges in class MergePolicy
Parameters:
infos - the total set of segments in the index
maxNumSegments - requested maximum number of segments in the index (currently this is always 1)
segmentsToMerge - contains the specific SegmentCommitInfo instances that must be merged away. This may be a subset of all SegmentInfos. If the value is true for a given SegmentCommitInfo, that segment was an original segment present in the to-be-merged index; otherwise, it was produced by a cascaded merge.
Throws:
IOException
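
The idea of returning several merges of mergeFactor segments each can be sketched as follows. This is a simplified illustration for the maxNumSegments=1 case, not the policy's code: it ignores maxMergeSizeForForcedMerge and deletions, and it only plans the first round, leaving the results to be combined by later cascaded merges. ForcedMergeSketch and firstRound are hypothetical names; each merge is reported as a half-open index range {start, end}.

```java
import java.util.ArrayList;
import java.util.List;

// Simplified illustration; not Lucene code.
class ForcedMergeSketch {

    // Plans the first round of a forced merge down to one segment.
    static List<int[]> firstRound(int numSegments, int mergeFactor) {
        if (mergeFactor < 2) {
            throw new IllegalArgumentException("mergeFactor must be at least 2");
        }
        List<int[]> merges = new ArrayList<>();
        int last = numSegments;
        // Enroll full batches from the end of the index; they are independent
        // of each other, so a scheduler could run them concurrently.
        while (last >= mergeFactor) {
            merges.add(new int[] {last - mergeFactor, last});
            last -= mergeFactor;
        }
        // With no full batch available, one partial merge finishes the job.
        if (merges.isEmpty() && last > 1) {
            merges.add(new int[] {0, last});
        }
        return merges;
    }

    public static void main(String[] args) {
        for (int[] m : firstRound(25, 10)) {
            System.out.println("[" + m[0] + ", " + m[1] + ")");
        }
    }
}
```

With 25 segments and a merge factor of 10, the sketch plans two independent merges over the ranges [15, 25) and [5, 15); the five untouched segments and the two merged ones would be combined in a later round.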

findForcedDeletesMerges

public MergePolicy.MergeSpecification findForcedDeletesMerges(SegmentInfos segmentInfos)
                                                       throws IOException
Finds merges necessary to force-merge all deletes from the index. We simply merge adjacent segments that have deletes, up to mergeFactor at a time.

Specified by:
findForcedDeletesMerges in class MergePolicy
Parameters:
segmentInfos - the total set of segments in the index
Throws:
IOException
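
The grouping rule stated above (adjacent segments with deletes, up to mergeFactor at a time) can be sketched as follows. This is an illustration under assumptions, not the method's actual body: each segment is reduced to a boolean saying whether it has deletions, and each merge is reported as a half-open index range {start, end}. ForcedDeletesSketch and plan are hypothetical names.

```java
import java.util.ArrayList;
import java.util.List;

// Illustration only; not Lucene code.
class ForcedDeletesSketch {

    // Groups runs of adjacent segments that have deletes, at most mergeFactor per merge.
    static List<int[]> plan(boolean[] hasDeletes, int mergeFactor) {
        List<int[]> merges = new ArrayList<>();
        int first = -1; // start of the current run, or -1 if no run is open
        for (int i = 0; i < hasDeletes.length; i++) {
            if (hasDeletes[i]) {
                if (first == -1) {
                    first = i; // open a new run
                } else if (i - first == mergeFactor) {
                    merges.add(new int[] {first, i}); // run is full: close it
                    first = i;                        // and start the next one
                }
            } else if (first != -1) {
                merges.add(new int[] {first, i}); // a clean segment ends the run
                first = -1;
            }
        }
        if (first != -1) {
            merges.add(new int[] {first, hasDeletes.length});
        }
        return merges;
    }

    public static void main(String[] args) {
        boolean[] flags = {true, true, false, true, true, true, true, false};
        for (int[] m : plan(flags, 3)) {
            System.out.println("[" + m[0] + ", " + m[1] + ")");
        }
    }
}
```

For the flags shown in main and a merge factor of 3, the sketch plans merges over [0, 2), [3, 6) and [6, 7). The last one rewrites a single segment, which is still useful because rewriting drops its deleted documents.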

findMerges

public MergePolicy.MergeSpecification findMerges(MergePolicy.MergeTrigger mergeTrigger,
                                                 SegmentInfos infos)
                                          throws IOException
Checks if any merges are now necessary and returns a MergePolicy.MergeSpecification if so. A merge is necessary when a given level holds more segments than the merge factor (see setMergeFactor(int)). When multiple levels have too many segments, this method will return multiple merges, allowing the MergeScheduler to use concurrency.

Specified by:
findMerges in class MergePolicy
Parameters:
mergeTrigger - the event that triggered the merge
infos - the total set of segments in the index
Throws:
IOException

setMaxMergeDocs

public void setMaxMergeDocs(int maxMergeDocs)

Determines the largest segment (measured by document count) that may be merged with other segments. Small values (e.g., less than 10,000) are best for interactive indexing, as this limits the length of pauses while indexing to a few seconds. Larger values are best for batched indexing and speedier searches.

The default value is Integer.MAX_VALUE.

LogByteSizeMergePolicy also allows you to set this limit by net size (in MB) of the segment, using LogByteSizeMergePolicy.setMaxMergeMB(double).


getMaxMergeDocs

public int getMaxMergeDocs()
Returns the largest segment (measured by document count) that may be merged with other segments.

See Also:
setMaxMergeDocs(int)

toString

public String toString()
Overrides:
toString in class Object


Copyright © 2000-2014 Apache Software Foundation. All Rights Reserved.