org.apache.lucene.index
Class LogMergePolicy

java.lang.Object
  extended by org.apache.lucene.index.MergePolicy
      extended by org.apache.lucene.index.LogMergePolicy
All Implemented Interfaces:
Closeable
Direct Known Subclasses:
LogByteSizeMergePolicy, LogDocMergePolicy

public abstract class LogMergePolicy
extends MergePolicy

This class implements a MergePolicy that tries to merge segments into levels of exponentially increasing size, where each level has fewer segments than the value of the merge factor. Whenever extra segments (beyond the merge factor upper bound) are encountered, all segments within the level are merged. You can get or set the merge factor using getMergeFactor() and setMergeFactor(int) respectively.

This class is abstract and requires a subclass to define the size(org.apache.lucene.index.SegmentInfo) method which specifies how a segment's size is determined. LogDocMergePolicy is one subclass that measures size by document count in the segment. LogByteSizeMergePolicy is another subclass that measures size as the total byte size of the file(s) for the segment.
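
The level structure described above can be illustrated with a toy model. The sketch below does not use Lucene: it assumes every flush produces a one-document segment, measures size by document count, and merges the newest mergeFactor segments whenever they are all the same size. The class and method names (LevelCascadeSketch, simulate) are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

/** Toy model of log-style merging; not Lucene's implementation. */
class LevelCascadeSketch {

    /** Simulates one-document flushes and returns the segment sizes, oldest first. */
    static List<Long> simulate(int flushes, int mergeFactor) {
        if (mergeFactor < 2) {
            throw new IllegalArgumentException("mergeFactor must be at least 2");
        }
        List<Long> segments = new ArrayList<>();
        for (int i = 0; i < flushes; i++) {
            segments.add(1L);
            // Cascade: a merge can fill the next level up, which triggers another merge.
            while (tailIsFullLevel(segments, mergeFactor)) {
                long merged = 0;
                for (int k = 0; k < mergeFactor; k++) {
                    merged += segments.remove(segments.size() - 1);
                }
                segments.add(merged);
            }
        }
        return segments;
    }

    /** True when the newest mergeFactor segments all have the same size. */
    private static boolean tailIsFullLevel(List<Long> segments, int mergeFactor) {
        int n = segments.size();
        if (n < mergeFactor) {
            return false;
        }
        long newest = segments.get(n - 1);
        for (int k = 2; k <= mergeFactor; k++) {
            if (segments.get(n - k) != newest) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        System.out.println(simulate(25, 10));
        System.out.println(simulate(9, 3));
    }
}
```

In this model, 25 flushes with a merge factor of 10 leave two segments of 10 documents and five segments of 1 document, while 9 flushes with a merge factor of 3 collapse into a single segment of 9 documents. Each level therefore always holds fewer segments than the merge factor.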


Nested Class Summary
 
Nested classes/interfaces inherited from class org.apache.lucene.index.MergePolicy
MergePolicy.MergeAbortedException, MergePolicy.MergeException, MergePolicy.MergeSpecification, MergePolicy.OneMerge
 
Field Summary
protected  boolean calibrateSizeByDeletes
           
static int DEFAULT_MAX_MERGE_DOCS
          Default maximum segment size.
static int DEFAULT_MERGE_FACTOR
          Default merge factor, which is how many segments are merged at a time.
static double DEFAULT_NO_CFS_RATIO
          Default noCFSRatio.
static double LEVEL_LOG_SPAN
          Defines the allowed range of log(size) for each level.
protected  int maxMergeDocs
           
protected  long maxMergeSize
           
protected  long maxMergeSizeForOptimize
           
protected  int mergeFactor
           
protected  long minMergeSize
           
protected  double noCFSRatio
           
protected  boolean useCompoundFile
           
 
Fields inherited from class org.apache.lucene.index.MergePolicy
writer
 
Constructor Summary
LogMergePolicy()
           
 
Method Summary
 void close()
          Release all resources for the policy.
 MergePolicy.MergeSpecification findMerges(SegmentInfos infos)
          Checks if any merges are now necessary and returns a MergePolicy.MergeSpecification if so.
 MergePolicy.MergeSpecification findMergesForOptimize(SegmentInfos infos, int maxNumSegments, Set<SegmentInfo> segmentsToOptimize)
          Returns the merges necessary to optimize the index.
 MergePolicy.MergeSpecification findMergesToExpungeDeletes(SegmentInfos segmentInfos)
          Finds merges necessary to expunge all deletes from the index.
 boolean getCalibrateSizeByDeletes()
          Returns true if the segment size should be calibrated by the number of deletes when choosing segments for merge.
 int getMaxMergeDocs()
          Returns the largest segment (measured by document count) that may be merged with other segments.
 int getMergeFactor()
          Returns the number of segments that are merged at once and also controls the total number of segments allowed to accumulate in the index.
 double getNoCFSRatio()
           
 boolean getUseCompoundFile()
          Returns true if newly flushed and newly merged segments are written in compound file format.
protected  boolean isOptimized(SegmentInfo info)
          Returns true if this single info is optimized (has no pending norms or deletes, is in the same dir as the writer, and matches the current compound file setting).
protected  boolean isOptimized(SegmentInfos infos, int maxNumSegments, Set<SegmentInfo> segmentsToOptimize)
           
protected  void message(String message)
           
 void setCalibrateSizeByDeletes(boolean calibrateSizeByDeletes)
          Sets whether the segment size should be calibrated by the number of deletes when choosing segments for merge.
 void setMaxMergeDocs(int maxMergeDocs)
          Determines the largest segment (measured by document count) that may be merged with other segments.
 void setMergeFactor(int mergeFactor)
          Determines how often segment indices are merged by addDocument().
 void setNoCFSRatio(double noCFSRatio)
          If a merged segment will be more than this fraction of the total size of the index (for example, 0.1 for 10%), leave the segment as non-compound file even if compound file is enabled.
 void setUseCompoundFile(boolean useCompoundFile)
          Sets whether compound file format should be used for newly flushed and newly merged segments.
protected abstract  long size(SegmentInfo info)
           
protected  long sizeBytes(SegmentInfo info)
           
protected  long sizeDocs(SegmentInfo info)
           
 String toString()
           
 boolean useCompoundFile(SegmentInfos infos, SegmentInfo mergedInfo)
          Returns true if a new segment (regardless of its origin) should use the compound file format.
protected  boolean verbose()
           
 
Methods inherited from class org.apache.lucene.index.MergePolicy
setIndexWriter
 
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, wait, wait, wait
 

Field Detail

LEVEL_LOG_SPAN

public static final double LEVEL_LOG_SPAN
Defines the allowed range of log(size) for each level. A level is computed by taking the max segment log size, minus LEVEL_LOG_SPAN, and finding all segments falling within that range.

See Also:
Constant Field Values
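
A minimal sketch of how such a level range might be computed, independent of Lucene. It assumes a span of 0.75 (an illustrative value; the actual constant is listed under Constant Field Values), a logarithm taken in base mergeFactor, and segments ordered oldest first. It ignores any minimum-size floor. LevelSpanSketch and levels are hypothetical names.

```java
import java.util.ArrayList;
import java.util.List;

/** Groups segments into levels by log(size); a simplified sketch, not Lucene's code. */
class LevelSpanSketch {

    // Assumed value, for illustration only.
    static final double LEVEL_LOG_SPAN = 0.75;

    /** Returns [start, end) index ranges, one per level, over segments ordered oldest first. */
    static List<int[]> levels(long[] sizes, int mergeFactor) {
        double norm = Math.log(mergeFactor);
        double[] level = new double[sizes.length];
        for (int i = 0; i < sizes.length; i++) {
            // Clamp to 1 so that empty segments do not produce log(0).
            level[i] = Math.log(Math.max(1L, sizes[i])) / norm;
        }
        List<int[]> ranges = new ArrayList<>();
        int start = 0;
        while (start < sizes.length) {
            // Largest log size among the segments not yet assigned to a level.
            double max = level[start];
            for (int i = start + 1; i < sizes.length; i++) {
                max = Math.max(max, level[i]);
            }
            double bottom = max - LEVEL_LOG_SPAN;
            // The level ends at the last segment whose log size is still within the span.
            int upto = sizes.length - 1;
            while (level[upto] < bottom) {
                upto--;
            }
            ranges.add(new int[] {start, upto + 1});
            start = upto + 1;
        }
        return ranges;
    }

    public static void main(String[] args) {
        long[] sizes = {1000, 900, 10, 12, 1};
        for (int[] r : levels(sizes, 10)) {
            System.out.println("level covers segments [" + r[0] + ", " + r[1] + ")");
        }
    }
}
```

For the sizes 1000, 900, 10, 12, 1 and a merge factor of 10, this sketch yields three levels: the first two segments, the next two, and the last one on its own.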

DEFAULT_MERGE_FACTOR

public static final int DEFAULT_MERGE_FACTOR
Default merge factor, which is how many segments are merged at a time.

See Also:
Constant Field Values

DEFAULT_MAX_MERGE_DOCS

public static final int DEFAULT_MAX_MERGE_DOCS
Default maximum segment size. A segment of this size or larger will never be merged. See setMaxMergeDocs(int).

See Also:
Constant Field Values

DEFAULT_NO_CFS_RATIO

public static final double DEFAULT_NO_CFS_RATIO
Default noCFSRatio. If a merge's size is >= 10% of the index, then we disable compound file for it.

See Also:
setNoCFSRatio(double), Constant Field Values

mergeFactor

protected int mergeFactor

minMergeSize

protected long minMergeSize

maxMergeSize

protected long maxMergeSize

maxMergeSizeForOptimize

protected long maxMergeSizeForOptimize

maxMergeDocs

protected int maxMergeDocs

noCFSRatio

protected double noCFSRatio

calibrateSizeByDeletes

protected boolean calibrateSizeByDeletes

useCompoundFile

protected boolean useCompoundFile
Constructor Detail

LogMergePolicy

public LogMergePolicy()
Method Detail

verbose

protected boolean verbose()

getNoCFSRatio

public double getNoCFSRatio()
See Also:
setNoCFSRatio(double)

setNoCFSRatio

public void setNoCFSRatio(double noCFSRatio)
If a merged segment will be more than this fraction of the total size of the index (for example, 0.1 for 10%), leave the segment as non-compound file even if compound file is enabled. Set to 1.0 to always use CFS regardless of merge size.
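
The decision described here can be sketched as a plain function. This is an illustration under assumptions, not the class's actual code: sizes are passed in directly, the total is taken to be the sum of all segment sizes, and a merged segment exactly at the threshold is treated as still eligible for CFS. NoCfsRatioSketch and its parameters are hypothetical names.

```java
/** Sketch of the noCFSRatio check; boundary handling is an assumption. */
class NoCfsRatioSketch {

    static boolean useCompoundFile(boolean compoundEnabled,
                                   double noCFSRatio,
                                   long mergedSize,
                                   long totalIndexSize) {
        if (!compoundEnabled) {
            return false;          // compound files are switched off entirely
        }
        if (noCFSRatio == 1.0) {
            return true;           // 1.0 means: always use CFS regardless of merge size
        }
        // Large merged segments (relative to the whole index) stay non-compound.
        return mergedSize <= noCFSRatio * totalIndexSize;
    }

    public static void main(String[] args) {
        System.out.println(useCompoundFile(true, 0.1, 5, 100));
        System.out.println(useCompoundFile(true, 0.1, 50, 100));
        System.out.println(useCompoundFile(true, 1.0, 100, 100));
    }
}
```

With a ratio of 0.1, a merged segment of size 5 in an index of size 100 would be written as a compound file, while one of size 50 would not.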


message

protected void message(String message)

getMergeFactor

public int getMergeFactor()

Returns the number of segments that are merged at once and also controls the total number of segments allowed to accumulate in the index.


setMergeFactor

public void setMergeFactor(int mergeFactor)
Determines how often segment indices are merged by addDocument(). With smaller values, less RAM is used while indexing, and searches on unoptimized indices are faster, but indexing speed is slower. With larger values, more RAM is used during indexing, and while searches on unoptimized indices are slower, indexing is faster. Thus larger values (> 10) are best for batch index creation, and smaller values (< 10) for indices that are interactively maintained.
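
The trade-off can be made concrete with a back-of-the-envelope model. The sketch below assumes an idealized cascade in which every flush writes one document and a level is merged as soon as it holds mergeFactor segments. Under those assumptions the number of segments left is the digit sum of the flush count in base mergeFactor, and the merge work is the total number of documents rewritten. MergeFactorTradeoffSketch and its methods are hypothetical names, and real indexes will deviate from this model.

```java
/** Idealized cost model for different merge factors; not a measurement of Lucene. */
class MergeFactorTradeoffSketch {

    /** Segments remaining after the given number of one-document flushes. */
    static int segmentsLeft(long flushes, int mergeFactor) {
        checkFactor(mergeFactor);
        int count = 0;
        // Each base-mergeFactor digit is the number of segments sitting at that level.
        for (long n = flushes; n > 0; n /= mergeFactor) {
            count += (int) (n % mergeFactor);
        }
        return count;
    }

    /** Total documents copied by all merges performed along the way. */
    static long docsRewritten(long flushes, int mergeFactor) {
        checkFactor(mergeFactor);
        long total = 0;
        // A merge that produces a segment of `size` documents copies `size` documents.
        for (long size = mergeFactor; size <= flushes; size *= mergeFactor) {
            total += (flushes / size) * size;
        }
        return total;
    }

    private static void checkFactor(int mergeFactor) {
        if (mergeFactor < 2) {
            throw new IllegalArgumentException("mergeFactor must be at least 2");
        }
    }

    public static void main(String[] args) {
        for (int factor : new int[] {2, 10}) {
            System.out.println("mergeFactor=" + factor
                    + " segments=" + segmentsLeft(999, factor)
                    + " docsRewritten=" + docsRewritten(999, factor));
        }
    }
}
```

For 999 flushes, a merge factor of 2 leaves 8 segments but rewrites 8106 documents, whereas a merge factor of 10 leaves 27 segments and rewrites only 1890. Fewer segments favor search speed, and less rewriting favors indexing speed.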


useCompoundFile

public boolean useCompoundFile(SegmentInfos infos,
                               SegmentInfo mergedInfo)
                        throws IOException
Description copied from class: MergePolicy
Returns true if a new segment (regardless of its origin) should use the compound file format.

Specified by:
useCompoundFile in class MergePolicy
Throws:
IOException

setUseCompoundFile

public void setUseCompoundFile(boolean useCompoundFile)
Sets whether compound file format should be used for newly flushed and newly merged segments.


getUseCompoundFile

public boolean getUseCompoundFile()
Returns true if newly flushed and newly merged segments are written in compound file format. See setUseCompoundFile(boolean).


setCalibrateSizeByDeletes

public void setCalibrateSizeByDeletes(boolean calibrateSizeByDeletes)
Sets whether the segment size should be calibrated by the number of deletes when choosing segments for merge.


getCalibrateSizeByDeletes

public boolean getCalibrateSizeByDeletes()
Returns true if the segment size should be calibrated by the number of deletes when choosing segments for merge.
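
One plausible reading of "calibrated by the number of deletes" is that deleted documents are discounted from a segment's size before it is assigned to a level. The sketch below illustrates that reading only. The formulas, the class name CalibratedSizeSketch, and the parameters are assumptions, not the documented behavior of sizeDocs or sizeBytes.

```java
/** Illustrative size calibration; the exact formulas are assumed. */
class CalibratedSizeSketch {

    /** Document-count size, optionally excluding deleted documents. */
    static long sizeDocs(int docCount, int delCount, boolean calibrate) {
        return calibrate ? (long) docCount - delCount : docCount;
    }

    /** Byte size, optionally scaled down by the fraction of deleted documents. */
    static long sizeBytes(long byteSize, int docCount, int delCount, boolean calibrate) {
        if (!calibrate || docCount <= 0) {
            return byteSize;
        }
        double delRatio = (double) delCount / docCount;
        return (long) (byteSize * (1.0 - delRatio));
    }

    public static void main(String[] args) {
        System.out.println(sizeDocs(1000, 250, true));
        System.out.println(sizeBytes(4000, 1000, 250, true));
    }
}
```

Under this reading, a segment with many deletes looks smaller to the policy and so tends to be grouped with, and merged alongside, smaller segments.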


close

public void close()
Description copied from class: MergePolicy
Release all resources for the policy.

Specified by:
close in interface Closeable
Specified by:
close in class MergePolicy

size

protected abstract long size(SegmentInfo info)
                      throws IOException
Throws:
IOException

sizeDocs

protected long sizeDocs(SegmentInfo info)
                 throws IOException
Throws:
IOException

sizeBytes

protected long sizeBytes(SegmentInfo info)
                  throws IOException
Throws:
IOException

isOptimized

protected boolean isOptimized(SegmentInfos infos,
                              int maxNumSegments,
                              Set<SegmentInfo> segmentsToOptimize)
                       throws IOException
Throws:
IOException

isOptimized

protected boolean isOptimized(SegmentInfo info)
                       throws IOException
Returns true if this single info is optimized (has no pending norms or deletes, is in the same dir as the writer, and matches the current compound file setting).

Throws:
IOException

findMergesForOptimize

public MergePolicy.MergeSpecification findMergesForOptimize(SegmentInfos infos,
                                                            int maxNumSegments,
                                                            Set<SegmentInfo> segmentsToOptimize)
                                                     throws IOException
Returns the merges necessary to optimize the index. This merge policy defines "optimized" to mean only the requested number of segments is left in the index, and respects the maxMergeSizeForOptimize setting. By default, and assuming maxNumSegments=1, only one segment will be left in the index, where that segment has no deletions pending nor separate norms, and it is in compound file format if the current useCompoundFile setting is true. This method returns multiple merges (mergeFactor at a time) so the MergeScheduler in use may make use of concurrency.

Specified by:
findMergesForOptimize in class MergePolicy
Parameters:
infos - the total set of segments in the index
maxNumSegments - requested maximum number of segments in the index (currently this is always 1)
segmentsToOptimize - contains the specific SegmentInfo instances that must be merged away. This may be a subset of all SegmentInfos.
Throws:
IOException
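
The "mergeFactor at a time" batching can be sketched as follows for the maxNumSegments=1 case. This is a simplified illustration: it assumes batches are taken from the newest segments backwards, that a final partial merge is proposed only when no full batch exists, and it ignores maxMergeSizeForOptimize and the isOptimized check. OptimizeBatchSketch and optimizeRound are hypothetical names.

```java
import java.util.ArrayList;
import java.util.List;

/** Simplified batching of an optimize down to one segment; not Lucene's code. */
class OptimizeBatchSketch {

    /** Returns [start, end) ranges of segments to merge in one round. */
    static List<int[]> optimizeRound(int segmentCount, int mergeFactor) {
        if (mergeFactor < 2) {
            throw new IllegalArgumentException("mergeFactor must be at least 2");
        }
        List<int[]> merges = new ArrayList<>();
        int last = segmentCount;
        // Full batches are independent, so a scheduler could run them concurrently.
        while (last >= mergeFactor) {
            merges.add(new int[] {last - mergeFactor, last});
            last -= mergeFactor;
        }
        // No full batch available: merge whatever remains into a single segment.
        if (merges.isEmpty() && last > 1) {
            merges.add(new int[] {0, last});
        }
        return merges;
    }

    public static void main(String[] args) {
        for (int[] m : optimizeRound(25, 10)) {
            System.out.println("merge segments [" + m[0] + ", " + m[1] + ")");
        }
    }
}
```

In this model, 25 segments with a merge factor of 10 yield two full merges in the first round. The 7 segments that remain afterwards are then combined by a single final merge in a later round.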

findMergesToExpungeDeletes

public MergePolicy.MergeSpecification findMergesToExpungeDeletes(SegmentInfos segmentInfos)
                                                          throws CorruptIndexException,
                                                                 IOException
Finds merges necessary to expunge all deletes from the index. We simply merge adjacent segments that have deletes, up to mergeFactor at a time.

Specified by:
findMergesToExpungeDeletes in class MergePolicy
Parameters:
segmentInfos - the total set of segments in the index
Throws:
CorruptIndexException
IOException
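
A minimal sketch of the "adjacent segments that have deletes, up to mergeFactor at a time" rule, written against a plain boolean array rather than SegmentInfos. The class name ExpungeDeletesSketch, the method findRuns, and the choice to emit a merge for a single segment with deletes are assumptions made for illustration.

```java
import java.util.ArrayList;
import java.util.List;

/** Groups adjacent segments with deletes into merges; an illustrative sketch. */
class ExpungeDeletesSketch {

    /** Returns [start, end) ranges; each covers at most mergeFactor adjacent segments. */
    static List<int[]> findRuns(boolean[] hasDeletes, int mergeFactor) {
        List<int[]> merges = new ArrayList<>();
        int first = -1;  // start of the current run of segments with deletes, or -1
        for (int i = 0; i < hasDeletes.length; i++) {
            if (hasDeletes[i]) {
                if (first == -1) {
                    first = i;
                } else if (i - first == mergeFactor) {
                    // The run has reached mergeFactor segments: close it and start a new one.
                    merges.add(new int[] {first, i});
                    first = i;
                }
            } else if (first != -1) {
                // A segment without deletes ends the current run.
                merges.add(new int[] {first, i});
                first = -1;
            }
        }
        if (first != -1) {
            merges.add(new int[] {first, hasDeletes.length});
        }
        return merges;
    }

    public static void main(String[] args) {
        boolean[] hasDeletes = {true, true, false, true, true, true, true, false};
        for (int[] m : findRuns(hasDeletes, 3)) {
            System.out.println("merge segments [" + m[0] + ", " + m[1] + ")");
        }
    }
}
```

For the eight segments in main and a merge factor of 3, the sketch proposes three merges: segments 0 to 1, segments 3 to 5, and segment 6 alone.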

findMerges

public MergePolicy.MergeSpecification findMerges(SegmentInfos infos)
                                          throws IOException
Checks if any merges are now necessary and returns a MergePolicy.MergeSpecification if so. A merge is necessary when there are more than setMergeFactor(int) segments at a given level. When multiple levels have too many segments, this method will return multiple merges, allowing the MergeScheduler to use concurrency.

Specified by:
findMerges in class MergePolicy
Parameters:
infos - the total set of segments in the index
Throws:
IOException

setMaxMergeDocs

public void setMaxMergeDocs(int maxMergeDocs)

Determines the largest segment (measured by document count) that may be merged with other segments. Small values (e.g., less than 10,000) are best for interactive indexing, as this limits the length of pauses while indexing to a few seconds. Larger values are best for batched indexing and speedier searches.

The default value is Integer.MAX_VALUE.

The default merge policy (LogByteSizeMergePolicy) also allows you to set this limit by net size (in MB) of the segment, using LogByteSizeMergePolicy.setMaxMergeMB(double).


getMaxMergeDocs

public int getMaxMergeDocs()
Returns the largest segment (measured by document count) that may be merged with other segments.

See Also:
setMaxMergeDocs(int)

toString

public String toString()
Overrides:
toString in class Object


Copyright © 2000-2011 Apache Software Foundation. All Rights Reserved.