org.apache.lucene.index
Class MultiPassIndexSplitter

java.lang.Object
  extended by org.apache.lucene.index.MultiPassIndexSplitter

public class MultiPassIndexSplitter
extends Object

This tool splits an input index into multiple equal parts. It uses IndexWriter.addIndexes(IndexReader[]), reading the input index once per part with deletes artificially applied to the document ids that fall outside the selected partition.

Note 1: Deletes are applied only to a buffered list of deleted documents and do not affect the source index, so this tool also works with read-only indexes.

Note 2: The disadvantage of this tool is that the source index must be read as many times as there are parts to create, hence the name of this tool.

NOTE: this tool is unaware of documents added atomically via IndexWriter.addDocuments(java.util.Collection) or IndexWriter.updateDocuments(org.apache.lucene.index.Term, java.util.Collection), which means it can easily break up such document groups.
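
The multi-pass approach described above can be modeled without any Lucene classes. The following is a minimal sketch, not the tool's implementation: the class MultiPassSketch, its split method, and the modulo membership rule are all hypothetical names and choices made for illustration. It shows one pass per output part, with documents outside the current part masked in a scratch bit set while the source stays untouched.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.BitSet;
import java.util.List;

/** Illustrative only: models the multi-pass idea with plain collections, not Lucene classes. */
final class MultiPassSketch {

    /**
     * Makes one pass over the source per output part. Each pass masks every
     * document outside the current part in a scratch BitSet, then copies the
     * unmasked documents. The source list is never modified.
     */
    static List<List<String>> split(List<String> source, int numParts) {
        List<List<String>> parts = new ArrayList<>();
        int maxDoc = source.size();
        for (int part = 0; part < numParts; part++) {
            // Fresh mask per pass: stands in for the buffered list of deleted docs.
            BitSet masked = new BitSet(maxDoc);
            for (int docId = 0; docId < maxDoc; docId++) {
                // Placeholder membership rule (an assumption of this sketch).
                if (docId % numParts != part) {
                    masked.set(docId);
                }
            }
            // Copy only the documents that survived the mask.
            List<String> out = new ArrayList<>();
            for (int docId = masked.nextClearBit(0); docId < maxDoc;
                    docId = masked.nextClearBit(docId + 1)) {
                out.add(source.get(docId));
            }
            parts.add(out);
        }
        return parts;
    }

    public static void main(String[] args) {
        List<String> source = Arrays.asList("d0", "d1", "d2", "d3", "d4", "d5", "d6");
        System.out.println(split(source, 3)); // [[d0, d3, d6], [d1, d4], [d2, d5]]
    }
}
```

Because the mask lives outside the source, the source can be read-only. The cost is that every pass scans all document ids, which mirrors the disadvantage stated in Note 2.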


Nested Class Summary
static class MultiPassIndexSplitter.FakeDeleteIndexReader
          This class pretends that it can write deletions to the underlying index.
 
Constructor Summary
MultiPassIndexSplitter()
           
 
Method Summary
static void main(String[] args)
           
 void split(IndexReader input, Directory[] outputs, boolean seq)
          Deprecated. Use split(Version, IndexReader, Directory[], boolean) instead. This method will be removed in Lucene 4.0.
 void split(Version version, IndexReader input, Directory[] outputs, boolean seq)
          Split source index into multiple parts.
 
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
 

Constructor Detail

MultiPassIndexSplitter

public MultiPassIndexSplitter()
Method Detail

split

@Deprecated
public void split(IndexReader input,
                             Directory[] outputs,
                             boolean seq)
           throws IOException
Deprecated. Use split(Version, IndexReader, Directory[], boolean) instead. This method will be removed in Lucene 4.0.

Split source index into multiple parts.

Parameters:
input - source index, can be read-only, can have deletions, can have multiple segments (or multiple readers).
outputs - list of directories where the output parts will be stored.
seq - if true, the source index is split into equal, increasing ranges of document ids. If false, source document ids are assigned to the output splits in a deterministic round-robin fashion.
Throws:
IOException

split

public void split(Version version,
                  IndexReader input,
                  Directory[] outputs,
                  boolean seq)
           throws IOException
Split source index into multiple parts.

Parameters:
version - the Lucene compatibility version to use when writing the output parts.
input - source index, can be read-only, can have deletions, can have multiple segments (or multiple readers).
outputs - list of directories where the output parts will be stored.
seq - if true, the source index is split into equal, increasing ranges of document ids. If false, source document ids are assigned to the output splits in a deterministic round-robin fashion.
Throws:
IOException
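
To make the effect of the seq parameter concrete, here is a small sketch of how a document id could map to an output part in each mode. The class PartitionSketch and the method partOf are hypothetical, and the handling of the remainder in sequential mode (the last part absorbs leftover ids) is an assumption of this sketch rather than documented behavior.

```java
/** Illustrative only: one plausible mapping from a document id to an output part. */
final class PartitionSketch {

    /**
     * Returns the index of the output part that receives docId.
     * Assumption: in sequential mode the last part absorbs the remainder
     * when maxDoc is not divisible by numParts.
     */
    static int partOf(int docId, int maxDoc, int numParts, boolean seq) {
        if (numParts < 1 || maxDoc < numParts) {
            throw new IllegalArgumentException("need at least one document per part");
        }
        if (!seq) {
            return docId % numParts; // round-robin: 0, 1, 2, 0, 1, 2, ...
        }
        int partLen = maxDoc / numParts; // length of each sequential range
        return Math.min(docId / partLen, numParts - 1);
    }

    public static void main(String[] args) {
        int maxDoc = 10;
        int numParts = 3;
        StringBuilder sequential = new StringBuilder();
        StringBuilder roundRobin = new StringBuilder();
        for (int docId = 0; docId < maxDoc; docId++) {
            sequential.append(partOf(docId, maxDoc, numParts, true));
            roundRobin.append(partOf(docId, maxDoc, numParts, false));
        }
        System.out.println("seq=true : " + sequential); // seq=true : 0001112222
        System.out.println("seq=false: " + roundRobin); // seq=false: 0120120120
    }
}
```

Sequential mode keeps neighboring document ids together in the same part, while round-robin mode spreads them evenly across all parts.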

main

public static void main(String[] args)
                 throws Exception
Throws:
Exception


Copyright © 2000-2011 Apache Software Foundation. All Rights Reserved.