Package org.apache.lucene.index
Class MultiPassIndexSplitter
java.lang.Object
    org.apache.lucene.index.MultiPassIndexSplitter
public class MultiPassIndexSplitter extends Object
This tool splits an input index into multiple equal parts. It uses IndexWriter.addIndexes(CodecReader[]), feeding it the input index with deletes artificially applied to the document IDs that fall outside the selected partition.
Note 1: Deletes are only applied to a buffered list of deleted docs and do not affect the source index, so this tool also works with read-only indexes.
Note 2: The disadvantage of this tool is that the source index must be read once for each part to be created, hence the name of this tool.
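The "artificial deletes" idea can be illustrated without Lucene. The following is a minimal stdlib-only sketch, not Lucene's implementation: an immutable list stands in for the read-only source index, a java.util.BitSet stands in for the buffered list of deleted docs, and a caller-supplied ownership rule (hypothetical, chosen only for illustration) decides which part keeps each document. The source is never modified, and it is scanned once per output part.

```java
import java.util.ArrayList;
import java.util.BitSet;
import java.util.List;
import java.util.function.BiPredicate;

class FakeDeleteSplitSketch {
    /**
     * Hypothetical illustration: builds numParts partitions of a read-only source
     * by overlaying a per-pass deletion bitset instead of modifying the source.
     * belongsTo.test(docId, part) decides whether docId stays live in that part.
     */
    static List<List<String>> split(List<String> source, int numParts,
                                    BiPredicate<Integer, Integer> belongsTo) {
        List<List<String>> parts = new ArrayList<>();
        BitSet deleted = new BitSet(source.size()); // buffered "fake" deletes
        for (int part = 0; part < numParts; part++) {
            deleted.clear(); // reset the overlay so each pass starts with no deletions
            for (int doc = 0; doc < source.size(); doc++) {
                if (!belongsTo.test(doc, part)) {
                    deleted.set(doc); // docs outside this partition are "deleted"
                }
            }
            // One full pass over the source per output part: keep only live docs.
            List<String> out = new ArrayList<>();
            for (int doc = 0; doc < source.size(); doc++) {
                if (!deleted.get(doc)) {
                    out.add(source.get(doc));
                }
            }
            parts.add(out);
        }
        return parts;
    }

    public static void main(String[] args) {
        List<String> source = List.of("a", "b", "c", "d", "e"); // immutable stand-in for a read-only index
        // Round-robin ownership rule, assumed here purely for the demo.
        List<List<String>> parts = split(source, 2, (doc, part) -> doc % 2 == part);
        System.out.println(parts);  // [[a, c, e], [b, d]]
        System.out.println(source); // [a, b, c, d, e] (unchanged)
    }
}
```

The overlay is what lets the source stay untouched, and the outer loop over parts is what makes the approach "multi-pass": the cost of reading the source grows with the number of outputs.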
NOTE: this tool is unaware of documents added atomically via
IndexWriter.addDocuments(java.lang.Iterable<? extends java.lang.Iterable<? extends org.apache.lucene.index.IndexableField>>)
or IndexWriter.updateDocuments(org.apache.lucene.index.Term, java.lang.Iterable<? extends java.lang.Iterable<? extends org.apache.lucene.index.IndexableField>>),
which means it can easily break up such document groups.
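To see how a document group can be broken up, treat the group as a block of consecutive document IDs and ask which output parts those IDs land in. The sketch below is a hedged illustration only: it assumes two hypothetical assignment rules (equal sequential ranges of maxDoc / numParts IDs with the remainder going to the last part, and round-robin by docId % numParts) that are not specified on this page. Any result containing more than one part means the group was split.

```java
import java.util.Set;
import java.util.TreeSet;

class GroupSplitCheck {
    // Hypothetical sequential rule: equal ranges, remainder absorbed by the last part.
    // Assumes maxDoc >= numParts so that partLen is never zero.
    static int sequentialPart(int doc, int maxDoc, int numParts) {
        int partLen = maxDoc / numParts;
        return Math.min(doc / partLen, numParts - 1);
    }

    // Hypothetical round-robin rule.
    static int roundRobinPart(int doc, int numParts) {
        return doc % numParts;
    }

    /** Returns the set of parts that the docs in [start, end) would be assigned to. */
    static Set<Integer> partsForGroup(int start, int end, int maxDoc, int numParts, boolean seq) {
        Set<Integer> parts = new TreeSet<>();
        for (int doc = start; doc < end; doc++) {
            parts.add(seq ? sequentialPart(doc, maxDoc, numParts) : roundRobinPart(doc, numParts));
        }
        return parts;
    }

    public static void main(String[] args) {
        // A hypothetical 3-doc group (IDs 4, 5, 6) in a 10-doc index split two ways.
        System.out.println(partsForGroup(4, 7, 10, 2, true));  // [0, 1]
        System.out.println(partsForGroup(4, 7, 10, 2, false)); // [0, 1]
    }
}
```

Under these assumed rules, round-robin splits every group of two or more documents whenever there are at least two parts, while a sequential split only breaks groups that straddle a range boundary.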
Constructor Summary
MultiPassIndexSplitter()
Method Summary
Modifier and Type   Method                                                    Description
static void         main(String[] args)
void                split(IndexReader in, Directory[] outputs, boolean seq)   Split source index into multiple parts.
Method Detail
split
public void split(IndexReader in, Directory[] outputs, boolean seq) throws IOException
Split source index into multiple parts.
Parameters:
in - source index; can have deletions and can have multiple segments (or multiple readers).
outputs - list of directories where the output parts will be stored.
seq - if true, the source index will be split into equal increasing ranges of document IDs. If false, source document IDs will be assigned in a deterministic round-robin fashion to one of the output splits.
Throws:
IOException - if there is a low-level I/O error
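The effect of seq can be sketched with plain arithmetic. This is a hedged illustration rather than Lucene's code, and the exact rules are assumptions, since the parameter description only says "equal increasing ranges" and "deterministic round-robin": it assumes an ID space of size maxDoc, sequential ranges of maxDoc / numParts IDs with the integer-division remainder absorbed by the last part, and round-robin assignment of ID j to part j % numParts.

```java
import java.util.ArrayList;
import java.util.List;

class SplitAssignmentSketch {
    /** Lists the document IDs each output part would receive under the assumed rules. */
    static List<List<Integer>> assign(int maxDoc, int numParts, boolean seq) {
        List<List<Integer>> parts = new ArrayList<>();
        for (int i = 0; i < numParts; i++) {
            parts.add(new ArrayList<>());
        }
        int partLen = maxDoc / numParts; // size of each sequential range; assumes maxDoc >= numParts
        for (int doc = 0; doc < maxDoc; doc++) {
            int part;
            if (seq) {
                // equal increasing ranges; the last part also takes the leftover IDs
                part = Math.min(doc / partLen, numParts - 1);
            } else {
                // deterministic round-robin
                part = doc % numParts;
            }
            parts.get(part).add(doc);
        }
        return parts;
    }

    public static void main(String[] args) {
        System.out.println(assign(7, 3, true));  // [[0, 1], [2, 3], [4, 5, 6]]
        System.out.println(assign(7, 3, false)); // [[0, 3, 6], [1, 4], [2, 5]]
    }
}
```

In this sketch, sequential mode keeps each part's IDs contiguous (useful if ID order carries meaning), while round-robin interleaves them so each part samples the whole index evenly.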