public class MultiPassIndexSplitter extends Object
This tool splits an input index into multiple equal parts. The method employed here uses IndexWriter.addIndexes(IndexReader...), where the input data comes from the input index with artificially applied deletes to the document id-s that fall outside the selected partition.
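The following is a minimal, dependency-free Java sketch of that mechanism. It is not Lucene code. Documents are modeled as strings, a doc id is a list index, and the hypothetical `MultiPassSketch.split` uses round-robin assignment only. For each output part it marks every doc id outside the partition in a deletion buffer and then copies only the surviving documents, which is the role addIndexes plays in the real tool because deleted documents are not carried over.

```java
import java.util.ArrayList;
import java.util.BitSet;
import java.util.List;

/**
 * Hypothetical, dependency-free model of the multi-pass split (round-robin
 * only). Documents are strings and a doc id is the list index.
 */
class MultiPassSketch {
    static List<List<String>> split(List<String> source, int numParts) {
        if (numParts <= 0) {
            throw new IllegalArgumentException("numParts must be positive");
        }
        List<List<String>> parts = new ArrayList<>();
        BitSet deletes = new BitSet(source.size()); // buffered; source is never modified
        for (int part = 0; part < numParts; part++) {
            // Artificially delete every doc id that falls outside this partition.
            for (int doc = 0; doc < source.size(); doc++) {
                if (doc % numParts != part) {
                    deletes.set(doc);
                }
            }
            // Copy only the surviving documents into this output part.
            List<String> out = new ArrayList<>();
            for (int doc = 0; doc < source.size(); doc++) {
                if (!deletes.get(doc)) {
                    out.add(source.get(doc));
                }
            }
            parts.add(out);
            deletes.clear(); // reset the buffer before building the next part
        }
        return parts;
    }
}
```

For example, splitting the five documents a, b, c, d, e into two parts yields [a, c, e] and [b, d].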
Note 1: Deletes are applied only to a buffered list of deleted docs and don't affect the source index, so this tool also works with read-only indexes.
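Note 1 can be illustrated with a small model. This is hypothetical and not the tool's actual reader class: an immutable copy of the source stands in for a read-only index, and deletions are recorded only in an in-memory BitSet. The method names loosely echo IndexReader's maxDoc(), numDocs(), deleteDocument(int) and undeleteAll(), but the class itself is an assumption made for illustration.

```java
import java.util.ArrayList;
import java.util.BitSet;
import java.util.List;
import java.util.Objects;

/**
 * Hypothetical read-only view that buffers deletions in memory instead of
 * writing them to the source. Not a Lucene class.
 */
class BufferedDeletesView {
    private final List<String> source;           // never modified
    private final BitSet deleted = new BitSet(); // buffered deletes only

    BufferedDeletesView(List<String> docs) {
        // An immutable copy stands in for a read-only index.
        this.source = List.copyOf(docs);
    }

    int maxDoc() {
        return source.size();
    }

    int numDocs() {
        return source.size() - deleted.cardinality();
    }

    void deleteDocument(int doc) {
        Objects.checkIndex(doc, source.size());
        deleted.set(doc); // recorded in the buffer; the source is untouched
    }

    void undeleteAll() {
        deleted.clear(); // forget all buffered deletes
    }

    /** Returns the documents that are not marked as deleted. */
    List<String> liveDocs() {
        List<String> live = new ArrayList<>();
        for (int doc = 0; doc < source.size(); doc++) {
            if (!deleted.get(doc)) {
                live.add(source.get(doc));
            }
        }
        return live;
    }
}
```

Clearing the buffer with undeleteAll() between parts lets the same view be reused for the next pass while the underlying source list stays unchanged.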
Note 2: The disadvantage of this tool is that the source index must be read as many times as there are parts to be created, hence the name of this tool.
NOTE: this tool is unaware of documents added atomically via IndexWriter.addDocuments(java.util.Collection<org.apache.lucene.document.Document>) or IndexWriter.updateDocuments(org.apache.lucene.index.Term, java.util.Collection<org.apache.lucene.document.Document>), which means it can easily break up such document groups.
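The sketch below shows why such groups are at risk. It assumes that a group added atomically occupies consecutive doc ids and considers only the round-robin mode (seq = false). The hypothetical GroupSplitCheck is not part of Lucene; it reports which output parts receive documents from a given doc-id range.

```java
import java.util.Set;
import java.util.TreeSet;

/**
 * Hypothetical check of how a round-robin split (seq = false) distributes a
 * group of documents occupying consecutive doc ids [start, end).
 */
class GroupSplitCheck {
    /** Returns the output parts that receive at least one document of the group. */
    static Set<Integer> partsTouched(int start, int end, int numParts) {
        if (numParts <= 0) {
            throw new IllegalArgumentException("numParts must be positive");
        }
        Set<Integer> parts = new TreeSet<>();
        for (int doc = start; doc < end; doc++) {
            parts.add(doc % numParts); // round-robin assignment
        }
        return parts;
    }

    /** True if the group's documents end up in more than one output part. */
    static boolean isBrokenUp(int start, int end, int numParts) {
        return partsTouched(start, end, numParts).size() > 1;
    }
}
```

For a three-document group at doc ids 10 to 12 and two parts, partsTouched(10, 13, 2) returns {0, 1}, so the group is split. A sequential split can also cut a group, but only when the group straddles a range boundary.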
Constructor and Description |
---|
MultiPassIndexSplitter() |
Modifier and Type | Method and Description |
---|---|
static void | main(String[] args) |
void | split(IndexReader input, Directory[] outputs, boolean seq) Deprecated. use split(Version, IndexReader, Directory[], boolean) instead. This method will be removed in Lucene 4.0. |
void | split(Version version, IndexReader input, Directory[] outputs, boolean seq) Split source index into multiple parts. |
@Deprecated public void split(IndexReader input, Directory[] outputs, boolean seq) throws IOException
Deprecated. Use split(Version, IndexReader, Directory[], boolean) instead. This method will be removed in Lucene 4.0.
Parameters:
input - source index; can be read-only, can have deletions, and can have multiple segments (or multiple readers).
outputs - list of directories where the output parts will be stored.
seq - if true, the source index will be split into equal increasing ranges of document id-s. If false, source document id-s will be assigned in a deterministic round-robin fashion to one of the output splits.
Throws:
IOException
public void split(Version version, IndexReader input, Directory[] outputs, boolean seq) throws IOException
Split source index into multiple parts.
Parameters:
input - source index; can be read-only, can have deletions, and can have multiple segments (or multiple readers).
outputs - list of directories where the output parts will be stored.
seq - if true, the source index will be split into equal increasing ranges of document id-s. If false, source document id-s will be assigned in a deterministic round-robin fashion to one of the output splits.
Throws:
IOException
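The effect of the seq parameter can be sketched as a mapping from doc id to output part. This is a hypothetical helper, not part of MultiPassIndexSplitter's API. In sequential mode it assumes that each part gets maxDoc / numParts ids and that the last part absorbs any remainder left by integer division.

```java
import java.util.Objects;

/**
 * Hypothetical doc-id-to-part mapping for the two modes selected by seq.
 * Assumes the last part absorbs the remainder in sequential mode.
 */
class PartAssignment {
    static int partOf(int doc, int maxDoc, int numParts, boolean seq) {
        if (numParts <= 0) {
            throw new IllegalArgumentException("numParts must be positive");
        }
        Objects.checkIndex(doc, maxDoc); // doc must lie in [0, maxDoc)
        if (!seq) {
            return doc % numParts; // deterministic round-robin
        }
        // Equal increasing ranges; guard against maxDoc < numParts.
        int partLen = Math.max(1, maxDoc / numParts);
        return Math.min(doc / partLen, numParts - 1); // last part takes the remainder
    }
}
```

With maxDoc = 10 and three parts, sequential mode assigns ids 0 to 2 to part 0, 3 to 5 to part 1, and 6 to 9 to part 2, while round-robin mode sends id 4 to part 1.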