Package org.apache.lucene.util
Class OfflineSorter
- java.lang.Object
  - org.apache.lucene.util.OfflineSorter
public class OfflineSorter extends Object
On-disk sorting of byte arrays. Each byte array (entry) is composed of the following fields:
- (two bytes) length of the following byte array,
- exactly the above count of bytes for the sequence to be sorted.
- See Also:
sort(String)
- WARNING: This API is experimental and might change in incompatible ways in the next release.
- NOTE: This API is for internal purposes only and might change in incompatible ways in the next release.
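The entry layout described above (a two-byte length followed by exactly that many bytes) can be illustrated with a minimal, standalone sketch. It does not use Lucene: the class name `LengthPrefixedEntries` and its methods are hypothetical, and the big-endian byte order and the `Short.MAX_VALUE` length limit are assumptions made for the illustration, not statements about how `OfflineSorter.ByteSequencesWriter` encodes data.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

/** Hypothetical helper, not part of Lucene: encodes entries as [2-byte length][payload]. */
public class LengthPrefixedEntries {

    /** Writes each entry as a two-byte length followed by exactly that many bytes. */
    public static byte[] encode(List<byte[]> entries) {
        int total = 0;
        for (byte[] entry : entries) {
            // Assumption: lengths are limited to what a signed short can hold.
            if (entry.length > Short.MAX_VALUE) {
                throw new IllegalArgumentException("entry too long: " + entry.length);
            }
            total += Short.BYTES + entry.length;
        }
        // Assumption: big-endian length, which is ByteBuffer's default byte order.
        ByteBuffer buffer = ByteBuffer.allocate(total);
        for (byte[] entry : entries) {
            buffer.putShort((short) entry.length);
            buffer.put(entry);
        }
        return buffer.array();
    }

    /** Reads entries back by consuming a two-byte length, then that many payload bytes. */
    public static List<byte[]> decode(byte[] data) {
        List<byte[]> entries = new ArrayList<>();
        ByteBuffer buffer = ByteBuffer.wrap(data);
        while (buffer.hasRemaining()) {
            int length = buffer.getShort() & 0xFFFF;
            byte[] entry = new byte[length];
            buffer.get(entry);
            entries.add(entry);
        }
        return entries;
    }

    public static void main(String[] args) {
        List<byte[]> entries = new ArrayList<>();
        entries.add("ab".getBytes(StandardCharsets.UTF_8));
        entries.add("c".getBytes(StandardCharsets.UTF_8));
        byte[] encoded = encode(entries);
        // Two length prefixes (2 bytes each) plus 3 payload bytes.
        System.out.println("encoded bytes: " + encoded.length);
        for (byte[] entry : decode(encoded)) {
            System.out.println(new String(entry, StandardCharsets.UTF_8));
        }
    }
}
```

In real use, the nested ByteSequencesWriter and ByteSequencesReader classes are the intended way to produce and consume such entries; the sketch only shows the shape of the data.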
-
Nested Class Summary
Nested Classes (Modifier and Type, Class, Description)
- static class OfflineSorter.BufferSize
  A bit more descriptive unit for constructors.
- static class OfflineSorter.ByteSequencesReader
  Utility class to read length-prefixed byte[] entries from an input.
- static class OfflineSorter.ByteSequencesWriter
  Utility class to emit length-prefixed byte[] entries to an output stream for sorting.
- class OfflineSorter.SortInfo
  Sort info (debugging mostly).
-
Field Summary
Fields (Modifier and Type, Field, Description)
- static long ABSOLUTE_MIN_SORT_BUFFER_SIZE
  Absolute minimum required buffer size for sorting.
- static Comparator<BytesRef> DEFAULT_COMPARATOR
  Default comparator: sorts in binary (codepoint) order.
- static long GB
  Convenience constant for gigabytes.
- static int MAX_TEMPFILES
  Maximum number of temporary files before doing an intermediate merge.
- static long MB
  Convenience constant for megabytes.
- static long MIN_BUFFER_SIZE_MB
  Minimum recommended buffer size for sorting.
-
Constructor Summary
Constructors (Constructor, Description)
- OfflineSorter(Directory dir, String tempFileNamePrefix)
  Constructs a sorter with default settings.
- OfflineSorter(Directory dir, String tempFileNamePrefix, Comparator<BytesRef> comparator)
  Constructs a sorter with default settings and a custom comparator.
- OfflineSorter(Directory dir, String tempFileNamePrefix, Comparator<BytesRef> comparator, OfflineSorter.BufferSize ramBufferSize, int maxTempfiles, int valueLength, ExecutorService exec, int maxPartitionsInRAM)
  All-details constructor.
-
Method Summary
Methods (Modifier and Type, Method, Description)
- Comparator<BytesRef> getComparator()
  Returns the comparator in use to sort entries.
- Directory getDirectory()
  Returns the Directory we use to create temp files.
- protected OfflineSorter.ByteSequencesReader getReader(ChecksumIndexInput in, String name)
  Subclasses can override to change how byte sequences are read from disk.
- String getTempFileNamePrefix()
  Returns the temp file name prefix passed to Directory.createTempOutput(java.lang.String, java.lang.String, org.apache.lucene.store.IOContext) to generate temporary files.
- protected OfflineSorter.ByteSequencesWriter getWriter(IndexOutput out, long itemCount)
  Subclasses can override to change how byte sequences are written to disk.
- String sort(String inputFileName)
  Sort input to a new temp file, returning its name.
-
Field Detail
-
MB
public static final long MB
Convenience constant for megabytes.
- See Also:
- Constant Field Values
-
GB
public static final long GB
Convenience constant for gigabytes.
- See Also:
- Constant Field Values
-
MIN_BUFFER_SIZE_MB
public static final long MIN_BUFFER_SIZE_MB
Minimum recommended buffer size for sorting.
- See Also:
- Constant Field Values
-
ABSOLUTE_MIN_SORT_BUFFER_SIZE
public static final long ABSOLUTE_MIN_SORT_BUFFER_SIZE
Absolute minimum required buffer size for sorting.
- See Also:
- Constant Field Values
-
MAX_TEMPFILES
public static final int MAX_TEMPFILES
Maximum number of temporary files before doing an intermediate merge.
- See Also:
- Constant Field Values
-
DEFAULT_COMPARATOR
public static final Comparator<BytesRef> DEFAULT_COMPARATOR
Default comparator: sorts in binary (codepoint) order.
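To make "binary (codepoint) order" concrete, here is a small standalone sketch that compares UTF-8 encoded terms byte by byte, treating each byte as unsigned. It does not use Lucene or BytesRef; the class `BinaryOrderSketch` and its members are hypothetical names chosen for the example, and the sketch is only meant to show the kind of ordering described, not how DEFAULT_COMPARATOR is implemented.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;
import java.util.Comparator;

/** Hypothetical illustration, not part of Lucene: unsigned byte-wise ordering of UTF-8 terms. */
public class BinaryOrderSketch {

    /** Lexicographic comparison where each byte counts as a value in 0..255. */
    public static final Comparator<byte[]> BINARY_ORDER =
        (a, b) -> Arrays.compareUnsigned(a, b);

    /** Encodes the terms as UTF-8, sorts them in binary order, and decodes them again. */
    public static String[] sortUtf8(String... terms) {
        byte[][] encoded = new byte[terms.length][];
        for (int i = 0; i < terms.length; i++) {
            encoded[i] = terms[i].getBytes(StandardCharsets.UTF_8);
        }
        Arrays.sort(encoded, BINARY_ORDER);
        String[] sorted = new String[encoded.length];
        for (int i = 0; i < encoded.length; i++) {
            sorted[i] = new String(encoded[i], StandardCharsets.UTF_8);
        }
        return sorted;
    }

    public static void main(String[] args) {
        // "\u00e9" (e with acute accent) encodes to bytes 0xC3 0xA9, which are negative
        // as signed Java bytes; an unsigned comparison still places it after "z" (0x7A).
        System.out.println(Arrays.toString(sortUtf8("\u00e9", "z", "a", "ab")));
    }
}
```

The unsigned comparison matters because Java bytes are signed: a naive signed comparison would order multi-byte UTF-8 sequences before ASCII characters, which would no longer match code point order.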
-
Constructor Detail
-
OfflineSorter
public OfflineSorter(Directory dir, String tempFileNamePrefix) throws IOException
Constructs a sorter with default settings.
- Throws:
IOException
- See Also:
OfflineSorter.BufferSize.automatic()
-
OfflineSorter
public OfflineSorter(Directory dir, String tempFileNamePrefix, Comparator<BytesRef> comparator) throws IOException
Constructs a sorter with default settings and a custom comparator.
- Throws:
IOException
- See Also:
OfflineSorter.BufferSize.automatic()
-
OfflineSorter
public OfflineSorter(Directory dir, String tempFileNamePrefix, Comparator<BytesRef> comparator, OfflineSorter.BufferSize ramBufferSize, int maxTempfiles, int valueLength, ExecutorService exec, int maxPartitionsInRAM)
All-details constructor. If valueLength is -1 (the default), values may have different lengths; otherwise, all values have the specified length. If you pass a non-null ExecutorService then it will be used to run sorting operations that can be run concurrently, and maxPartitionsInRAM is the maximum number of concurrent in-memory partitions. Thus the maximum possible RAM used by this class while sorting is maxPartitionsInRAM * ramBufferSize.
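As a worked example of the RAM bound stated above, the following standalone sketch multiplies the number of in-memory partitions by the per-partition buffer size. The class `SortRamBudget` and its method are hypothetical, and the value used for a megabyte (1024 * 1024 bytes) is an assumption for the illustration; this page does not state the value of the MB constant.

```java
/** Hypothetical helper, not part of Lucene: computes maxPartitionsInRAM * ramBufferSize. */
public class SortRamBudget {

    /** Assumption: one megabyte is 1024 * 1024 bytes. */
    public static final long ASSUMED_MB = 1024L * 1024L;

    /** Upper bound on sort RAM in bytes; throws ArithmeticException on long overflow. */
    public static long maxSortRamBytes(int maxPartitionsInRAM, long ramBufferSizeBytes) {
        if (maxPartitionsInRAM <= 0 || ramBufferSizeBytes <= 0) {
            throw new IllegalArgumentException("both arguments must be positive");
        }
        return Math.multiplyExact((long) maxPartitionsInRAM, ramBufferSizeBytes);
    }

    public static void main(String[] args) {
        // 4 concurrent partitions with a 64 MB buffer each: 256 MB in total.
        long bytes = maxSortRamBytes(4, 64 * ASSUMED_MB);
        System.out.println(bytes / ASSUMED_MB + " MB");
    }
}
```

When sizing the buffer for a concurrent sort, it is this product, not ramBufferSize alone, that needs to fit in the available heap.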
-
Method Detail
-
getTempFileNamePrefix
public String getTempFileNamePrefix()
Returns the temp file name prefix passed to Directory.createTempOutput(java.lang.String, java.lang.String, org.apache.lucene.store.IOContext) to generate temporary files.
-
sort
public String sort(String inputFileName) throws IOException
Sort input to a new temp file, returning its name.
- Throws:
IOException
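The general technique behind an on-disk sort of this kind is an external merge sort: sort bounded-size partitions independently, then merge the sorted partitions. The sketch below shows that idea with in-memory lists standing in for temporary files. It is not Lucene's implementation; the class `PartitionMergeSketch`, its methods, and the `partitionSize` parameter are hypothetical and exist only for illustration.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.Comparator;
import java.util.List;
import java.util.PriorityQueue;

/** Hypothetical illustration of partition-then-merge sorting; lists stand in for temp files. */
public class PartitionMergeSketch {

    /** Splits the input into partitions of at most partitionSize entries and sorts each one. */
    public static List<List<String>> sortPartitions(List<String> input, int partitionSize) {
        if (partitionSize <= 0) {
            throw new IllegalArgumentException("partitionSize must be positive");
        }
        List<List<String>> partitions = new ArrayList<>();
        for (int start = 0; start < input.size(); start += partitionSize) {
            int end = Math.min(start + partitionSize, input.size());
            List<String> partition = new ArrayList<>(input.subList(start, end));
            Collections.sort(partition);
            partitions.add(partition);
        }
        return partitions;
    }

    /** K-way merge of sorted partitions using a min-heap of {partitionIndex, offset} cursors. */
    public static List<String> merge(List<List<String>> partitions) {
        Comparator<int[]> byCurrentEntry = (a, b) ->
            partitions.get(a[0]).get(a[1]).compareTo(partitions.get(b[0]).get(b[1]));
        PriorityQueue<int[]> heap = new PriorityQueue<>(byCurrentEntry);
        for (int p = 0; p < partitions.size(); p++) {
            if (!partitions.get(p).isEmpty()) {
                heap.add(new int[] {p, 0});
            }
        }
        List<String> merged = new ArrayList<>();
        while (!heap.isEmpty()) {
            int[] cursor = heap.poll();
            List<String> partition = partitions.get(cursor[0]);
            merged.add(partition.get(cursor[1]));
            // Advance the cursor within its partition, if entries remain.
            if (cursor[1] + 1 < partition.size()) {
                heap.add(new int[] {cursor[0], cursor[1] + 1});
            }
        }
        return merged;
    }

    /** Sorts by partitioning first and merging afterwards. */
    public static List<String> sort(List<String> input, int partitionSize) {
        return merge(sortPartitions(input, partitionSize));
    }

    public static void main(String[] args) {
        List<String> input = List.of("d", "b", "a", "c", "e");
        System.out.println(sort(input, 2));
    }
}
```

In an on-disk variant, each partition would be written to its own temporary file, and a limit such as MAX_TEMPFILES would trigger an intermediate merge before too many files accumulate.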
-
getWriter
protected OfflineSorter.ByteSequencesWriter getWriter(IndexOutput out, long itemCount) throws IOException
Subclasses can override to change how byte sequences are written to disk.
- Throws:
IOException
-
getReader
protected OfflineSorter.ByteSequencesReader getReader(ChecksumIndexInput in, String name) throws IOException
Subclasses can override to change how byte sequences are read from disk.
- Throws:
IOException
-
getComparator
public Comparator<BytesRef> getComparator()
Returns the comparator in use to sort entries.
-