public class CompressingStoredFieldsFormat extends StoredFieldsFormat
A StoredFieldsFormat that compresses documents in chunks in
order to improve the compression ratio.
For a chunk size of chunkSize bytes, this StoredFieldsFormat
does not support documents larger than (2^31 - chunkSize)
bytes.
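To make the size limit concrete, here is a minimal sketch of the arithmetic, assuming the limit is 2^31 - chunkSize bytes. The class and method names are hypothetical and are not part of Lucene.

```java
/** Hypothetical helper for illustration only; not part of Lucene. */
class MaxDocSizeSketch {

    /** Largest supported document size, in bytes, for the given chunk size. */
    static long maxDocumentBytes(int chunkSize) {
        // Use long arithmetic: 2^31 does not fit in a Java int.
        return (1L << 31) - chunkSize;
    }

    public static void main(String[] args) {
        // With a 16 KB chunk size (1 << 14 bytes):
        System.out.println(maxDocumentBytes(1 << 14)); // prints 2147467264
    }
}
```

For realistic chunk sizes the limit stays just under 2 GB per document.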
For optimal performance, you should use a MergePolicy that returns
segments that have the biggest byte size first.
| Constructor and Description |
|---|
| CompressingStoredFieldsFormat(String formatName, CompressionMode compressionMode, int chunkSize, int maxDocsPerChunk, int blockSize): Create a new CompressingStoredFieldsFormat with an empty segment suffix. |
| CompressingStoredFieldsFormat(String formatName, String segmentSuffix, CompressionMode compressionMode, int chunkSize, int maxDocsPerChunk, int blockSize): Create a new CompressingStoredFieldsFormat. |

| Modifier and Type | Method and Description |
|---|---|
| StoredFieldsReader | fieldsReader(Directory directory, SegmentInfo si, FieldInfos fn, IOContext context): Returns a StoredFieldsReader to load stored fields. |
| StoredFieldsWriter | fieldsWriter(Directory directory, SegmentInfo si, IOContext context): Returns a StoredFieldsWriter to write stored fields. |
| String | toString() |

public CompressingStoredFieldsFormat(String formatName, CompressionMode compressionMode, int chunkSize, int maxDocsPerChunk, int blockSize)
Create a new CompressingStoredFieldsFormat with an empty segment
suffix.

public CompressingStoredFieldsFormat(String formatName, String segmentSuffix, CompressionMode compressionMode, int chunkSize, int maxDocsPerChunk, int blockSize)
Create a new CompressingStoredFieldsFormat.
formatName is the name of the format. This name will be used
in the file formats to perform
codec header checks.
segmentSuffix is the segment suffix. This suffix is added to
the result file name only if it's not the empty string.
The compressionMode parameter allows you to choose between
compression algorithms that have various compression and decompression
speeds so that you can pick the one that best fits your indexing and
searching throughput. You should never instantiate two
CompressingStoredFieldsFormats that have the same name but
different CompressionModes.
chunkSize is the minimum byte size of a chunk of documents.
A value of 1 can make sense if there is redundancy across
fields.
maxDocsPerChunk is an upper bound on how many docs may be stored
in a single chunk. This bounds the CPU cost for highly compressible data.
Higher values of chunkSize should improve the compression
ratio but will require more memory at indexing time and might make document
loading a little slower (depending on the size of your OS cache compared
to the size of your index).
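The interaction between chunkSize and maxDocsPerChunk can be illustrated with a simplified model. The sketch below assumes a chunk is closed as soon as the buffered bytes reach chunkSize or the buffered document count reaches maxDocsPerChunk. It is an illustration of the parameters as described above, not Lucene's actual implementation, and the class and method names are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

/** Simplified, hypothetical model of chunking; not Lucene's implementation. */
class ChunkingSketch {

    /**
     * Groups documents into chunks and returns the number of documents
     * placed in each chunk, in order.
     *
     * @param docSizes        uncompressed byte size of each document
     * @param chunkSize       minimum byte size of a chunk
     * @param maxDocsPerChunk upper bound on documents per chunk
     */
    static List<Integer> docsPerChunk(int[] docSizes, int chunkSize, int maxDocsPerChunk) {
        List<Integer> chunks = new ArrayList<>();
        long bufferedBytes = 0;
        int bufferedDocs = 0;
        for (int size : docSizes) {
            bufferedBytes += size;
            bufferedDocs++;
            // Assumed rule: close the chunk once either threshold is reached.
            if (bufferedBytes >= chunkSize || bufferedDocs >= maxDocsPerChunk) {
                chunks.add(bufferedDocs);
                bufferedBytes = 0;
                bufferedDocs = 0;
            }
        }
        // Whatever is still buffered at the end forms a final, smaller chunk.
        if (bufferedDocs > 0) {
            chunks.add(bufferedDocs);
        }
        return chunks;
    }

    public static void main(String[] args) {
        int[] sizes = {100, 100, 100, 100, 100};
        System.out.println(docsPerChunk(sizes, 250, 10));       // prints [3, 2]
        System.out.println(docsPerChunk(sizes, 1, 10));         // prints [1, 1, 1, 1, 1]
        System.out.println(docsPerChunk(sizes, 1_000_000, 2));  // prints [2, 2, 1]
    }
}
```

In this model a chunkSize of 1 puts every document in its own chunk, so compression can only exploit redundancy within a single document, while a very large chunkSize leaves maxDocsPerChunk as the only limit.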
Parameters:
formatName - the name of the StoredFieldsFormat
compressionMode - the CompressionMode to use
chunkSize - the minimum number of bytes of a single chunk of stored documents
maxDocsPerChunk - the maximum number of documents in a single chunk
blockSize - the number of chunks to store in an index block
See Also:
CompressionMode

public StoredFieldsReader fieldsReader(Directory directory, SegmentInfo si, FieldInfos fn, IOContext context) throws IOException
Description copied from class: StoredFieldsFormat
Returns a StoredFieldsReader to load stored fields.
Specified by:
fieldsReader in class StoredFieldsFormat
Throws:
IOException

public StoredFieldsWriter fieldsWriter(Directory directory, SegmentInfo si, IOContext context) throws IOException
Description copied from class: StoredFieldsFormat
Returns a StoredFieldsWriter to write stored fields.
Specified by:
fieldsWriter in class StoredFieldsFormat
Throws:
IOException

Copyright © 2000-2019 Apache Software Foundation. All Rights Reserved.