public class CompressingStoredFieldsFormat extends StoredFieldsFormat
A StoredFieldsFormat that is very similar to
Lucene40StoredFieldsFormat but compresses documents in chunks in
order to improve the compression ratio.
For a chunk size of chunkSize bytes, this StoredFieldsFormat
does not support documents larger than (2^31 - chunkSize)
bytes. In case this is a problem, you should use another format, such as
Lucene40StoredFieldsFormat.
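As a quick check of the size limit above, the following sketch computes the largest supported document size for a given chunk size, reading the bound as 2^31 - chunkSize. The class name ChunkSizeLimit and the chunkSize >= 1 check are assumptions made for illustration; they are not part of the Lucene API.

```java
// Sketch only: ChunkSizeLimit is a hypothetical helper, not a Lucene class.
final class ChunkSizeLimit {

    /** Largest supported document size in bytes, read as 2^31 - chunkSize. */
    static long maxDocumentBytes(int chunkSize) {
        if (chunkSize < 1) {
            // Assumption: a chunk size below 1 is not meaningful.
            throw new IllegalArgumentException("chunkSize must be >= 1");
        }
        // Use long arithmetic: 2^31 does not fit in an int.
        return (1L << 31) - chunkSize;
    }

    public static void main(String[] args) {
        int chunkSize = 1 << 14; // 16 KiB, an illustrative value
        System.out.println(maxDocumentBytes(chunkSize)); // prints 2147467264
    }
}
```

Even with a generous chunk size the limit stays just under 2 GiB, so it should only matter for unusually large stored documents.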
For optimal performance, you should use a MergePolicy that returns
segments that have the biggest byte size first.
| Constructor and Description |
|---|
| CompressingStoredFieldsFormat(String formatName, CompressionMode compressionMode, int chunkSize): Create a new CompressingStoredFieldsFormat with an empty segment suffix. |
| CompressingStoredFieldsFormat(String formatName, String segmentSuffix, CompressionMode compressionMode, int chunkSize): Create a new CompressingStoredFieldsFormat. |
| Modifier and Type | Method and Description |
|---|---|
| StoredFieldsReader | fieldsReader(Directory directory, SegmentInfo si, FieldInfos fn, IOContext context): Returns a StoredFieldsReader to load stored fields. |
| StoredFieldsWriter | fieldsWriter(Directory directory, SegmentInfo si, IOContext context): Returns a StoredFieldsWriter to write stored fields. |
| String | toString() |
public CompressingStoredFieldsFormat(String formatName, CompressionMode compressionMode, int chunkSize)
Create a new CompressingStoredFieldsFormat with an empty segment
suffix.
public CompressingStoredFieldsFormat(String formatName, String segmentSuffix, CompressionMode compressionMode, int chunkSize)
Create a new CompressingStoredFieldsFormat.
formatName is the name of the format. This name will be used
in the file formats to perform
codec header checks.
segmentSuffix is the segment suffix. This suffix is added to
the result file name only if it's not the empty string.
The compressionMode parameter allows you to choose between
compression algorithms that have various compression and decompression
speeds so that you can pick the one that best fits your indexing and
searching throughput. You should never instantiate two
CompressingStoredFieldsFormats that have the same name but
different CompressionModes.
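One way to guard against reusing a format name with a different compression mode is a small registry that remembers which mode each name was first paired with. This is a minimal sketch under stated assumptions: FormatNameRegistry is a hypothetical helper, not part of Lucene, and it tracks modes by a plain string label rather than by CompressionMode instances.

```java
import java.util.HashMap;
import java.util.Map;

// Sketch only: a hypothetical application-side guard, not a Lucene class.
final class FormatNameRegistry {

    // Maps each format name to the label of the mode it was first registered with.
    private final Map<String, String> modeByName = new HashMap<>();

    /** Records the pairing, or throws if the name is already bound to another mode. */
    synchronized void register(String formatName, String compressionMode) {
        String previous = modeByName.putIfAbsent(formatName, compressionMode);
        if (previous != null && !previous.equals(compressionMode)) {
            throw new IllegalStateException(
                "Format name '" + formatName + "' is already used with mode "
                    + previous + ", cannot reuse it with " + compressionMode);
        }
    }

    public static void main(String[] args) {
        FormatNameRegistry registry = new FormatNameRegistry();
        registry.register("MyStoredFields", "FAST");
        registry.register("MyStoredFields", "FAST"); // same pairing: accepted
        try {
            registry.register("MyStoredFields", "HIGH_COMPRESSION");
        } catch (IllegalStateException e) {
            System.out.println("rejected: " + e.getMessage());
        }
    }
}
```

Calling register just before each format is constructed would surface a conflicting pairing at startup rather than when segments are read back.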
chunkSize is the minimum byte size of a chunk of documents.
A value of 1 can make sense if there is redundancy across
fields. In that case, both performance and compression ratio should be
better than with Lucene40StoredFieldsFormat with compressed
fields.
Higher values of chunkSize should improve the compression
ratio but will require more memory at indexing time and might make document
loading a little slower (depending on the size of your OS cache compared
to the size of your index).
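The effect of chunkSize on compression ratio can be illustrated with a standalone sketch. It is not Lucene's implementation: it uses DEFLATE from java.util.zip as a stand-in for whichever CompressionMode is configured, and the class ChunkedCompressionSketch, its methods, and the sample documents are all made up for illustration. Documents are appended to a buffer until it reaches at least chunkSize bytes, and each full buffer is then compressed as one chunk.

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import java.util.zip.Deflater;

// Sketch only: illustrates chunked compression, not Lucene's actual writer.
final class ChunkedCompressionSketch {

    /** Compresses one buffer with DEFLATE and returns the compressed length. */
    static int compressedLength(byte[] data) {
        Deflater deflater = new Deflater();
        try {
            deflater.setInput(data);
            deflater.finish();
            byte[] out = new byte[256];
            int total = 0;
            while (!deflater.finished()) {
                total += deflater.deflate(out); // only the byte count is needed
            }
            return total;
        } finally {
            deflater.end(); // release native resources
        }
    }

    /** Total compressed size when documents are grouped into chunks of at least chunkSize bytes. */
    static int chunkedLength(List<byte[]> docs, int chunkSize) {
        ByteArrayOutputStream chunk = new ByteArrayOutputStream();
        int total = 0;
        for (byte[] doc : docs) {
            chunk.write(doc, 0, doc.length);
            if (chunk.size() >= chunkSize) { // chunk reached its minimum size: compress it
                total += compressedLength(chunk.toByteArray());
                chunk.reset();
            }
        }
        if (chunk.size() > 0) { // flush the last, possibly smaller, chunk
            total += compressedLength(chunk.toByteArray());
        }
        return total;
    }

    /** Builds small documents that share most of their structure. */
    static List<byte[]> sampleDocs(int count) {
        List<byte[]> docs = new ArrayList<>();
        for (int i = 0; i < count; i++) {
            String doc = "{\"id\":" + i + ",\"type\":\"article\",\"lang\":\"en\"}";
            docs.add(doc.getBytes(StandardCharsets.UTF_8));
        }
        return docs;
    }

    public static void main(String[] args) {
        List<byte[]> docs = sampleDocs(200);
        System.out.println("chunkSize=1:     " + chunkedLength(docs, 1));
        System.out.println("chunkSize=16384: " + chunkedLength(docs, 1 << 14));
    }
}
```

With chunkSize set to 1, every document ends up in its own chunk, so redundancy between documents cannot be exploited. A larger chunkSize lets the compressor see many similar documents at once, at the cost of buffering more bytes before each flush and decompressing more data to load a single document.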
Parameters:
formatName - the name of the StoredFieldsFormat
compressionMode - the CompressionMode to use
chunkSize - the minimum number of bytes of a single chunk of stored documents
See Also:
CompressionMode
public StoredFieldsReader fieldsReader(Directory directory, SegmentInfo si, FieldInfos fn, IOContext context) throws IOException
Description copied from class: StoredFieldsFormat
Returns a StoredFieldsReader to load stored fields.
Specified by:
fieldsReader in class StoredFieldsFormat
Throws:
IOException
public StoredFieldsWriter fieldsWriter(Directory directory, SegmentInfo si, IOContext context) throws IOException
Description copied from class: StoredFieldsFormat
Returns a StoredFieldsWriter to write stored fields.
Specified by:
fieldsWriter in class StoredFieldsFormat
Throws:
IOException
Copyright © 2000-2015 Apache Software Foundation. All Rights Reserved.