public class CompressingStoredFieldsFormat extends StoredFieldsFormat
A StoredFieldsFormat that is very similar to Lucene40StoredFieldsFormat but compresses documents in chunks in order to improve the compression ratio.
For a chunk size of chunkSize bytes, this StoredFieldsFormat does not support documents larger than (2^31 - chunkSize) bytes. If this is a problem, use another format, such as Lucene40StoredFieldsFormat.
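To make the size limit concrete, the following sketch evaluates 2^31 - chunkSize for two chunk sizes. The class and method names (`DocSizeLimit`, `maxDocumentBytes`) are hypothetical and not part of Lucene, and the lower bound on chunkSize is an assumption of this sketch; only the formula comes from the description above.

```java
public class DocSizeLimit {
    // Hypothetical helper (not a Lucene API): largest supported document
    // size in bytes for a given chunk size, using 2^31 - chunkSize.
    static long maxDocumentBytes(int chunkSize) {
        if (chunkSize < 1) {
            // Assumption of this sketch: a chunk holds at least one byte.
            throw new IllegalArgumentException("chunkSize must be >= 1");
        }
        // Use long arithmetic: 2^31 does not fit in a Java int.
        return (1L << 31) - chunkSize;
    }

    public static void main(String[] args) {
        System.out.println(maxDocumentBytes(1));        // 2147483647
        System.out.println(maxDocumentBytes(1 << 14));  // 2147467264
    }
}
```

Even with a 16 KB chunk size the limit stays just under 2 GB per document, so it only matters for unusually large stored documents.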
For optimal performance, you should use a MergePolicy that returns segments that have the biggest byte size first.
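The ordering described above can be illustrated with a minimal sketch that sorts segments by byte size, largest first. The `Segment` class and `biggestFirst` method are hypothetical stand-ins for illustration; this is not a MergePolicy implementation and does not use Lucene's own segment types.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;

public class BiggestFirst {
    // Hypothetical stand-in for a segment: a name and its size in bytes.
    static final class Segment {
        final String name;
        final long sizeInBytes;

        Segment(String name, long sizeInBytes) {
            this.name = name;
            this.sizeInBytes = sizeInBytes;
        }
    }

    // Returns a copy of the input ordered by descending byte size.
    static List<Segment> biggestFirst(List<Segment> segments) {
        List<Segment> sorted = new ArrayList<>(segments);
        sorted.sort(Comparator.comparingLong((Segment s) -> s.sizeInBytes).reversed());
        return sorted;
    }

    public static void main(String[] args) {
        List<Segment> input = Arrays.asList(
                new Segment("_0", 10), new Segment("_1", 300), new Segment("_2", 42));
        for (Segment s : biggestFirst(input)) {
            System.out.println(s.name + " " + s.sizeInBytes);
        }
    }
}
```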
Constructor and Description |
---|
CompressingStoredFieldsFormat(String formatName, CompressionMode compressionMode, int chunkSize) Create a new CompressingStoredFieldsFormat with an empty segment suffix. |
CompressingStoredFieldsFormat(String formatName, String segmentSuffix, CompressionMode compressionMode, int chunkSize) Create a new CompressingStoredFieldsFormat. |
Modifier and Type | Method and Description |
---|---|
StoredFieldsReader | fieldsReader(Directory directory, SegmentInfo si, FieldInfos fn, IOContext context) Returns a StoredFieldsReader to load stored fields. |
StoredFieldsWriter | fieldsWriter(Directory directory, SegmentInfo si, IOContext context) Returns a StoredFieldsWriter to write stored fields. |
String | toString() |
public CompressingStoredFieldsFormat(String formatName, CompressionMode compressionMode, int chunkSize)
Create a new CompressingStoredFieldsFormat with an empty segment suffix.
public CompressingStoredFieldsFormat(String formatName, String segmentSuffix, CompressionMode compressionMode, int chunkSize)
Create a new CompressingStoredFieldsFormat.
formatName is the name of the format. This name will be used in the file formats to perform codec header checks.
segmentSuffix is the segment suffix. This suffix is added to the result file name only if it's not the empty string.
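The suffix rule can be sketched as follows. The helper `fileName`, the underscore separator, and the sample segment name and extension are assumptions made for illustration; they are not taken from this page and should not be read as the exact naming scheme used by the format.

```java
public class SegmentFileNameSketch {
    // Hypothetical helper: builds a file name from a segment name, an
    // optional suffix, and an extension. The suffix (and its separator)
    // is appended only when the suffix is not the empty string.
    static String fileName(String segmentName, String segmentSuffix, String extension) {
        StringBuilder sb = new StringBuilder(segmentName);
        if (!segmentSuffix.isEmpty()) {
            sb.append('_').append(segmentSuffix); // separator is an assumption
        }
        return sb.append('.').append(extension).toString();
    }

    public static void main(String[] args) {
        System.out.println(fileName("_0", "", "fdt"));           // _0.fdt
        System.out.println(fileName("_0", "Compressed", "fdt")); // _0_Compressed.fdt
    }
}
```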
The compressionMode parameter allows you to choose between compression algorithms that have various compression and decompression speeds so that you can pick the one that best fits your indexing and searching throughput. You should never instantiate two CompressingStoredFieldsFormats that have the same name but different CompressionModes.
chunkSize is the minimum byte size of a chunk of documents. A value of 1 can make sense if there is redundancy across fields. In that case, both performance and compression ratio should be better than with Lucene40StoredFieldsFormat with compressed fields.
Higher values of chunkSize should improve the compression ratio but will require more memory at indexing time and might make document loading a little slower (depending on the size of your OS cache compared to the size of your index).
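The effect of chunking on the compression ratio can be seen in a small, self-contained sketch. It uses `java.util.zip.Deflater` from the JDK as a stand-in compressor, which is not necessarily the algorithm a given CompressionMode uses, and the sample documents and all class and method names are hypothetical. It compares compressing each document on its own with compressing the same documents as one chunk.

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;
import java.util.zip.Deflater;

public class ChunkCompressionSketch {
    // Compresses the input with the JDK's Deflater and returns the
    // number of compressed bytes produced.
    static int compressedSize(byte[] input) {
        Deflater deflater = new Deflater();
        deflater.setInput(input);
        deflater.finish();
        byte[] buffer = new byte[1024];
        int total = 0;
        while (!deflater.finished()) {
            total += deflater.deflate(buffer);
        }
        deflater.end();
        return total;
    }

    // Hypothetical sample data: small documents that share most of their content.
    static byte[][] sampleDocs(int count) {
        byte[][] docs = new byte[count][];
        for (int i = 0; i < count; i++) {
            String doc = "{\"id\":" + i
                    + ",\"title\":\"stored fields\""
                    + ",\"body\":\"documents are compressed in chunks\"}";
            docs[i] = doc.getBytes(StandardCharsets.UTF_8);
        }
        return docs;
    }

    // Total size when every document is compressed separately.
    static int sizeCompressedIndividually(byte[][] docs) {
        int total = 0;
        for (byte[] doc : docs) {
            total += compressedSize(doc);
        }
        return total;
    }

    // Size when all documents are buffered and compressed as a single chunk.
    static int sizeCompressedAsOneChunk(byte[][] docs) {
        ByteArrayOutputStream chunk = new ByteArrayOutputStream();
        for (byte[] doc : docs) {
            chunk.write(doc, 0, doc.length);
        }
        return compressedSize(chunk.toByteArray());
    }

    public static void main(String[] args) {
        byte[][] docs = sampleDocs(100);
        System.out.println("individually: " + sizeCompressedIndividually(docs) + " bytes");
        System.out.println("as one chunk: " + sizeCompressedAsOneChunk(docs) + " bytes");
    }
}
```

The single chunk compresses better because the compressor can reuse redundancy across documents. The cost is that the whole chunk has to be buffered before it is compressed, and reading one document means decompressing the chunk that contains it, which matches the memory and loading trade-offs described above.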
Parameters:
formatName - the name of the StoredFieldsFormat
compressionMode - the CompressionMode to use
chunkSize - the minimum number of bytes of a single chunk of stored documents
See Also:
CompressionMode
public StoredFieldsReader fieldsReader(Directory directory, SegmentInfo si, FieldInfos fn, IOContext context) throws IOException
Description copied from class: StoredFieldsFormat
Returns a StoredFieldsReader to load stored fields.
Specified by: fieldsReader in class StoredFieldsFormat
Throws: IOException
public StoredFieldsWriter fieldsWriter(Directory directory, SegmentInfo si, IOContext context) throws IOException
Description copied from class: StoredFieldsFormat
Returns a StoredFieldsWriter to write stored fields.
Specified by: fieldsWriter in class StoredFieldsFormat
Throws: IOException
Copyright © 2000-2014 Apache Software Foundation. All Rights Reserved.