Class Lucene90StoredFieldsFormat
- java.lang.Object
-
- org.apache.lucene.codecs.StoredFieldsFormat
-
- org.apache.lucene.codecs.lucene90.Lucene90StoredFieldsFormat
-
public class Lucene90StoredFieldsFormat extends StoredFieldsFormat
Lucene 9.0 stored fields format.
Principle
This StoredFieldsFormat compresses blocks of documents in order to improve the compression ratio compared to document-level compression. By default it uses the LZ4 compression algorithm with 8KB blocks and shared dictionaries, which is fast to compress and very fast to decompress. Although the default compression method (BEST_SPEED) focuses more on speed than on compression ratio, it should still provide good compression ratios for redundant inputs (such as log files, HTML or plain text). For higher compression, you can choose BEST_COMPRESSION, which uses the DEFLATE algorithm with 48KB blocks and shared dictionaries for a better ratio at the expense of slower performance. These two options can be configured like this:
  // the default: for high performance
  indexWriterConfig.setCodec(new Lucene99Codec(Mode.BEST_SPEED));
  // instead for higher compression (but slower):
  // indexWriterConfig.setCodec(new Lucene99Codec(Mode.BEST_COMPRESSION));
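The benefit of compressing blocks of documents rather than individual documents can be illustrated with a small, self-contained sketch. It does not use Lucene at all: it relies only on the JDK's java.util.zip.Deflater, and the class name BlockCompressionDemo, its helper methods, and the log-like sample documents are all hypothetical names chosen for illustration. It compares the total size of documents compressed one by one with the size of the same documents compressed as a single block.

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import java.util.zip.Deflater;

// Illustration only: this is not how Lucene implements its stored fields format.
class BlockCompressionDemo {

    // Returns the number of bytes DEFLATE produces for the given input.
    static int deflatedSize(byte[] input) {
        Deflater deflater = new Deflater(Deflater.BEST_COMPRESSION);
        try {
            deflater.setInput(input);
            deflater.finish();
            byte[] out = new byte[1024];
            int total = 0;
            while (!deflater.finished()) {
                total += deflater.deflate(out);
            }
            return total;
        } finally {
            deflater.end();
        }
    }

    // Hypothetical log-like documents that share most of their text.
    static List<byte[]> sampleDocs(int count) {
        List<byte[]> docs = new ArrayList<>();
        for (int i = 0; i < count; i++) {
            String doc = "2024-01-01T00:00:" + (i % 60)
                + " INFO request handled path=/api/items/" + i + " status=200";
            docs.add(doc.getBytes(StandardCharsets.UTF_8));
        }
        return docs;
    }

    // Document-level compression: every document is compressed on its own.
    static int perDocumentSize(List<byte[]> docs) {
        int total = 0;
        for (byte[] doc : docs) {
            total += deflatedSize(doc);
        }
        return total;
    }

    // Block-level compression: documents are concatenated, then compressed once.
    static int blockSize(List<byte[]> docs) {
        ByteArrayOutputStream block = new ByteArrayOutputStream();
        for (byte[] doc : docs) {
            block.write(doc, 0, doc.length);
        }
        return deflatedSize(block.toByteArray());
    }

    public static void main(String[] args) {
        List<byte[]> docs = sampleDocs(100);
        System.out.println("per-document total: " + perDocumentSize(docs) + " bytes");
        System.out.println("single block:       " + blockSize(docs) + " bytes");
    }
}
```

Because the sample documents repeat most of their content, the compressor can reuse matches across documents only when they share a block, which is the effect the paragraph above describes.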
File formats
Stored fields are represented by three files:
-
A fields data file (extension .fdt). This file stores a compact representation of documents in compressed blocks of 8KB or more. When writing a segment, documents are appended to an in-memory byte[] buffer. When its size reaches 80KB or more, some metadata about the documents is flushed to disk, immediately followed by a compressed representation of the buffer using the LZ4 compression format.
Notes
- When at least one document in a chunk is large enough that the chunk is larger than 80KB, the chunk is actually compressed in several LZ4 blocks of 8KB. This allows StoredFieldVisitors that are only interested in the first fields of a document to avoid decompressing 10MB of data if the document is 10MB, and to decompress only 8-16KB instead (the requested data may cross a block boundary).
- Given that the original lengths are written in the metadata of the chunk, the decompressor can leverage this information to stop decoding as soon as enough data has been decompressed.
- In case documents are incompressible, the overhead of the compression format is less than 0.5%.
-
A fields index file (extension .fdx). This file stores two monotonic arrays, one for the first doc IDs of each block of compressed documents, and another for the corresponding offsets on disk. At search time, the array containing doc IDs is binary-searched to find the block that contains the expected doc ID, and the associated offset on disk is retrieved from the second array.
-
A fields meta file (extension .fdm). This file stores metadata about the monotonic arrays stored in the index file.
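The interplay between the data file and the index file described above can be sketched as a small simulation. This is a minimal sketch, not Lucene's implementation: the class ChunkIndexSketch and all of its members are hypothetical, the two arrays are kept as in-memory lists rather than in the on-disk monotonic encoding, and the uncompressed buffer length stands in for the compressed size of each chunk. Only the 80KB flush threshold and the binary-search lookup are taken from the text.

```java
import java.util.ArrayList;
import java.util.List;

// Illustration only: simulates chunk flushing and doc ID lookup in memory.
class ChunkIndexSketch {
    // Flush threshold taken from the description of the fields data file.
    static final int CHUNK_SIZE = 80 * 1024;

    // The two arrays of the index file: first doc ID and start offset per chunk.
    final List<Integer> firstDocIds = new ArrayList<>();
    final List<Long> offsets = new ArrayList<>();

    private int bufferedBytes = 0;
    private int bufferedDocs = 0;
    private int nextDocId = 0;
    private long filePointer = 0;

    // Appends a document of the given length to the in-memory buffer.
    void addDocument(int docLength) {
        bufferedBytes += docLength;
        bufferedDocs++;
        nextDocId++;
        if (bufferedBytes >= CHUNK_SIZE) {
            flush();
        }
    }

    // Records the chunk in the index and advances the simulated file pointer.
    void flush() {
        if (bufferedDocs == 0) {
            return;
        }
        firstDocIds.add(nextDocId - bufferedDocs);
        offsets.add(filePointer);
        // Assumption: the buffer length stands in for the compressed chunk size.
        filePointer += bufferedBytes;
        bufferedBytes = 0;
        bufferedDocs = 0;
    }

    // Binary search for the last chunk whose first doc ID is <= docId.
    int chunkFor(int docId) {
        int lo = 0;
        int hi = firstDocIds.size() - 1;
        while (lo < hi) {
            int mid = (lo + hi + 1) >>> 1;
            if (firstDocIds.get(mid) <= docId) {
                lo = mid;
            } else {
                hi = mid - 1;
            }
        }
        return lo;
    }

    // Offset at which the chunk containing docId starts.
    long offsetFor(int docId) {
        return offsets.get(chunkFor(docId));
    }

    public static void main(String[] args) {
        ChunkIndexSketch sketch = new ChunkIndexSketch();
        for (int i = 0; i < 10; i++) {
            sketch.addDocument(20 * 1024); // ten hypothetical 20KB documents
        }
        sketch.flush(); // flush the partially filled last chunk
        System.out.println("first doc IDs: " + sketch.firstDocIds);
        System.out.println("offsets:       " + sketch.offsets);
        System.out.println("doc 5 is in chunk " + sketch.chunkFor(5));
    }
}
```

Keeping the first doc IDs sorted is what makes the lookup a binary search: the reader never scans chunks sequentially, it jumps straight to the offset of the one chunk that can contain the requested document.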
Known limitations
This StoredFieldsFormat does not support individual documents larger than (2^31 - 2^14) bytes.
- WARNING: This API is experimental and might change in incompatible ways in the next release.
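For reference, the document size limit stated under Known limitations works out to 2,147,467,264 bytes, that is 16KB short of 2GB. The following sketch only spells out that arithmetic; the class StoredDocLimit and its fits method are hypothetical and are not part of the Lucene API.

```java
// Illustration only: the arithmetic behind the (2^31 - 2^14) byte limit.
class StoredDocLimit {
    // 2^31 - 2^14 = 2147483648 - 16384 = 2147467264 bytes.
    static final long MAX_DOC_BYTES = (1L << 31) - (1L << 14);

    // True if a document of the given length is within the stated limit.
    static boolean fits(long docLength) {
        return docLength >= 0 && docLength <= MAX_DOC_BYTES;
    }

    public static void main(String[] args) {
        System.out.println("max document size in bytes: " + MAX_DOC_BYTES);
    }
}
```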
-
-
-
Nested Class Summary
Nested Classes
Modifier and Type | Class | Description
static class | Lucene90StoredFieldsFormat.Mode | Configuration option for stored fields.
-
Field Summary
Fields
Modifier and Type | Field | Description
static CompressionMode | BEST_COMPRESSION_MODE | Compression mode for Lucene90StoredFieldsFormat.Mode.BEST_COMPRESSION
static CompressionMode | BEST_SPEED_MODE | Compression mode for Lucene90StoredFieldsFormat.Mode.BEST_SPEED
static String | MODE_KEY | Attribute key for compression mode.
-
Constructor Summary
Constructors
Constructor | Description
Lucene90StoredFieldsFormat() | Stored fields format with default options
Lucene90StoredFieldsFormat(Lucene90StoredFieldsFormat.Mode mode) | Stored fields format with specified mode
-
Method Summary
Methods
Modifier and Type | Method | Description
StoredFieldsReader | fieldsReader(Directory directory, SegmentInfo si, FieldInfos fn, IOContext context) | Returns a StoredFieldsReader to load stored fields.
StoredFieldsWriter | fieldsWriter(Directory directory, SegmentInfo si, IOContext context) | Returns a StoredFieldsWriter to write stored fields.
-
-
-
Field Detail
-
MODE_KEY
public static final String MODE_KEY
Attribute key for compression mode.
-
BEST_COMPRESSION_MODE
public static final CompressionMode BEST_COMPRESSION_MODE
Compression mode for Lucene90StoredFieldsFormat.Mode.BEST_COMPRESSION
-
BEST_SPEED_MODE
public static final CompressionMode BEST_SPEED_MODE
Compression mode for Lucene90StoredFieldsFormat.Mode.BEST_SPEED
-
-
Constructor Detail
-
Lucene90StoredFieldsFormat
public Lucene90StoredFieldsFormat()
Stored fields format with default options
-
Lucene90StoredFieldsFormat
public Lucene90StoredFieldsFormat(Lucene90StoredFieldsFormat.Mode mode)
Stored fields format with specified mode
-
-
Method Detail
-
fieldsReader
public StoredFieldsReader fieldsReader(Directory directory, SegmentInfo si, FieldInfos fn, IOContext context) throws IOException
Description copied from class: StoredFieldsFormat
Returns a StoredFieldsReader to load stored fields.
- Specified by:
fieldsReader in class StoredFieldsFormat
- Throws:
IOException
-
fieldsWriter
public StoredFieldsWriter fieldsWriter(Directory directory, SegmentInfo si, IOContext context) throws IOException
Description copied from class: StoredFieldsFormat
Returns a StoredFieldsWriter to write stored fields.
- Specified by:
fieldsWriter in class StoredFieldsFormat
- Throws:
IOException
-
-