Lucene41StoredFieldsFormat (Lucene 4.8.0 API)

- java.lang.Object
  - org.apache.lucene.codecs.StoredFieldsFormat
    - org.apache.lucene.codecs.compressing.CompressingStoredFieldsFormat
      - org.apache.lucene.codecs.lucene41.Lucene41StoredFieldsFormat

```
public final class Lucene41StoredFieldsFormat
extends CompressingStoredFieldsFormat
```
Lucene 4.1 stored fields format.

Principle

This StoredFieldsFormat compresses documents in blocks of 16KB, which improves the compression ratio compared to compressing each document separately. It uses the LZ4 compression algorithm, which is fast to compress and very fast to decompress. Although LZ4 favors speed over compression ratio, it should still achieve worthwhile compression ratios on redundant inputs (such as log files, HTML or plain text).
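
The benefit of block-level over document-level compression can be illustrated with a small, self-contained sketch. Note the assumptions: the JDK does not ship LZ4, so this example swaps in `java.util.zip.Deflater` purely as a stand-in, and the class name `BlockCompressionSketch`, its methods, and the sample log lines are hypothetical. It is not how Lucene implements the format; it only shows why compressing many small, similar documents together tends to produce less output than compressing each one alone.

```java
import java.nio.charset.StandardCharsets;
import java.util.zip.Deflater;

public class BlockCompressionSketch {

    /** Returns the number of bytes DEFLATE (stand-in for LZ4) produces for the input. */
    static int compressedSize(byte[] input) {
        Deflater deflater = new Deflater();
        deflater.setInput(input);
        deflater.finish();
        byte[] buffer = new byte[1024];
        int total = 0;
        while (!deflater.finished()) {
            total += deflater.deflate(buffer);
        }
        deflater.end();
        return total;
    }

    /** Document-level compression: every document is compressed on its own. */
    static int perDocumentSize(String[] docs) {
        int total = 0;
        for (String doc : docs) {
            total += compressedSize(doc.getBytes(StandardCharsets.UTF_8));
        }
        return total;
    }

    /** Block-level compression: documents are concatenated, then compressed once. */
    static int blockSize(String[] docs) {
        StringBuilder block = new StringBuilder();
        for (String doc : docs) {
            block.append(doc);
        }
        return compressedSize(block.toString().getBytes(StandardCharsets.UTF_8));
    }

    /** Hypothetical redundant input resembling log lines. */
    static String[] sampleLogLines(int n) {
        String[] docs = new String[n];
        for (int i = 0; i < n; i++) {
            docs[i] = "2014-04-28 12:00:" + (i % 60) + " INFO request id=" + i
                    + " status=200 path=/index.html\n";
        }
        return docs;
    }

    public static void main(String[] args) {
        String[] docs = sampleLogLines(200);
        System.out.println("per-document bytes: " + perDocumentSize(docs));
        System.out.println("block bytes:        " + blockSize(docs));
    }
}
```

On redundant input like this, the block total should come out well below the per-document total, because the compressor can reuse matches across document boundaries.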

File formats

Stored fields are represented by two files:
1. A fields data file (extension .fdt). This file stores a compact representation of documents in compressed blocks of 16KB or more. When writing a segment, documents are appended to an in-memory byte[] buffer. When its size reaches 16KB or more, some metadata about the documents is flushed to disk, immediately followed by a compressed representation of the buffer using the LZ4 compression format.
  
  Here is a more detailed description of the field data file format:
  - FieldData (.fdt) --> <Header>, PackedIntsVersion, <Chunk>^ChunkCount
  - Header --> CodecHeader
  - PackedIntsVersion --> PackedInts.VERSION_CURRENT as a VInt
  - ChunkCount is not known in advance and is the number of chunks necessary to store all documents of the segment
  - Chunk --> DocBase, ChunkDocs, DocFieldCounts, DocLengths, <CompressedDocs>
  - DocBase --> the ID of the first document of the chunk as a VInt
  - ChunkDocs --> the number of documents in the chunk as a VInt
  - DocFieldCounts --> the number of stored fields of every document in the chunk, encoded as follows:
    - if ChunkDocs = 1, the unique value is encoded as a VInt
    - else read a VInt (let's call it bitsRequired)
      - if bitsRequired is 0, then all values are equal, and the common value is the following VInt
      - else bitsRequired is the number of bits required to store any value, and values are stored in a packed array where every value is stored on exactly bitsRequired bits
  - DocLengths --> the lengths of all documents in the chunk, encoded with the same method as DocFieldCounts
  - CompressedDocs --> a compressed representation of <Docs> using the LZ4 compression format
  - Docs --> <Doc>^ChunkDocs
  - Doc --> <FieldNumAndType, Value>^{DocFieldCount}
  - FieldNumAndType --> a VLong, whose 3 lowest-order bits are Type and whose remaining bits are FieldNum
  - Type -->
    - 0: Value is String
    - 1: Value is BinaryValue
    - 2: Value is Int
    - 3: Value is Float
    - 4: Value is Long
    - 5: Value is Double
    - 6, 7: unused
  - FieldNum --> an ID of the field
  - Value --> String | BinaryValue | Int | Float | Long | Double depending on Type
  - BinaryValue --> ValueLength <Byte>^ValueLength
  Notes
  - If documents are larger than 16KB then chunks will likely contain only one document. However, documents can never spread across several chunks (all fields of a single document are in the same chunk).
  - When at least one document in a chunk is large enough so that the chunk is larger than 32KB, the chunk will actually be compressed in several LZ4 blocks of 16KB. This allows StoredFieldVisitors which are only interested in the first fields of a document to decompress only 16KB of data rather than the whole document (10MB for a 10MB document, for example).
  - Given that the original lengths are written in the metadata of the chunk, the decompressor can leverage this information to stop decoding as soon as enough data has been decompressed.
  - In case documents are incompressible, CompressedDocs will be less than 0.5% larger than Docs.
2. A fields index file (extension .fdx).
  - FieldsIndex (.fdx) --> <Header>, <ChunkIndex>
  - Header --> CodecHeader
  - ChunkIndex: See CompressingStoredFieldsIndexWriter
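
The per-chunk metadata described above can be sketched as a small standalone decoder. This is a minimal illustration under stated assumptions, not Lucene's reader: the class `ChunkHeaderSketch` and its method names are hypothetical, the VInt layout (7 data bits per byte, low-order group first, high bit set on every byte except the last) follows Lucene's documented variable-length integer encoding, and the packed array is assumed to store values contiguously with the most significant bit first.

```java
import java.nio.ByteBuffer;
import java.util.Arrays;

public class ChunkHeaderSketch {

    /** Reads a VInt: 7 data bits per byte, low-order group first, high bit = "more bytes follow". */
    static int readVInt(ByteBuffer in) {
        int value = 0;
        for (int shift = 0; shift < 35; shift += 7) {
            byte b = in.get();
            value |= (b & 0x7F) << shift;
            if (b >= 0) {
                return value; // high bit clear: this was the last byte
            }
        }
        throw new IllegalStateException("VInt longer than 5 bytes");
    }

    /** Decodes DocFieldCounts (or DocLengths, which uses the same scheme) for one chunk. */
    static int[] readCounts(ByteBuffer in, int chunkDocs) {
        int[] counts = new int[chunkDocs];
        if (chunkDocs == 1) {
            counts[0] = readVInt(in); // single document: plain VInt
            return counts;
        }
        int bitsRequired = readVInt(in);
        if (bitsRequired == 0) {
            Arrays.fill(counts, readVInt(in)); // all values equal
            return counts;
        }
        // Assumption: values are bit-packed back to back, most significant bit first.
        int current = 0;
        int bitsLeft = 0;
        for (int i = 0; i < chunkDocs; i++) {
            int value = 0;
            for (int bit = 0; bit < bitsRequired; bit++) {
                if (bitsLeft == 0) {
                    current = in.get() & 0xFF;
                    bitsLeft = 8;
                }
                value = (value << 1) | ((current >>> (bitsLeft - 1)) & 1);
                bitsLeft--;
            }
            counts[i] = value;
        }
        return counts;
    }

    /** FieldNum occupies every bit of FieldNumAndType above the 3 type bits. */
    static int fieldNum(long fieldNumAndType) {
        return (int) (fieldNumAndType >>> 3);
    }

    /** Type is held in the 3 lowest-order bits (0 = String ... 5 = Double). */
    static int type(long fieldNumAndType) {
        return (int) (fieldNumAndType & 0x7);
    }
}
```

For instance, a FieldNumAndType value of 21 (binary 10101) splits into FieldNum 2 and Type 5, that is, a Double value for field 2.
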
Known limitations

This StoredFieldsFormat does not support individual documents larger than (2³¹ - 2¹⁴) bytes. In case this is a problem, you should use another format, such as Lucene40StoredFieldsFormat.
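
For reference, the limit works out to 2,147,467,264 bytes, which is 16KB short of 2GB. The check below is a hypothetical helper (the class and method names are not part of Lucene) that an application might use as a guard before indexing very large documents; it assumes the caller can estimate the total size of a document's stored fields.

```java
public class DocumentSizeLimit {

    /** 2^31 - 2^14 bytes, the largest individual document this format supports. */
    static final long MAX_DOCUMENT_BYTES = (1L << 31) - (1L << 14);

    /** Returns true if a document of the given stored size stays within the limit. */
    static boolean fits(long documentBytes) {
        return documentBytes >= 0 && documentBytes <= MAX_DOCUMENT_BYTES;
    }

    public static void main(String[] args) {
        System.out.println(MAX_DOCUMENT_BYTES);
    }
}
```
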
WARNING: This API is experimental and might change in incompatible ways in the next release.

- Constructor Summary
  
  Constructor and Description
  
  Lucene41StoredFieldsFormat()
  Sole constructor.
- Method Summary
  - Methods inherited from class org.apache.lucene.codecs.compressing.CompressingStoredFieldsFormat
    fieldsReader, fieldsWriter, toString
  - Methods inherited from class java.lang.Object
    clone, equals, finalize, getClass, hashCode, notify, notifyAll, wait, wait, wait

- Constructor Detail
  - Lucene41StoredFieldsFormat
```
public Lucene41StoredFieldsFormat()
```
    Sole constructor.
