org.apache.lucene.codecs (Lucene 9.1.0 core API)

package org.apache.lucene.codecs

Codecs API: API for customization of the encoding and structure of the index.

The Codec API allows you to customise the way the following pieces of index information are stored:

Postings lists - see PostingsFormat
DocValues - see DocValuesFormat
Stored fields - see StoredFieldsFormat
Term vectors - see TermVectorsFormat
Points - see PointsFormat
FieldInfos - see FieldInfosFormat
SegmentInfo - see SegmentInfoFormat
Norms - see NormsFormat
Live documents - see LiveDocsFormat

For some concrete implementations beyond Lucene's official index format, see the Codecs module.

Codecs are identified by name through the Java Service Provider Interface. To create your own codec, extend Codec and pass the new codec's name to the super() constructor:

 public class MyCodec extends Codec {

     public MyCodec() {
         super("MyCodecName");
     }

     ...
 }

You will need to register the Codec class so that the ServiceLoader can find it, by including a META-INF/services/org.apache.lucene.codecs.Codec file on your classpath that contains the package-qualified name of your codec.
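As an illustration, the provider-configuration file for the MyCodec class above might look like the fragment below. The package name com.example.codecs is a hypothetical placeholder, not something this package defines; substitute the real package of your codec.

```
# File: META-INF/services/org.apache.lucene.codecs.Codec
# One fully qualified class name per line; lines starting with '#' are comments.
com.example.codecs.MyCodec
```

With this file on the classpath, Codec.forName("MyCodecName") should resolve to an instance of the registered class, provided the class is public and has a public no-argument constructor.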

If you just want to customise the PostingsFormat, or use different postings formats for different fields, register your custom postings format in the same way (in META-INF/services/org.apache.lucene.codecs.PostingsFormat), then extend the default codec and override org.apache.lucene.codecs.luceneMN.LuceneMNCodec#getPostingsFormatForField(String) to return your custom postings format. Here MN stands for the codec version; in this release the default codec is org.apache.lucene.codecs.lucene91.Lucene91Codec.
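A minimal sketch of the per-field override, assuming lucene-core 9.1.0 on the classpath. The class name PerFieldPostingsExample and the field name "id" are hypothetical, and the built-in "Lucene90" postings format stands in for a custom format that you would have registered yourself.

```java
import org.apache.lucene.codecs.Codec;
import org.apache.lucene.codecs.PostingsFormat;
import org.apache.lucene.codecs.lucene91.Lucene91Codec;

public class PerFieldPostingsExample {

    /** Returns the default codec with the postings format chosen per field. */
    static Codec perFieldCodec() {
        // Looked up once by its registered name. "Lucene90" is only a stand-in
        // here; a custom format would be looked up by its own name.
        final PostingsFormat idFormat = PostingsFormat.forName("Lucene90");

        return new Lucene91Codec() {
            @Override
            public PostingsFormat getPostingsFormatForField(String field) {
                // "id" is a hypothetical field name used for illustration.
                if ("id".equals(field)) {
                    return idFormat;
                }
                // All other fields keep the codec's default postings format.
                return super.getPostingsFormatForField(field);
            }
        };
    }
}
```

Because the anonymous subclass does not pass a new name to the constructor, the codec keeps the name "Lucene91". Any per-field format it returns still has to be registered by name so that it can be found again when the index is read.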

Similarly, if you just want to customise the DocValuesFormat per-field, have a look at LuceneMNCodec.getDocValuesFormatForField(String).
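The doc values case can be sketched the same way, again assuming lucene-core 9.1.0. The class name PerFieldDocValuesExample and the field name "price" are hypothetical, and the built-in "Lucene90" doc values format stands in for a custom one. The sketch also shows one way to hand the codec to an IndexWriterConfig.

```java
import org.apache.lucene.codecs.DocValuesFormat;
import org.apache.lucene.codecs.lucene91.Lucene91Codec;
import org.apache.lucene.index.IndexWriterConfig;

public class PerFieldDocValuesExample {

    /** Builds a writer config whose codec picks a doc values format per field. */
    static IndexWriterConfig configWithPerFieldDocValues() {
        // Stand-in for a custom, separately registered DocValuesFormat.
        final DocValuesFormat priceFormat = DocValuesFormat.forName("Lucene90");

        IndexWriterConfig config = new IndexWriterConfig();
        config.setCodec(new Lucene91Codec() {
            @Override
            public DocValuesFormat getDocValuesFormatForField(String field) {
                // "price" is a hypothetical field name used for illustration.
                return "price".equals(field)
                        ? priceFormat
                        : super.getDocValuesFormatForField(field);
            }
        });
        return config;
    }
}
```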

Related Packages

org.apache.lucene.codecs.compressing - Compressing helper classes.
org.apache.lucene.codecs.lucene90 - Lucene 9.0 file format.
org.apache.lucene.codecs.lucene91 - Lucene 9.1 file format.
org.apache.lucene.codecs.perfield - Postings format that can delegate to different formats per-field.

Classes

BlockTermState - Holds all state required for PostingsReaderBase to produce a PostingsEnum without re-seeking the terms dict.
Codec - Encodes/decodes an inverted index segment.
CodecUtil - Utility class for reading and writing versioned headers.
CompetitiveImpactAccumulator - This class accumulates the (freq, norm) pairs that may produce competitive scores.
CompoundDirectory - A read-only Directory that consists of a view over a compound file.
CompoundFormat - Encodes/decodes compound files.
DocValuesConsumer - Abstract API that consumes numeric, binary and sorted docvalues.
DocValuesFormat - Encodes/decodes per-document values.
DocValuesProducer - Abstract API that produces numeric, binary, sorted, sortedset, and sortednumeric docvalues.
FieldInfosFormat - Encodes/decodes FieldInfos.
FieldsConsumer - Abstract API that consumes terms, doc, freq, prox, offset and payloads postings.
FieldsProducer - Abstract API that produces terms, doc, freq, prox, offset and payloads postings.
FilterCodec - A codec that forwards all its method calls to another codec.
KnnVectorsFormat - Encodes/decodes per-document vectors and any associated indexing structures required to support nearest-neighbor search.
KnnVectorsReader - Reads vectors from an index.
KnnVectorsWriter - Writes vectors to an index.
LiveDocsFormat - Format for live/deleted documents.
MultiLevelSkipListReader - This abstract class reads skip lists with multiple levels.
MultiLevelSkipListWriter - This abstract class writes skip lists with multiple levels.
MutablePointTree - One leaf PointValues.PointTree whose order of points can be changed.
NormsConsumer - Abstract API that consumes normalization values.
NormsFormat - Encodes/decodes per-document score normalization values.
NormsProducer - Abstract API that produces field normalization values.
PointsFormat - Encodes/decodes indexed points.
PointsReader - Abstract API to visit point values.
PointsWriter - Abstract API to write points.
PostingsFormat - Encodes/decodes terms, postings, and proximity data.
PostingsReaderBase - The core terms dictionaries (BlockTermsReader, BlockTreeTermsReader) interact with a single instance of this class to manage creation of PostingsEnum instances.
PostingsWriterBase - Class that plugs into term dictionaries, such as Lucene90BlockTreeTermsWriter, and handles writing postings.
PushPostingsWriterBase - Extension of PostingsWriterBase, adding a push API for writing each element of the postings.
SegmentInfoFormat - Expert: Controls the format of the SegmentInfo (segment metadata file).
StoredFieldsFormat - Controls the format of stored fields.
StoredFieldsReader - Codec API for reading stored fields.
StoredFieldsWriter - Codec API for writing stored fields: For every document, StoredFieldsWriter.startDocument() is called, informing the Codec that a new document has started.
TermStats - Holder for per-term statistics.
TermVectorsFormat - Controls the format of term vectors.
TermVectorsReader - Codec API for reading term vectors.
TermVectorsWriter - Codec API for writing term vectors: For every document, TermVectorsWriter.startDocument(int) is called, informing the Codec how many fields will be written.
