org.apache.lucene.codecs.lucene41
Class Lucene41PostingsFormat

java.lang.Object
  extended by org.apache.lucene.codecs.PostingsFormat
      extended by org.apache.lucene.codecs.lucene41.Lucene41PostingsFormat
All Implemented Interfaces:
NamedSPILoader.NamedSPI

public final class Lucene41PostingsFormat
extends PostingsFormat

Lucene 4.1 postings format, which encodes postings in packed integer blocks for fast decode.
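"Packed integer blocks" can be illustrated with the following minimal sketch of fixed-width bit packing. Every value in a block is stored with the same number of bits, and that width is chosen from the largest value. The class and method names (PackedBlockSketch, bitsRequired, pack, unpack) are hypothetical. This is an illustration of the general technique, not Lucene's actual on-disk encoder.

```java
import java.util.Arrays;

/** Hypothetical illustration of fixed-width bit packing; not Lucene's on-disk encoder. */
class PackedBlockSketch {

    /** Bits needed for the largest value in the block (at least 1). */
    static int bitsRequired(int[] values) {
        int orOfAll = 0;
        for (int v : values) {
            if (v < 0) throw new IllegalArgumentException("values must be non-negative");
            orOfAll |= v; // OR-ing all values has the same highest set bit as the maximum
        }
        return Math.max(1, 32 - Integer.numberOfLeadingZeros(orOfAll));
    }

    /** Stores each value in exactly bitsPerValue bits, back to back, low bits first. */
    static long[] pack(int[] values, int bitsPerValue) {
        long[] out = new long[(int) (((long) values.length * bitsPerValue + 63) / 64)];
        for (int i = 0; i < values.length; i++) {
            long bitPos = (long) i * bitsPerValue;
            int word = (int) (bitPos >>> 6);   // which long holds the low bits
            int shift = (int) (bitPos & 63);   // bit offset inside that long
            long v = values[i];
            out[word] |= v << shift;
            if (shift + bitsPerValue > 64) {   // value straddles two longs
                out[word + 1] |= v >>> (64 - shift);
            }
        }
        return out;
    }

    /** Reverses pack(): reads count values of bitsPerValue bits each. */
    static int[] unpack(long[] packed, int count, int bitsPerValue) {
        int[] out = new int[count];
        long mask = (1L << bitsPerValue) - 1;
        for (int i = 0; i < count; i++) {
            long bitPos = (long) i * bitsPerValue;
            int word = (int) (bitPos >>> 6);
            int shift = (int) (bitPos & 63);
            long v = packed[word] >>> shift;
            if (shift + bitsPerValue > 64) {   // pull the high bits from the next long
                v |= packed[word + 1] << (64 - shift);
            }
            out[i] = (int) (v & mask);
        }
        return out;
    }

    public static void main(String[] args) {
        int[] block = {3, 7, 1, 0, 5, 2, 6, 4};   // all values < 8, so 3 bits each
        int bits = bitsRequired(block);
        long[] packed = pack(block, bits);
        // prints "3 bits/value in 1 long(s); round trip: true"
        System.out.println(bits + " bits/value in " + packed.length + " long(s); round trip: "
                + Arrays.equals(block, unpack(packed, block.length, bits)));
    }
}
```

Because every value in a block has the same width, the location of the i-th value can be computed directly as i × bitsPerValue. With variable-length encodings, by contrast, the values must be read one after another.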

NOTE: this format is still experimental and subject to change without backwards compatibility.

Basic idea:

Files and detailed format:

Term Dictionary

The .tim file contains the list of terms in each field along with per-term statistics (such as docfreq) and pointers to the frequencies, positions, payloads and skip data in the .doc, .pos, and .pay files. See BlockTreeTermsWriter for more details on the format.
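The following in-memory sketch gives a rough model of what a .tim entry carries. For each term it keeps the docFreq and the start offsets of that term's data in the .doc, .pos and .pay files, and it finds terms by binary search over the sorted term list. The names TermDictionarySketch, TermMeta, docStartFP, posStartFP and payStartFP are hypothetical. The sorted array is only a stand-in for the real block format described in BlockTreeTermsWriter.

```java
import java.util.Arrays;

/** Hypothetical in-memory model of per-term metadata in a sorted term dictionary. */
class TermDictionarySketch {

    /** Per-term statistics plus pointers into the postings files (names are illustrative). */
    static final class TermMeta {
        final int docFreq;      // number of documents containing the term
        final long docStartFP;  // where this term's doc/freq data starts in .doc
        final long posStartFP;  // where its positions start in .pos
        final long payStartFP;  // where its payloads/offsets start in .pay

        TermMeta(int docFreq, long docStartFP, long posStartFP, long payStartFP) {
            this.docFreq = docFreq;
            this.docStartFP = docStartFP;
            this.posStartFP = posStartFP;
            this.payStartFP = payStartFP;
        }
    }

    // Lucene orders terms by their UTF-8 bytes; plain String order is used here for simplicity.
    private final String[] terms;     // sorted
    private final TermMeta[] metas;   // metas[i] describes terms[i]

    TermDictionarySketch(String[] sortedTerms, TermMeta[] metas) {
        if (sortedTerms.length != metas.length) {
            throw new IllegalArgumentException("one TermMeta per term is required");
        }
        this.terms = sortedTerms.clone();
        this.metas = metas.clone();
    }

    /** Returns the term's metadata, or null if the field has no such term. */
    TermMeta lookup(String term) {
        int i = Arrays.binarySearch(terms, term);
        return i >= 0 ? metas[i] : null;
    }

    public static void main(String[] args) {
        TermDictionarySketch dict = new TermDictionarySketch(
                new String[] {"apache", "lucene", "postings"},
                new TermMeta[] {
                    new TermMeta(3, 0L, 0L, 0L),
                    new TermMeta(200, 12L, 40L, 8L),
                    new TermMeta(1, 530L, 910L, 60L)});
        TermMeta m = dict.lookup("lucene");
        System.out.println("lucene: docFreq=" + m.docFreq + ", .doc offset=" + m.docStartFP);
        System.out.println("absent term found: " + (dict.lookup("solr") != null));
    }
}
```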

NOTE: The term dictionary can plug into different postings implementations: the postings writer/reader are actually responsible for encoding and decoding the Postings Metadata and Term Metadata sections described here:

Notes:

Term Index

The .tip file contains an index into the term dictionary, so that it can be accessed randomly. See BlockTreeTermsWriter for more details on the format.
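A much simpler structure than the real one can still show how an index makes the dictionary randomly accessible. This sketch keeps every Nth term in memory, binary-searches that sparse sample to find the only block that could contain the target, and then scans just that block. The names SparseTermIndexSketch and indexInterval are hypothetical. The real .tip index is built over term prefixes by BlockTreeTermsWriter and is not a fixed-interval sample.

```java
import java.util.Arrays;

/** Hypothetical sparse term index: every indexInterval-th term is kept in memory. */
class SparseTermIndexSketch {
    private final String[] allTerms;     // stands in for the sorted, on-disk term dictionary
    private final String[] indexTerms;   // in-memory sample: allTerms[0], allTerms[k], allTerms[2k], ...
    private final int indexInterval;

    SparseTermIndexSketch(String[] sortedTerms, int indexInterval) {
        if (indexInterval < 1) throw new IllegalArgumentException("indexInterval must be >= 1");
        this.allTerms = sortedTerms.clone();
        this.indexInterval = indexInterval;
        int n = (allTerms.length + indexInterval - 1) / indexInterval;
        this.indexTerms = new String[n];
        for (int i = 0; i < n; i++) {
            indexTerms[i] = allTerms[i * indexInterval];
        }
    }

    /** Returns the term's ordinal in the dictionary, or -1 if it is absent. */
    int seek(String term) {
        int slot = Arrays.binarySearch(indexTerms, term);
        if (slot < 0) {
            // Not an indexed term: it can only be in the block that starts just before
            // the insertion point, i.e. at index (insertionPoint - 1) = (-slot - 2).
            slot = -slot - 2;
            if (slot < 0) return -1; // sorts before the first term
        }
        int start = slot * indexInterval;
        int end = Math.min(start + indexInterval, allTerms.length);
        for (int i = start; i < end; i++) {   // scan a single block
            int cmp = allTerms[i].compareTo(term);
            if (cmp == 0) return i;
            if (cmp > 0) break;               // passed where the term would be
        }
        return -1;
    }

    public static void main(String[] args) {
        String[] terms = {"ant", "bee", "cat", "dog", "eel", "fox", "gnu", "hen"};
        SparseTermIndexSketch idx = new SparseTermIndexSketch(terms, 3); // samples ant, dog, gnu
        System.out.println(idx.seek("eel")); // 4
        System.out.println(idx.seek("cow")); // -1
    }
}
```

Only one block is scanned per lookup, so the in-memory index stays small (one entry per indexInterval terms) while each seek stays cheap.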

Frequencies and Skip Data

The .doc file contains, for each term, the list of documents that contain it, along with the frequency of the term in each document (unless frequencies are omitted, i.e. FieldInfo.IndexOptions.DOCS_ONLY). When a term's document list is longer than the packed block size, the file also stores skip data at the beginning of each packed or VInt block.
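The layout just described can be sketched in a few steps. Doc IDs are first turned into gaps (deltas). Each full run of blockSize gaps becomes a packed block, the leftover tail is written as VInts (7 bits per byte, high bit set when more bytes follow), and skip data is flagged only when the list is longer than one block. The names DocListLayoutSketch, Layout, toGaps, writeVInt and layout are hypothetical. Frequencies are left out, and the block size is a parameter rather than being read from BLOCK_SIZE.

```java
import java.io.ByteArrayOutputStream;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Hypothetical sketch of splitting one term's doc list into packed blocks and a VInt tail. */
class DocListLayoutSketch {

    static final class Layout {
        final List<int[]> packedBlocks = new ArrayList<>(); // each holds exactly blockSize gaps
        byte[] vintTail = new byte[0];                       // remaining gaps, VInt-encoded
        boolean needsSkipData;
    }

    /** Converts strictly increasing doc IDs to gaps; the first gap is measured from 0. */
    static int[] toGaps(int[] sortedDocIds) {
        int[] gaps = new int[sortedDocIds.length];
        int prev = 0;
        for (int i = 0; i < sortedDocIds.length; i++) {
            if (i > 0 && sortedDocIds[i] <= prev) {
                throw new IllegalArgumentException("doc IDs must be strictly increasing");
            }
            gaps[i] = sortedDocIds[i] - prev;
            prev = sortedDocIds[i];
        }
        return gaps;
    }

    /** Variable-length int: low 7 bits per byte, high bit means "more bytes follow". */
    static void writeVInt(ByteArrayOutputStream out, int value) {
        while ((value & ~0x7F) != 0) {
            out.write((value & 0x7F) | 0x80);
            value >>>= 7;
        }
        out.write(value);
    }

    static Layout layout(int[] sortedDocIds, int blockSize) {
        int[] gaps = toGaps(sortedDocIds);
        Layout layout = new Layout();
        int fullBlocks = gaps.length / blockSize;
        for (int b = 0; b < fullBlocks; b++) {
            // Each full block would then be bit-packed at a single width.
            layout.packedBlocks.add(Arrays.copyOfRange(gaps, b * blockSize, (b + 1) * blockSize));
        }
        ByteArrayOutputStream tail = new ByteArrayOutputStream();
        for (int i = fullBlocks * blockSize; i < gaps.length; i++) {
            writeVInt(tail, gaps[i]);
        }
        layout.vintTail = tail.toByteArray();
        // Skip data only pays off when the list is longer than one packed block.
        layout.needsSkipData = sortedDocIds.length > blockSize;
        return layout;
    }

    public static void main(String[] args) {
        int[] docs = {2, 5, 6, 10, 11, 40, 300};   // gaps: 2, 3, 1, 4, 1, 29, 260
        Layout l = layout(docs, 4);
        System.out.println("packed blocks: " + l.packedBlocks.size()); // 1 (gaps 2, 3, 1, 4)
        System.out.println("VInt tail bytes: " + l.vintTail.length);   // 4 (1 + 1 + 2 bytes)
        System.out.println("skip data: " + l.needsSkipData);           // true (7 > 4)
    }
}
```

Writing gaps instead of absolute doc IDs keeps the numbers small, so each packed block needs fewer bits per value and each VInt needs fewer bytes.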

Notes:

Positions

The .pos file contains the lists of positions at which each term occurs within documents. For speed, it sometimes also stores part of the payload and offset data.
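Within one document a term's positions never decrease, so they can be stored compactly as gaps from the previous position. That keeps the values small enough to pack into few bits. The sketch below shows only this delta step for a single document. The class name PositionDeltaSketch is hypothetical, and the code illustrates the idea rather than the exact .pos layout.

```java
import java.util.Arrays;

/** Hypothetical sketch: delta-encoding one document's positions for a term. */
class PositionDeltaSketch {

    /** Positions must be non-decreasing (repeats are allowed); the first delta is from 0. */
    static int[] encode(int[] positions) {
        int[] deltas = new int[positions.length];
        int prev = 0;
        for (int i = 0; i < positions.length; i++) {
            if (positions[i] < prev) {
                throw new IllegalArgumentException("positions must be non-decreasing");
            }
            deltas[i] = positions[i] - prev;
            prev = positions[i];
        }
        return deltas;
    }

    /** Rebuilds absolute positions by running-summing the deltas. */
    static int[] decode(int[] deltas) {
        int[] positions = new int[deltas.length];
        int pos = 0;
        for (int i = 0; i < deltas.length; i++) {
            pos += deltas[i];
            positions[i] = pos;
        }
        return positions;
    }

    public static void main(String[] args) {
        int[] positions = {3, 17, 18, 42};
        int[] deltas = encode(positions);
        System.out.println(Arrays.toString(deltas));                   // [3, 14, 1, 24]
        System.out.println(Arrays.equals(positions, decode(deltas)));  // true
    }
}
```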

Notes:

Payloads and Offsets

The .pay file stores the payloads and offsets associated with certain term-document positions. For performance reasons, some payloads and offsets are stored in the .pos file instead.
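Offsets are character ranges. A sequence of them can be stored compactly as the distance of each start from the previous start, plus the length of the range, instead of two absolute numbers per position. The sketch below uses that representation purely as an assumption for illustration, and the names OffsetDeltaSketch, encode and decode are hypothetical. It is not a statement of the exact bytes written to .pay or .pos.

```java
import java.util.Arrays;

/** Hypothetical sketch: storing offsets as (start delta, length) pairs. */
class OffsetDeltaSketch {

    /** offsets[i] = {startOffset, endOffset}; starts must be non-decreasing and end >= start. */
    static int[][] encode(int[][] offsets) {
        int[][] out = new int[offsets.length][];
        int prevStart = 0;
        for (int i = 0; i < offsets.length; i++) {
            int start = offsets[i][0];
            int end = offsets[i][1];
            if (start < prevStart || end < start) {
                throw new IllegalArgumentException("invalid offsets at index " + i);
            }
            out[i] = new int[] {start - prevStart, end - start}; // {startDelta, length}
            prevStart = start;
        }
        return out;
    }

    /** Rebuilds absolute {start, end} pairs from {startDelta, length} pairs. */
    static int[][] decode(int[][] encoded) {
        int[][] out = new int[encoded.length][];
        int start = 0;
        for (int i = 0; i < encoded.length; i++) {
            start += encoded[i][0];
            out[i] = new int[] {start, start + encoded[i][1]};
        }
        return out;
    }

    public static void main(String[] args) {
        int[][] offsets = {{10, 16}, {42, 48}, {97, 103}};
        int[][] enc = encode(offsets);
        System.out.println(Arrays.deepToString(enc));                // [[10, 6], [32, 6], [55, 6]]
        System.out.println(Arrays.deepEquals(offsets, decode(enc))); // true
    }
}
```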

Notes:

WARNING: This API is experimental and might change in incompatible ways in the next release.

Field Summary
static int BLOCK_SIZE
          Fixed packed block size, number of integers encoded in a single packed block.
static String DOC_EXTENSION
          Filename extension for document numbers, frequencies, and skip data.
static String PAY_EXTENSION
          Filename extension for payloads and offsets.
static String POS_EXTENSION
          Filename extension for positions.
 
Fields inherited from class org.apache.lucene.codecs.PostingsFormat
EMPTY
 
Constructor Summary
Lucene41PostingsFormat()
          Creates Lucene41PostingsFormat with default settings.
Lucene41PostingsFormat(int minTermBlockSize, int maxTermBlockSize)
          Creates Lucene41PostingsFormat with custom values for minTermBlockSize and maxTermBlockSize, which are passed to the block tree terms dictionary.
 
Method Summary
 FieldsConsumer fieldsConsumer(SegmentWriteState state)
          Writes a new segment.
 FieldsProducer fieldsProducer(SegmentReadState state)
          Reads a segment.
 String toString()
           
 
Methods inherited from class org.apache.lucene.codecs.PostingsFormat
availablePostingsFormats, forName, getName, reloadPostingsFormats
 
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, wait, wait, wait
 

Field Detail

DOC_EXTENSION

public static final String DOC_EXTENSION
Filename extension for document numbers, frequencies, and skip data. See chapter: Frequencies and Skip Data

See Also:
Constant Field Values

POS_EXTENSION

public static final String POS_EXTENSION
Filename extension for positions. See chapter: Positions

See Also:
Constant Field Values

PAY_EXTENSION

public static final String PAY_EXTENSION
Filename extension for payloads and offsets. See chapter: Payloads and Offsets

See Also:
Constant Field Values

BLOCK_SIZE

public static final int BLOCK_SIZE
Fixed packed block size, number of integers encoded in a single packed block.

See Also:
Constant Field Values
Constructor Detail

Lucene41PostingsFormat

public Lucene41PostingsFormat()
Creates Lucene41PostingsFormat with default settings.


Lucene41PostingsFormat

public Lucene41PostingsFormat(int minTermBlockSize,
                              int maxTermBlockSize)
Creates Lucene41PostingsFormat with custom values for minTermBlockSize and maxTermBlockSize, which are passed to the block tree terms dictionary.

See Also:
BlockTreeTermsWriter.BlockTreeTermsWriter(SegmentWriteState,PostingsWriterBase,int,int)
Method Detail

toString

public String toString()
Overrides:
toString in class PostingsFormat

fieldsConsumer

public FieldsConsumer fieldsConsumer(SegmentWriteState state)
                              throws IOException
Description copied from class: PostingsFormat
Writes a new segment.

Specified by:
fieldsConsumer in class PostingsFormat
Throws:
IOException

fieldsProducer

public FieldsProducer fieldsProducer(SegmentReadState state)
                              throws IOException
Description copied from class: PostingsFormat
Reads a segment. NOTE: by the time this call returns, it must hold open any files it will need to use; else, those files may be deleted. Additionally, required files may be deleted during the execution of this call before there is a chance to open them. Under these circumstances an IOException should be thrown by the implementation. IOExceptions are expected and will automatically cause a retry of the segment opening logic with the newly revised segments.

Specified by:
fieldsProducer in class PostingsFormat
Throws:
IOException
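The fieldsProducer contract can be illustrated without Lucene. An implementation opens every file it needs before returning, reports a file that vanished with an IOException, and lets the caller retry against a fresh file list. In this sketch a Set of file names stands in for the index directory, and a Supplier stands in for re-reading the latest list of segment files. Every name here (EagerSegmentReaderSketch, openWithRetry, maxAttempts, and the example file names) is hypothetical and is not Lucene API.

```java
import java.io.FileNotFoundException;
import java.io.IOException;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.Iterator;
import java.util.List;
import java.util.Set;
import java.util.function.Supplier;

/** Hypothetical sketch of "open everything up front, throw IOException, caller retries". */
class EagerSegmentReaderSketch {
    final List<String> openFiles;

    /** "Opens" (here: checks) every required file before returning; fails fast if one is gone. */
    EagerSegmentReaderSketch(Set<String> directory, List<String> requiredFiles) throws IOException {
        for (String f : requiredFiles) {
            if (!directory.contains(f)) {
                throw new FileNotFoundException(f + " was deleted before it could be opened");
            }
        }
        this.openFiles = new ArrayList<>(requiredFiles); // from here on the reader holds these files
    }

    /** Caller side: on IOException, fetch the latest file list and try again. */
    static EagerSegmentReaderSketch openWithRetry(Set<String> directory,
            Supplier<List<String>> currentFiles, int maxAttempts) throws IOException {
        if (maxAttempts < 1) throw new IllegalArgumentException("maxAttempts must be >= 1");
        IOException last = null;
        for (int attempt = 0; attempt < maxAttempts; attempt++) {
            try {
                return new EagerSegmentReaderSketch(directory, currentFiles.get());
            } catch (IOException e) {
                last = e; // a file vanished mid-open; retry with a newer list
            }
        }
        throw last;
    }

    public static void main(String[] args) throws IOException {
        Set<String> dir = new HashSet<>(Arrays.asList("_1.doc", "_1.pos", "_1.tim", "_1.tip"));
        // The first list is stale: it names files of a segment that has since been deleted.
        Iterator<List<String>> lists = Arrays.asList(
                Arrays.asList("_0.doc", "_0.tim"),
                Arrays.asList("_1.doc", "_1.pos", "_1.tim", "_1.tip")).iterator();
        EagerSegmentReaderSketch r = openWithRetry(dir, lists::next, 3);
        System.out.println("opened " + r.openFiles); // opened [_1.doc, _1.pos, _1.tim, _1.tip]
    }
}
```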


Copyright © 2000-2013 Apache Software Foundation. All Rights Reserved.