org.apache.lucene.codecs.lucene42
Class Lucene42TermVectorsFormat

java.lang.Object
  extended by org.apache.lucene.codecs.TermVectorsFormat
      extended by org.apache.lucene.codecs.compressing.CompressingTermVectorsFormat
          extended by org.apache.lucene.codecs.lucene42.Lucene42TermVectorsFormat

public final class Lucene42TermVectorsFormat
extends CompressingTermVectorsFormat

Lucene 4.2 term vectors format.

Like Lucene41StoredFieldsFormat, this format is based on compressed chunks of data with document-level granularity, so a document never spans multiple chunks. Data is also made as compact as possible: terms and payloads are compressed with LZ4, and positions are written as blocks of packed ints.

Term vectors are stored using two files: a vector data file (extension .tvd) and an index file (extension .tvx).

Looking up term vectors for any document requires at most 1 disk seek.
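
The single-seek bound follows from chunking: if an in-memory index records where each chunk starts, a lookup only needs one seek to that chunk. The sketch below illustrates the idea with a binary search over chunk start document IDs. It is not Lucene's actual .tvx encoding; the class ChunkIndexSketch, its fields, and the sample offsets are hypothetical names and values chosen for illustration.

```java
import java.util.Arrays;

// Hypothetical sketch, not Lucene's real index file layout: maps a document ID
// to the data-file offset of the chunk that contains it.
final class ChunkIndexSketch {
    private final int[] chunkStartDocs; // first doc ID of each chunk, ascending
    private final long[] chunkOffsets;  // data-file offset of each chunk

    ChunkIndexSketch(int[] chunkStartDocs, long[] chunkOffsets) {
        if (chunkStartDocs.length != chunkOffsets.length) {
            throw new IllegalArgumentException("arrays must have the same length");
        }
        this.chunkStartDocs = chunkStartDocs.clone();
        this.chunkOffsets = chunkOffsets.clone();
    }

    // Returns the offset to seek to. No upper bound check is done on docId,
    // so any ID past the last chunk start resolves to the last chunk.
    long chunkOffsetFor(int docId) {
        if (chunkStartDocs.length == 0 || docId < chunkStartDocs[0]) {
            throw new IllegalArgumentException("docId not covered by the index: " + docId);
        }
        int pos = Arrays.binarySearch(chunkStartDocs, docId);
        // On a miss, binarySearch returns -(insertionPoint) - 1; the owning
        // chunk is the one just before the insertion point.
        int chunk = pos >= 0 ? pos : -pos - 2;
        return chunkOffsets[chunk];
    }

    public static void main(String[] args) {
        ChunkIndexSketch index =
            new ChunkIndexSketch(new int[] {0, 7, 19}, new long[] {0L, 4200L, 8650L});
        System.out.println(index.chunkOffsetFor(12)); // prints 4200
    }
}
```

Because a document never spans chunks, the offset returned for a document ID is the only position that has to be read from.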

File formats

  1. A vector data file (extension .tvd). This file stores terms, frequencies, positions, offsets and payloads for every document. When writing a new segment, the writer accumulates data in memory until the buffer used to store terms and payloads grows beyond 4KB. It then flushes all metadata, terms and positions to disk, using LZ4 compression for terms and payloads and blocks of packed ints for positions.

    Here is a more detailed description of the vector data file format:

  2. An index file (extension .tvx).
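
The buffer-then-flush behavior described for the vector data file can be sketched as follows. This is a simplified model and not the CompressingTermVectorsFormat writer: the class ChunkedWriterSketch and its methods are hypothetical, each document is reduced to an opaque byte array, and java.util.zip.Deflater stands in for LZ4 because the JDK does not ship an LZ4 implementation.

```java
import java.io.ByteArrayOutputStream;
import java.util.ArrayList;
import java.util.List;
import java.util.zip.Deflater;

// Hypothetical sketch of chunked writing with document-level granularity.
final class ChunkedWriterSketch {
    static final int CHUNK_SIZE = 4 * 1024; // the 4KB threshold from the text

    private final ByteArrayOutputStream buffer = new ByteArrayOutputStream();
    private final List<byte[]> flushedChunks = new ArrayList<>();
    private final List<Integer> docsPerChunk = new ArrayList<>();
    private int pendingDocs = 0;

    // Buffers one whole document, then flushes if the buffer exceeds 4KB.
    // The check runs only after a complete document is buffered, so a
    // document can never be split across two chunks.
    void addDocument(byte[] termBytes) {
        buffer.write(termBytes, 0, termBytes.length);
        pendingDocs++;
        if (buffer.size() > CHUNK_SIZE) {
            flush();
        }
    }

    // Flushes whatever is still buffered at the end of the segment.
    void finish() {
        if (pendingDocs > 0) {
            flush();
        }
    }

    private void flush() {
        // Deflater is a stand-in for LZ4 here (assumption for illustration).
        Deflater deflater = new Deflater(Deflater.BEST_SPEED);
        deflater.setInput(buffer.toByteArray());
        deflater.finish();
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        byte[] tmp = new byte[1024];
        while (!deflater.finished()) {
            int n = deflater.deflate(tmp);
            out.write(tmp, 0, n);
        }
        deflater.end();
        flushedChunks.add(out.toByteArray());
        docsPerChunk.add(pendingDocs);
        buffer.reset();
        pendingDocs = 0;
    }

    int chunkCount() {
        return flushedChunks.size();
    }

    List<Integer> docsPerChunk() {
        return docsPerChunk;
    }
}
```

With seven documents of 1000 bytes each, the fifth document pushes the buffer past 4KB and triggers the first flush, and the remaining two are written when finish() is called.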

WARNING: This API is experimental and might change in incompatible ways in the next release.

Constructor Summary
Lucene42TermVectorsFormat()
          Sole constructor.
 
Method Summary
 
Methods inherited from class org.apache.lucene.codecs.compressing.CompressingTermVectorsFormat
toString, vectorsReader, vectorsWriter
 
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, wait, wait, wait
 

Constructor Detail

Lucene42TermVectorsFormat

public Lucene42TermVectorsFormat()
Sole constructor.



Copyright © 2000-2013 Apache Software Foundation. All Rights Reserved.