org.apache.lucene.codecs
Class TermVectorsWriter

java.lang.Object
  extended by org.apache.lucene.codecs.TermVectorsWriter
All Implemented Interfaces:
Closeable
Direct Known Subclasses:
CompressingTermVectorsWriter, Lucene40TermVectorsWriter

public abstract class TermVectorsWriter
extends Object
implements Closeable

Codec API for writing term vectors:

  1. For every document, startDocument(int) is called, informing the Codec how many fields will be written.
  2. startField(FieldInfo, int, boolean, boolean, boolean) is called for each field in the document, informing the codec how many terms will be written for that field, and whether or not positions, offsets, or payloads are enabled.
  3. Within each field, startTerm(BytesRef, int) is called for each term.
  4. If offsets and/or positions are enabled, then addPosition(int, int, int, BytesRef) will be called for each term occurrence.
  5. After all documents have been written, finish(FieldInfos, int) is called for verification/sanity-checks.
  6. Finally, the writer is closed (close()).
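
The call order in the list above can be sketched without Lucene on the classpath. This is a minimal illustration under stated assumptions: SketchWriter is a hypothetical stand-in that mirrors only the shape of TermVectorsWriter (byte[] replaces BytesRef, a String field name replaces FieldInfo, finish takes only the document count), and it simply records the calls it receives.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in mirroring the shape of TermVectorsWriter; not the Lucene class.
class SketchWriter implements java.io.Closeable {
    final List<String> calls = new ArrayList<>();

    void startDocument(int numVectorFields) {
        calls.add("startDocument(" + numVectorFields + ")");
    }

    void startField(String field, int numTerms, boolean positions, boolean offsets, boolean payloads) {
        calls.add("startField(" + field + "," + numTerms + ")");
    }

    void startTerm(byte[] term, int freq) {
        calls.add("startTerm(" + new String(term, StandardCharsets.UTF_8) + "," + freq + ")");
    }

    void addPosition(int position, int startOffset, int endOffset, byte[] payload) {
        calls.add("addPosition(" + position + ")");
    }

    void finish(int numDocs) {
        calls.add("finish(" + numDocs + ")");
    }

    @Override
    public void close() {
        calls.add("close()");
    }
}

class CallOrderDemo {
    // Writes one document with one field ("body") holding one term ("lucene") that occurs twice.
    static List<String> writeOneDocument() {
        SketchWriter w = new SketchWriter();
        w.startDocument(1);                              // step 1: one vector field follows
        w.startField("body", 1, true, false, false);     // step 2: one term, positions only
        w.startTerm("lucene".getBytes(StandardCharsets.UTF_8), 2); // step 3: freq = 2
        // step 4: one call per occurrence; -1 is this sketch's placeholder for disabled offsets
        w.addPosition(0, -1, -1, null);
        w.addPosition(5, -1, -1, null);
        w.finish(1);                                     // step 5: one document was written
        w.close();                                       // step 6
        return w.calls;
    }

    public static void main(String[] args) {
        writeOneDocument().forEach(System.out::println);
    }
}
```

Recording the calls rather than writing bytes keeps the sketch focused on ordering; a real subclass would encode each call into its own file format.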

WARNING: This API is experimental and might change in incompatible ways in the next release.

Constructor Summary
protected TermVectorsWriter()
          Sole constructor.
 
Method Summary
abstract  void abort()
          Aborts writing entirely; the implementation should remove any partially written files, etc.
protected  void addAllDocVectors(Fields vectors, MergeState mergeState)
          Safe (but slowish) default method to write every vector field in the document.
abstract  void addPosition(int position, int startOffset, int endOffset, BytesRef payload)
          Adds a term position and offsets.
 void addProx(int numProx, DataInput positions, DataInput offsets)
          Called by IndexWriter when writing new segments.
abstract  void close()
           
abstract  void finish(FieldInfos fis, int numDocs)
          Called before close(), passing in the number of documents that were written.
 void finishDocument()
          Called after a doc and all its fields have been added.
 void finishField()
          Called after a field and all its terms have been added.
 void finishTerm()
          Called after a term and all its positions have been added.
abstract  Comparator<BytesRef> getComparator()
          Return the BytesRef Comparator used to sort terms before feeding to this API.
 int merge(MergeState mergeState)
          Merges in the term vectors from the readers in mergeState.
abstract  void startDocument(int numVectorFields)
          Called before writing the term vectors of the document.
abstract  void startField(FieldInfo info, int numTerms, boolean positions, boolean offsets, boolean payloads)
          Called before writing the terms of the field.
abstract  void startTerm(BytesRef term, int freq)
          Adds a term and its term frequency freq.
 
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
 

Constructor Detail

TermVectorsWriter

protected TermVectorsWriter()
Sole constructor. (For invocation by subclass constructors, typically implicit.)

Method Detail

startDocument

public abstract void startDocument(int numVectorFields)
                            throws IOException
Called before writing the term vectors of the document. startField(FieldInfo, int, boolean, boolean, boolean) will be called numVectorFields times. Note that if term vectors are enabled, this is called even if the document has no vector fields; in that case numVectorFields will be zero.

Throws:
IOException

finishDocument

public void finishDocument()
                    throws IOException
Called after a doc and all its fields have been added.

Throws:
IOException

startField

public abstract void startField(FieldInfo info,
                                int numTerms,
                                boolean positions,
                                boolean offsets,
                                boolean payloads)
                         throws IOException
Called before writing the terms of the field. startTerm(BytesRef, int) will be called numTerms times.

Throws:
IOException

finishField

public void finishField()
                 throws IOException
Called after a field and all its terms have been added.

Throws:
IOException

startTerm

public abstract void startTerm(BytesRef term,
                               int freq)
                        throws IOException
Adds a term and its term frequency freq. If this field has positions and/or offsets enabled, then addPosition(int, int, int, BytesRef) will be called freq times, once per occurrence of the term.

Throws:
IOException

finishTerm

public void finishTerm()
                throws IOException
Called after a term and all its positions have been added.

Throws:
IOException

addPosition

public abstract void addPosition(int position,
                                 int startOffset,
                                 int endOffset,
                                 BytesRef payload)
                          throws IOException
Adds a term position and offsets.

Throws:
IOException

abort

public abstract void abort()
Aborts writing entirely; the implementation should remove any partially written files, etc.
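
As a rough illustration of the cleanup described here, the sketch below closes an output stream and deletes the file it was writing. AbortSketch, its single-file layout, and the temporary file name are all assumptions made for the example; a real codec writes through Lucene's Directory abstraction and may own several files.

```java
import java.io.IOException;
import java.io.OutputStream;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;

// Hypothetical writer that owns a single file; not the Lucene class.
class AbortSketch {
    private final Path file;
    private final OutputStream out;

    AbortSketch(Path file) throws IOException {
        this.file = file;
        this.out = Files.newOutputStream(file);
    }

    void write(byte[] data) throws IOException {
        out.write(data);
    }

    // Like abort(), this declares no checked exception, so cleanup is best-effort.
    void abort() {
        try {
            out.close();
        } catch (IOException ignored) {
            // nothing useful can be done while aborting
        }
        try {
            Files.deleteIfExists(file);
        } catch (IOException ignored) {
            // leave the file behind rather than mask the original failure
        }
    }

    // Returns true if the partially written file is still present after abort().
    static boolean partialFileSurvivesAbort() {
        try {
            Path tmp = Files.createTempFile("vectors", ".tmp");
            AbortSketch w = new AbortSketch(tmp);
            w.write(new byte[] {1, 2, 3});
            w.abort();
            return Files.exists(tmp);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

Swallowing the exceptions is a design choice of this sketch: abort is typically reached because something has already failed, and that earlier failure is the one worth reporting.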


finish

public abstract void finish(FieldInfos fis,
                            int numDocs)
                     throws IOException
Called before close(), passing in the number of documents that were written. Note that this is intentionally redundant (equivalent to the number of calls to startDocument(int)), but a Codec should check that this is the case to detect the JRE bug described in LUCENE-1282.
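
The consistency check can be sketched as a counter that is compared against numDocs. CountingSketch is a hypothetical stand-in, the FieldInfos argument is omitted, and the exception type is this example's choice rather than anything the API prescribes.

```java
// Hypothetical stand-in showing only the document-count check; not the Lucene class.
class CountingSketch {
    private int docsStarted;

    void startDocument(int numVectorFields) {
        docsStarted++;
    }

    // Mirrors finish(FieldInfos, int) without the FieldInfos argument.
    void finish(int numDocs) {
        if (docsStarted != numDocs) {
            throw new IllegalStateException(
                "startDocument was called " + docsStarted + " times but numDocs is " + numDocs);
        }
    }

    // Returns true if finish() rejects the reported count.
    static boolean rejects(int docsWritten, int numDocsReported) {
        CountingSketch w = new CountingSketch();
        for (int i = 0; i < docsWritten; i++) {
            w.startDocument(0);
        }
        try {
            w.finish(numDocsReported);
            return false;
        } catch (IllegalStateException e) {
            return true;
        }
    }
}
```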

Throws:
IOException

addProx

public void addProx(int numProx,
                    DataInput positions,
                    DataInput offsets)
             throws IOException
Called by IndexWriter when writing new segments.

This is an expert API that allows the codec to consume positions and offsets directly from the indexer.

The default implementation calls addPosition(int, int, int, BytesRef), but subclasses can override this if they want to efficiently write all the positions, then all the offsets, for example.
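
The difference between the two layouts can be sketched with plain arrays. Everything here is an assumption for illustration: ProxSketch is hypothetical, int arrays stand in for the DataInput streams (whose actual encoding is not described on this page), and the output list stands in for the bytes a codec would write.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in; int arrays replace the DataInput streams of the real method.
class ProxSketch {
    final List<Integer> out = new ArrayList<>();

    void addPosition(int position, int startOffset, int endOffset) {
        out.add(position);
        out.add(startOffset);
        out.add(endOffset);
    }

    // Default-style path: one addPosition call per occurrence, giving an interleaved layout.
    void addProxDefault(int numProx, int[] positions, int[] startOffsets, int[] endOffsets) {
        for (int i = 0; i < numProx; i++) {
            addPosition(positions[i], startOffsets[i], endOffsets[i]);
        }
    }

    // Override-style path: all positions first, then all offsets, giving a grouped layout.
    void addProxGrouped(int numProx, int[] positions, int[] startOffsets, int[] endOffsets) {
        for (int i = 0; i < numProx; i++) {
            out.add(positions[i]);
        }
        for (int i = 0; i < numProx; i++) {
            out.add(startOffsets[i]);
            out.add(endOffsets[i]);
        }
    }
}
```

Both paths emit the same values; only the order differs, which is what lets an overriding codec pick the layout that suits its format.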

NOTE: This API is extremely expert and subject to change or removal!

Throws:
IOException
NOTE: This API is for internal purposes only and might change in incompatible ways in the next release.

merge

public int merge(MergeState mergeState)
          throws IOException
Merges in the term vectors from the readers in mergeState. The default implementation skips over deleted documents, and uses startDocument(int), startField(FieldInfo, int, boolean, boolean, boolean), startTerm(BytesRef, int), addPosition(int, int, int, BytesRef), and finish(FieldInfos, int), returning the number of documents that were written. Implementations can override this method for more sophisticated merging (bulk-byte copying, etc.).
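
The skip-deleted-and-count behaviour can be sketched as follows. MergeSketch and SegmentSketch are hypothetical: each segment is reduced to an array of document labels plus a parallel array of live flags, and appending a label stands in for the startDocument/startField/startTerm/addPosition calls.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for the default merge loop; not the Lucene class.
class MergeSketch {
    // One entry per document, plus a parallel flag array (false = deleted).
    static final class SegmentSketch {
        final String[] docs;
        final boolean[] live;

        SegmentSketch(String[] docs, boolean[] live) {
            this.docs = docs;
            this.live = live;
        }
    }

    final List<String> written = new ArrayList<>();

    // Copies every live document from every segment and returns how many were written.
    int merge(List<SegmentSketch> segments) {
        int docCount = 0;
        for (SegmentSketch segment : segments) {
            for (int i = 0; i < segment.docs.length; i++) {
                if (!segment.live[i]) {
                    continue; // deleted documents are skipped
                }
                written.add(segment.docs[i]); // stands in for the per-document write calls
                docCount++;
            }
        }
        // A real writer would call finish(fieldInfos, docCount) at this point.
        return docCount;
    }
}
```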

Throws:
IOException

addAllDocVectors

protected final void addAllDocVectors(Fields vectors,
                                      MergeState mergeState)
                               throws IOException
Safe (but slowish) default method to write every vector field in the document.

Throws:
IOException

getComparator

public abstract Comparator<BytesRef> getComparator()
                                            throws IOException
Return the BytesRef Comparator used to sort terms before feeding to this API.
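
To illustrate what sorting terms before feeding them to the writer involves, the sketch below orders UTF-8 encoded terms with an unsigned byte-wise comparator. The choice of unsigned byte order is an assumption of this example, byte[] stands in for BytesRef, and TermOrderSketch is a hypothetical name; the order a given codec actually returns is up to that codec.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;
import java.util.Comparator;

// Hypothetical helper; byte[] stands in for BytesRef.
class TermOrderSketch {
    // Compares byte arrays treating each byte as unsigned; a shorter prefix sorts first.
    static final Comparator<byte[]> UNSIGNED_BYTES = (a, b) -> {
        int n = Math.min(a.length, b.length);
        for (int i = 0; i < n; i++) {
            int diff = (a[i] & 0xFF) - (b[i] & 0xFF);
            if (diff != 0) {
                return diff;
            }
        }
        return a.length - b.length;
    };

    // Encodes the terms as UTF-8, sorts them with UNSIGNED_BYTES, and decodes them again.
    static String[] sortTerms(String... terms) {
        byte[][] encoded = new byte[terms.length][];
        for (int i = 0; i < terms.length; i++) {
            encoded[i] = terms[i].getBytes(StandardCharsets.UTF_8);
        }
        Arrays.sort(encoded, UNSIGNED_BYTES);
        String[] sorted = new String[terms.length];
        for (int i = 0; i < terms.length; i++) {
            sorted[i] = new String(encoded[i], StandardCharsets.UTF_8);
        }
        return sorted;
    }
}
```

Masking with 0xFF matters because Java bytes are signed: without it, multi-byte UTF-8 sequences (whose bytes are all 0x80 or above) would sort before ASCII terms.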

Throws:
IOException

close

public abstract void close()
                    throws IOException
Specified by:
close in interface Closeable
Throws:
IOException


Copyright © 2000-2014 Apache Software Foundation. All Rights Reserved.