Package org.apache.lucene.index

Code to maintain and access indices.


Interface Summary
IndexableField Represents a single field for indexing.
IndexableFieldType Describes the properties of a field.
IndexReader.ReaderClosedListener A custom listener that's invoked when the IndexReader is closed.
SegmentReader.CoreClosedListener Called when the shared core for this SegmentReader is closed.
TwoPhaseCommit An interface for implementations that support 2-phase commit.
 

Class Summary
AtomicReader AtomicReader is an abstract class, providing an interface for accessing an index.
AtomicReaderContext IndexReaderContext for AtomicReader instances.
BaseCompositeReader<R extends IndexReader> Base class for implementing CompositeReaders based on an array of sub-readers.
BinaryDocValues A per-document byte[]
CheckIndex Basic tool and API to check the health of an index and write a new segments file that removes reference to problematic segments.
CheckIndex.Status Returned from CheckIndex.checkIndex() detailing the health and status of the index.
CheckIndex.Status.DocValuesStatus Status from testing DocValues
CheckIndex.Status.FieldNormStatus Status from testing field norms.
CheckIndex.Status.SegmentInfoStatus Holds the status of each segment in the index.
CheckIndex.Status.StoredFieldStatus Status from testing stored fields.
CheckIndex.Status.TermIndexStatus Status from testing term index.
CheckIndex.Status.TermVectorStatus Status from testing term vectors.
CompositeReader Instances of this reader type can only be used to get stored fields from the underlying AtomicReaders, but it is not possible to directly retrieve postings.
CompositeReaderContext IndexReaderContext for CompositeReader instance.
ConcurrentMergeScheduler A MergeScheduler that runs each merge using a separate thread.
DirectoryReader DirectoryReader is an implementation of CompositeReader that can read indexes in a Directory.
DocsAndPositionsEnum A DocsEnum that also iterates through positions.
DocsEnum Iterates through the documents and term freqs.
DocTermOrds This class enables fast access to multiple term ords for a specified field across all docIDs.
FieldInfo Access to the Field Info file that describes document fields and whether or not they are indexed.
FieldInfos Collection of FieldInfos (accessible by number or by name).
FieldInvertState This class tracks the number and position / offset parameters of terms being added to the index.
Fields Flex API for access to fields and terms
FilterAtomicReader A FilterAtomicReader contains another AtomicReader, which it uses as its basic source of data, possibly transforming the data along the way or providing additional functionality.
FilterAtomicReader.FilterDocsAndPositionsEnum Base class for filtering DocsAndPositionsEnum implementations.
FilterAtomicReader.FilterDocsEnum Base class for filtering DocsEnum implementations.
FilterAtomicReader.FilterFields Base class for filtering Fields implementations.
FilterAtomicReader.FilterTerms Base class for filtering Terms implementations.
FilterAtomicReader.FilterTermsEnum Base class for filtering TermsEnum implementations.
FilterDirectoryReader A FilterDirectoryReader wraps another DirectoryReader, allowing implementations to transform or extend it.
FilterDirectoryReader.StandardReaderWrapper A no-op SubReaderWrapper that simply returns the parent DirectoryReader's original subreaders.
FilterDirectoryReader.SubReaderWrapper Factory class passed to FilterDirectoryReader constructor that allows subclasses to wrap the filtered DirectoryReader's subreaders.
FilteredTermsEnum Abstract class for enumerating a subset of all terms.
IndexCommit Expert: represents a single commit into an index as seen by the IndexDeletionPolicy or IndexReader.
IndexDeletionPolicy Expert: policy for deletion of stale index commits.
IndexFileNames This class contains useful constants representing filenames and extensions used by Lucene, as well as convenience methods for querying whether a file name matches an extension (matchesExtension) and for generating file names from a segment name, generation and extension (fileNameFromGeneration, segmentFileName).
IndexReader IndexReader is an abstract class, providing an interface for accessing an index.
IndexReaderContext A struct like class that represents a hierarchical relationship between IndexReader instances.
IndexUpgrader This is an easy-to-use tool that upgrades all segments of an index from previous Lucene versions to the current segment file format.
IndexWriter An IndexWriter creates and maintains an index.
IndexWriter.IndexReaderWarmer If DirectoryReader.open(IndexWriter,boolean) has been called (i.e., this writer is in near-real-time mode), then after a merge completes, this class can be invoked to warm the reader on the newly merged segment, before the merge commits.
IndexWriterConfig Holds all the configuration that is used to create an IndexWriter.
KeepOnlyLastCommitDeletionPolicy An IndexDeletionPolicy implementation that keeps only the most recent commit and immediately removes all prior commits after a new commit is done.
LiveIndexWriterConfig Holds all the configuration used by IndexWriter, with a few setters for settings that can be changed on an IndexWriter instance "live".
LogByteSizeMergePolicy This is a LogMergePolicy that measures size of a segment as the total byte size of the segment's files.
LogDocMergePolicy This is a LogMergePolicy that measures size of a segment as the number of documents (not taking deletions into account).
LogMergePolicy This class implements a MergePolicy that tries to merge segments into levels of exponentially increasing size, where each level has fewer segments than the value of the merge factor.
MergePolicy Expert: a MergePolicy determines the sequence of primitive merge operations.
MergePolicy.DocMap A map of doc IDs.
MergePolicy.MergeSpecification A MergeSpecification instance provides the information necessary to perform multiple merges.
MergePolicy.OneMerge OneMerge provides the information necessary to perform an individual primitive merge operation, resulting in a single new segment.
MergeScheduler Expert: IndexWriter uses an instance implementing this interface to execute the merges selected by a MergePolicy.
MergeState Holds common state used during segment merging.
MergeState.CheckAbort Class for recording units of work when merging segments.
MergeState.DocMap Remaps docids around deletes during merge
MultiDocsAndPositionsEnum Exposes flex API, merged from flex API of sub-segments.
MultiDocsAndPositionsEnum.EnumWithSlice Holds a DocsAndPositionsEnum along with the corresponding ReaderSlice.
MultiDocsEnum Exposes DocsEnum, merged from DocsEnum API of sub-segments.
MultiDocsEnum.EnumWithSlice Holds a DocsEnum along with the corresponding ReaderSlice.
MultiDocValues A wrapper for CompositeReader providing access to DocValues.
MultiDocValues.MultiSortedDocValues Implements SortedDocValues over n subs, using an OrdinalMap
MultiDocValues.MultiSortedSetDocValues Implements SortedSetDocValues over n subs, using an OrdinalMap
MultiDocValues.OrdinalMap Maps per-segment ordinals to/from global ordinal space.
MultiFields Exposes flex API, merged from flex API of sub-segments.
MultiReader A CompositeReader which reads multiple indexes, appending their content.
MultiTerms Exposes flex API, merged from flex API of sub-segments.
MultiTermsEnum Exposes TermsEnum API, merged from TermsEnum API of sub-segments.
NoDeletionPolicy An IndexDeletionPolicy which keeps all index commits around, never deleting them.
NoMergePolicy A MergePolicy which never returns merges to execute (hence its name).
NoMergeScheduler A MergeScheduler which never executes any merges.
NumericDocValues A per-document numeric value.
OrdTermState An ordinal based TermState
ParallelAtomicReader An AtomicReader which reads multiple, parallel indexes.
ParallelCompositeReader A CompositeReader which reads multiple, parallel indexes.
PersistentSnapshotDeletionPolicy A SnapshotDeletionPolicy which adds a persistence layer so that snapshots can be maintained across the life of an application.
ReaderManager Utility class to safely share DirectoryReader instances across multiple threads, while periodically reopening.
ReaderSlice Subreader slice from a parent composite reader.
ReaderUtil Common util methods for dealing with IndexReaders and IndexReaderContexts.
SegmentCommitInfo Embeds a [read-only] SegmentInfo and adds per-commit fields.
SegmentInfo Information about a segment such as its name, directory, and files related to the segment.
SegmentInfos A collection of SegmentInfo objects with methods for operating on those segments in relation to the file system.
SegmentInfos.FindSegmentsFile Utility class for executing code that needs to do something with the current segments file.
SegmentReader IndexReader implementation over a single segment.
SegmentReadState Holder class for common parameters used during read.
SegmentWriteState Holder class for common parameters used during write.
SerialMergeScheduler A MergeScheduler that simply does each merge sequentially, using the current thread.
SimpleMergedSegmentWarmer A very simple merged segment warmer that just ensures data structures are initialized.
SingleTermsEnum Subclass of FilteredTermsEnum for enumerating a single term.
SingletonSortedSetDocValues Exposes multi-valued view over a single-valued instance.
SlowCompositeReaderWrapper This class forces a composite reader (e.g. a MultiReader or DirectoryReader) to emulate an atomic reader.
SnapshotDeletionPolicy An IndexDeletionPolicy that wraps any other IndexDeletionPolicy and adds the ability to hold and later release snapshots of an index.
SortedDocValues A per-document byte[] with presorted values.
SortedSetDocValues A per-document set of presorted byte[] values.
StoredFieldVisitor Expert: provides a low-level means of accessing the stored field values in an index.
Term A Term represents a word from text.
TermContext Maintains an IndexReader TermState view over IndexReader instances containing a single term.
Terms Access to the terms in a specific field.
TermsEnum Iterator to seek (TermsEnum.seekCeil(BytesRef), TermsEnum.seekExact(BytesRef)) or step through (BytesRefIterator.next()) terms to obtain frequency information (TermsEnum.docFreq()), DocsEnum or DocsAndPositionsEnum for the current term (TermsEnum.docs(org.apache.lucene.util.Bits, org.apache.lucene.index.DocsEnum)).
TermState Encapsulates all required internal state to position the associated TermsEnum without re-seeking.
TieredMergePolicy Merges segments of approximately equal size, subject to an allowed number of segments per tier.
TieredMergePolicy.MergeScore Holds score and explanation for a single candidate merge.
TrackingIndexWriter Class that tracks changes to a delegated IndexWriter, used by ControlledRealTimeReopenThread to ensure specific changes are visible.
TwoPhaseCommitTool A utility for executing 2-phase commit on several objects.
UpgradeIndexMergePolicy This MergePolicy is used for upgrading all existing segments of an index when calling IndexWriter.forceMerge(int).
 

Enum Summary
FieldInfo.DocValuesType DocValues types.
FieldInfo.IndexOptions Controls how much information is stored in the postings lists.
FilteredTermsEnum.AcceptStatus Return value indicating whether a term should be accepted or the iteration should END.
IndexWriterConfig.OpenMode Specifies the open mode for IndexWriter.
MergePolicy.MergeTrigger MergeTrigger is passed to MergePolicy.findMerges(MergeTrigger, SegmentInfos) to indicate the event that triggered the merge.
StoredFieldVisitor.Status Enumeration of possible return values for StoredFieldVisitor.needsField(org.apache.lucene.index.FieldInfo).
TermsEnum.SeekStatus Represents returned result from TermsEnum.seekCeil(org.apache.lucene.util.BytesRef).
 

Exception Summary
CorruptIndexException This exception is thrown when Lucene detects an inconsistency in the index.
IndexFormatTooNewException This exception is thrown when Lucene detects an index that is newer than this Lucene version.
IndexFormatTooOldException This exception is thrown when Lucene detects an index that is too old for this Lucene version.
IndexNotFoundException Signals that no index was found in the Directory.
MergePolicy.MergeAbortedException Thrown when a merge was explicitly aborted because IndexWriter.close(boolean) was called with false.
MergePolicy.MergeException Exception thrown if there are any problems while executing a merge.
TwoPhaseCommitTool.CommitFailException Thrown by TwoPhaseCommitTool.execute(TwoPhaseCommit...) when an object fails to commit().
TwoPhaseCommitTool.PrepareCommitFailException Thrown by TwoPhaseCommitTool.execute(TwoPhaseCommit...) when an object fails to prepareCommit().
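
TwoPhaseCommit, TwoPhaseCommitTool and the two failure exceptions above describe a two-phase commit across several objects. Below is a minimal, stdlib-only sketch of that protocol. It assumes the protocol works as follows: prepare every object and roll all of them back if any prepare fails; otherwise commit every object and roll all of them back if any commit fails. Participant, TwoPhaseSketch and RecordingParticipant are hypothetical names, not Lucene classes. Participant only mirrors the prepareCommit/commit/rollback method names, and the sketch returns false where the real tool throws one of the exceptions listed above.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for a two-phase-commit participant (not Lucene's interface).
interface Participant {
    void prepareCommit() throws Exception;
    void commit() throws Exception;
    void rollback();
}

final class TwoPhaseSketch {
    // Phase 1: prepare everyone; on any failure, roll everyone back and stop.
    // Phase 2: commit everyone; on any failure, roll everyone back.
    // Returns true only if every participant committed.
    static boolean execute(List<? extends Participant> participants) {
        for (Participant p : participants) {
            try {
                p.prepareCommit();
            } catch (Exception e) {
                rollbackAll(participants);
                return false;
            }
        }
        for (Participant p : participants) {
            try {
                p.commit();
            } catch (Exception e) {
                rollbackAll(participants);
                return false;
            }
        }
        return true;
    }

    private static void rollbackAll(List<? extends Participant> participants) {
        for (Participant p : participants) {
            p.rollback();
        }
    }
}

// Toy participant that logs each call and can be told to fail during prepare.
final class RecordingParticipant implements Participant {
    final boolean failOnPrepare;
    final List<String> log = new ArrayList<>();

    RecordingParticipant(boolean failOnPrepare) {
        this.failOnPrepare = failOnPrepare;
    }

    public void prepareCommit() throws Exception {
        if (failOnPrepare) throw new Exception("prepare failed");
        log.add("prepared");
    }

    public void commit() { log.add("committed"); }

    public void rollback() { log.add("rolledBack"); }
}
```

Because the prepare phase is separate, no participant commits until all of them have reported that they are ready. A failure found during prepare therefore leaves every participant rolled back.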
 

Package org.apache.lucene.index Description

Code to maintain and access indices.

Table Of Contents

  1. Postings APIs
  2. Index Statistics

Postings APIs

Fields

Fields is the initial entry point into the postings APIs; it can be obtained in several ways:

// access indexed fields for an index segment (reader is an AtomicReader)
Fields fields = reader.fields();
// access term vector fields for a specified document (null if it has no term vectors)
Fields termVectors = reader.getTermVectors(docid);
Fields implements Java's Iterable interface, so it's easy to enumerate the list of fields:
// enumerate list of fields
for (String field : fields) {
  // access the terms for this field
  Terms terms = fields.terms(field);
}
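
A rough mental model of this hierarchy is field name, then term, then the documents containing that term. The toy in-memory version below builds it from JDK sorted maps. ToyFields is a hypothetical class, not part of Lucene. Like the loop above, it can be enumerated with for-each, and its terms(field) returns null for a field that does not exist. Real Terms implementations expose statistics and an enumeration API rather than a map.

```java
import java.util.Collections;
import java.util.Iterator;
import java.util.SortedMap;
import java.util.TreeMap;

// Toy model (not Lucene code): field name -> term -> docIDs containing that term.
final class ToyFields implements Iterable<String> {
    private final SortedMap<String, SortedMap<String, int[]>> fields = new TreeMap<>();

    // Record the docIDs for one term of one field.
    void add(String field, String term, int[] docIds) {
        fields.computeIfAbsent(field, f -> new TreeMap<>()).put(term, docIds);
    }

    // Per-field term dictionary, or null if the field is absent.
    SortedMap<String, int[]> terms(String field) {
        return fields.get(field);
    }

    // Enumerates field names (sorted here because of the TreeMap).
    @Override
    public Iterator<String> iterator() {
        return Collections.unmodifiableSet(fields.keySet()).iterator();
    }
}
```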

Terms

Terms represents the collection of terms within a field, exposes some metadata and statistics, and an API for enumeration.

// metadata about the field
System.out.println("positions? " + terms.hasPositions());
System.out.println("offsets? " + terms.hasOffsets());
System.out.println("payloads? " + terms.hasPayloads());
// iterate through terms
TermsEnum termsEnum = terms.iterator(null);
BytesRef term;
while ((term = termsEnum.next()) != null) {
  // the returned BytesRef may be reused by the enum; copy it (BytesRef.deepCopyOf) to keep it
  doSomethingWith(term);
}
TermsEnum provides an iterator over the list of terms within a field, some statistics about the term, and methods to access the term's documents and positions.
// seek to a specific term
boolean found = termsEnum.seekExact(new BytesRef("foobar"));
if (found) {
  // get the document frequency
  System.out.println(termsEnum.docFreq());
  // enumerate through documents
  DocsEnum docs = termsEnum.docs(null, null);
  // enumerate through documents and positions (null if positions were not indexed)
  DocsAndPositionsEnum docsAndPositions = termsEnum.docsAndPositions(null, null);
}
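
The seek methods differ in what happens when the target term is absent. The toy sketch below models those semantics over a java.util.TreeMap, assuming a sorted dictionary that maps each term to its docFreq. seekCeil positions on the smallest term greater than or equal to the target and reports FOUND, NOT_FOUND or END, the three outcomes of TermsEnum.SeekStatus. seekExact only reports whether the exact term exists. ToyTermsEnum and ToySeekStatus are hypothetical. The toy compares Java Strings, whose order can differ from the UTF-8 byte order of BytesRef for some non-ASCII text.

```java
import java.util.NavigableMap;

// Mirrors the three outcomes listed for TermsEnum.SeekStatus.
enum ToySeekStatus { FOUND, NOT_FOUND, END }

// Toy sorted term dictionary (not Lucene code): term -> docFreq.
final class ToyTermsEnum {
    private final NavigableMap<String, Integer> dict;
    private String current; // term the enum is positioned on, or null if unpositioned

    ToyTermsEnum(NavigableMap<String, Integer> dict) {
        this.dict = dict;
    }

    // Position on the smallest term >= target.
    ToySeekStatus seekCeil(String target) {
        current = dict.ceilingKey(target);
        if (current == null) return ToySeekStatus.END;
        return current.equals(target) ? ToySeekStatus.FOUND : ToySeekStatus.NOT_FOUND;
    }

    // Position on target only if it exists; otherwise leave the enum unpositioned.
    boolean seekExact(String target) {
        boolean found = dict.containsKey(target);
        current = found ? target : null;
        return found;
    }

    String term() { return current; }

    // Only valid while positioned on a term.
    int docFreq() { return dict.get(current); }
}
```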

Documents

DocsEnum is an extension of DocIdSetIterator that iterates over the list of documents for a term, along with the term frequency within each document.

int docid;
while ((docid = docsEnum.nextDoc()) != DocIdSetIterator.NO_MORE_DOCS) {
  System.out.println(docid);
  System.out.println(docsEnum.freq());
}
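
The loop relies on the DocIdSetIterator contract: docIDs are returned in increasing order, and NO_MORE_DOCS marks the end. The toy postings iterator below works over plain arrays. It illustrates that contract together with advance(target), which jumps to the first document at or beyond target. ToyDocsEnum is hypothetical. It assumes docID() is -1 before the first call and uses Integer.MAX_VALUE, the value of DocIdSetIterator.NO_MORE_DOCS, as its sentinel. Its advance is a linear scan, whereas real postings formats may skip ahead.

```java
// Toy postings iterator (not Lucene code) following the DocIdSetIterator contract.
final class ToyDocsEnum {
    static final int NO_MORE_DOCS = Integer.MAX_VALUE; // same sentinel value Lucene uses

    private final int[] docs;  // sorted docIDs containing the term
    private final int[] freqs; // term frequency in each of those docs
    private int idx = -1;      // -1 = not started; docs.length = exhausted

    ToyDocsEnum(int[] docs, int[] freqs) {
        this.docs = docs;
        this.freqs = freqs;
    }

    int docID() {
        if (idx < 0) return -1;
        return idx < docs.length ? docs[idx] : NO_MORE_DOCS;
    }

    int nextDoc() {
        if (idx < docs.length) idx++;
        return docID();
    }

    // Step forward (at least once) to the first doc >= target.
    int advance(int target) {
        int doc = nextDoc();
        while (doc < target) {
            doc = nextDoc();
        }
        return doc;
    }

    // Only valid while positioned on a document.
    int freq() { return freqs[idx]; }
}
```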

Positions

DocsAndPositionsEnum is an extension of DocsEnum that additionally allows iteration over the positions at which a term occurred within the document, and any additional per-position information (offsets and payload).

int docid;
while ((docid = docsAndPositionsEnum.nextDoc()) != DocIdSetIterator.NO_MORE_DOCS) {
  System.out.println(docid);
  int freq = docsAndPositionsEnum.freq();
  // call nextPosition() exactly freq times per document
  for (int i = 0; i < freq; i++) {
    System.out.println(docsAndPositionsEnum.nextPosition());
    // offsets are -1 if they were not indexed
    System.out.println(docsAndPositionsEnum.startOffset());
    System.out.println(docsAndPositionsEnum.endOffset());
    // null if this position has no payload
    System.out.println(docsAndPositionsEnum.getPayload());
  }
}
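
To see where these per-occurrence values come from, the toy below computes the position, start offset and end offset of each occurrence of one term in one document. It assumes plain whitespace tokenization with one position per token, and it uses exclusive end offsets. Real analyzers can behave differently: for example, stop-word removal or synonyms change positions. ToyPositions is a hypothetical helper, not Lucene code. The number of entries it returns plays the role of freq().

```java
import java.util.ArrayList;
import java.util.List;

// Toy (not Lucene code): per-occurrence data for one term in one document.
final class ToyPositions {
    // Each entry is {position, startOffset, endOffset}; endOffset is one past the last char.
    static List<int[]> occurrences(String text, String term) {
        List<int[]> out = new ArrayList<>();
        int position = 0;
        int i = 0;
        while (i < text.length()) {
            // skip whitespace between tokens
            while (i < text.length() && Character.isWhitespace(text.charAt(i))) i++;
            if (i >= text.length()) break;
            int start = i;
            while (i < text.length() && !Character.isWhitespace(text.charAt(i))) i++;
            if (text.substring(start, i).equals(term)) {
                out.add(new int[]{position, start, i});
            }
            position++; // every token occupies one position
        }
        return out;
    }
}
```

For example, `ToyPositions.occurrences("to be or not to be", "be")` yields two entries: position 1 with offsets 3 to 5, and position 5 with offsets 16 to 18.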

Index Statistics

Term statistics
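
The toy below computes the two per-term statistics, assuming their usual definitions. docFreq is the number of documents containing the term (TermsEnum.docFreq()). totalTermFreq is the number of occurrences across all documents, that is, the sum of freq() over those documents (TermsEnum.totalTermFreq()). ToyTermStats is hypothetical, works on already-tokenized documents, and ignores deletions.

```java
import java.util.List;

// Toy computation (not Lucene code) of per-term statistics over tokenized documents.
final class ToyTermStats {
    // Number of documents containing the term at least once.
    static int docFreq(List<List<String>> docs, String term) {
        int df = 0;
        for (List<String> doc : docs) {
            if (doc.contains(term)) df++;
        }
        return df;
    }

    // Total occurrences of the term across all documents.
    static long totalTermFreq(List<List<String>> docs, String term) {
        long ttf = 0;
        for (List<String> doc : docs) {
            for (String t : doc) {
                if (t.equals(term)) ttf++;
            }
        }
        return ttf;
    }
}
```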

Field statistics
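
The per-field statistics on Terms can be derived in the same spirit:

- size() counts unique terms.
- getDocCount() counts documents with at least one term in the field.
- getSumDocFreq() sums docFreq over all terms.
- getSumTotalTermFreq() sums totalTermFreq, which is the field's total token count.

The sketch below is a hypothetical, stdlib-only computation of those four numbers for one field, not Lucene's implementation.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;

// Toy (not Lucene code): statistics for one field; each document is its token
// list for that field (empty if the document lacks the field).
final class ToyFieldStats {
    long uniqueTerms;      // cf. Terms.size()
    int docCount;          // cf. Terms.getDocCount()
    long sumDocFreq;       // cf. Terms.getSumDocFreq()
    long sumTotalTermFreq; // cf. Terms.getSumTotalTermFreq()

    static ToyFieldStats compute(List<List<String>> docs) {
        ToyFieldStats s = new ToyFieldStats();
        Map<String, Integer> docFreq = new HashMap<>();
        for (List<String> doc : docs) {
            if (!doc.isEmpty()) s.docCount++;
            s.sumTotalTermFreq += doc.size();
            // each distinct term counts once per document toward its docFreq
            for (String term : new HashSet<>(doc)) {
                docFreq.merge(term, 1, Integer::sum);
            }
        }
        s.uniqueTerms = docFreq.size();
        for (int df : docFreq.values()) s.sumDocFreq += df;
        return s;
    }
}
```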

Segment statistics
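
At the segment level, IndexReader reports maxDoc(), numDocs() and numDeletedDocs(). The toy model below shows how the three relate: numDocs = maxDoc - numDeletedDocs. It assumes that deletions only mark a docID as deleted, so maxDoc stays fixed until the segment is merged away. ToySegment is hypothetical.

```java
import java.util.BitSet;

// Toy segment (not Lucene code): docIDs run from 0 to maxDoc - 1, and deleting
// a document only marks it; the docID is never reclaimed in this toy.
final class ToySegment {
    private final int maxDoc;
    private final BitSet deleted = new BitSet();

    ToySegment(int maxDoc) { this.maxDoc = maxDoc; }

    void delete(int docId) {
        if (docId < 0 || docId >= maxDoc) throw new IllegalArgumentException("bad docID " + docId);
        deleted.set(docId); // deleting twice has no extra effect
    }

    int maxDoc() { return maxDoc; }                        // cf. IndexReader.maxDoc()
    int numDeletedDocs() { return deleted.cardinality(); } // cf. IndexReader.numDeletedDocs()
    int numDocs() { return maxDoc - numDeletedDocs(); }    // cf. IndexReader.numDocs()
}
```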

Document statistics

Document statistics are available during the indexing process for an indexed field: typically a Similarity implementation will store some of these values (possibly in a lossy way) into the normalization value for the document in its Similarity.computeNorm(org.apache.lucene.index.FieldInvertState) method.
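
The toy below models the statistics gathered while one field of one document is inverted. It loosely follows FieldInvertState (length, numOverlap, maxTermFrequency, uniqueTermCount) and adds a lossy norm derived from them. The norm formula 1/sqrt(length - numOverlap) and its 8-bit quantization are hypothetical choices, in the spirit of classic length normalization. They are not the exact encoding any Lucene Similarity uses. Tokens come with explicit position increments, where an increment of 0 marks a token stacked on the previous position.

```java
import java.util.HashMap;
import java.util.Map;

// Toy (not Lucene code): per-field, per-document statistics plus a lossy norm.
final class ToyInvertState {
    int length;           // tokens seen
    int numOverlap;       // tokens with position increment 0 (e.g. injected synonyms)
    int maxTermFrequency; // highest frequency of any single term in this field
    int uniqueTermCount;  // distinct terms in this field

    static ToyInvertState invert(String[] terms, int[] positionIncrements) {
        ToyInvertState s = new ToyInvertState();
        Map<String, Integer> freqs = new HashMap<>();
        for (int i = 0; i < terms.length; i++) {
            s.length++;
            if (positionIncrements[i] == 0) s.numOverlap++;
            int f = freqs.merge(terms[i], 1, Integer::sum);
            s.maxTermFrequency = Math.max(s.maxTermFrequency, f);
        }
        s.uniqueTermCount = freqs.size();
        return s;
    }

    // Hypothetical norm: 1/sqrt(non-overlapping tokens), squeezed into 0..255.
    int encodedNorm() {
        int numTerms = Math.max(1, length - numOverlap);
        float norm = (float) (1.0 / Math.sqrt(numTerms));
        return Math.round(norm * 255f);
    }

    // Decoding recovers only an approximation of the original norm.
    static float decodeNorm(int encoded) {
        return encoded / 255f;
    }
}
```

Take the tokens quick, fast (increment 0), fox, quick. The toy reports length 4, numOverlap 1, maxTermFrequency 2 and uniqueTermCount 3. The norm 1/sqrt(3) ≈ 0.577 is stored as 147 and decodes to about 0.576, which shows the precision loss.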

Additional user-supplied statistics can be added to the document as DocValues fields and accessed via AtomicReader.getNumericDocValues(java.lang.String).
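
Conceptually, a numeric DocValues field is a column with one value per docID. The toy column below shows that shape using a hypothetical ToyNumericColumn class, not Lucene's NumericDocValues. Values are written per document at index time and read back by random access with get(docID). In this toy, documents that never received a value read back as 0.

```java
import java.util.Arrays;

// Toy (not Lucene code): a column-stride store holding one long per docID.
final class ToyNumericColumn {
    private long[] values = new long[0];

    // Index time: record the value for a document, growing the array as needed.
    void set(int docId, long value) {
        if (docId >= values.length) {
            values = Arrays.copyOf(values, Math.max(docId + 1, values.length * 2));
        }
        values[docId] = value;
    }

    // Search time: random access by docID; unset documents read as 0.
    long get(int docId) {
        return docId < values.length ? values[docId] : 0L;
    }
}
```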



Copyright © 2000-2014 Apache Software Foundation. All Rights Reserved.