Interface | Description |
---|---|
IndexableField | Represents a single field for indexing. |
IndexableFieldType | Describes the properties of a field. |
IndexDeletionPolicy | Expert: policy for deletion of stale index commits. |
IndexReader.ReaderClosedListener | A custom listener that's invoked when the IndexReader is closed. |
SegmentReader.CoreClosedListener | Called when the shared core for this SegmentReader is closed. |
TwoPhaseCommit | An interface for implementations that support 2-phase commit. |

Class | Description |
---|---|
AtomicReader | AtomicReader is an abstract class, providing an interface for accessing an index. |
AtomicReaderContext | IndexReaderContext for AtomicReader instances. |
BaseCompositeReader<R extends IndexReader> | Base class for implementing CompositeReaders based on an array of sub-readers. |
BinaryDocValues | A per-document byte[]. |
CheckIndex | Basic tool and API to check the health of an index and write a new segments file that removes references to problematic segments. |
CheckIndex.Status | Returned from CheckIndex.checkIndex() detailing the health and status of the index. |
CheckIndex.Status.DocValuesStatus | Status from testing DocValues. |
CheckIndex.Status.FieldNormStatus | Status from testing field norms. |
CheckIndex.Status.SegmentInfoStatus | Holds the status of each segment in the index. |
CheckIndex.Status.StoredFieldStatus | Status from testing stored fields. |
CheckIndex.Status.TermIndexStatus | Status from testing term index. |
CheckIndex.Status.TermVectorStatus | Status from testing term vectors. |
CompositeReader | Instances of this reader type can only be used to get stored fields from the underlying AtomicReaders, but it is not possible to directly retrieve postings. |
CompositeReaderContext | IndexReaderContext for CompositeReader instances. |
ConcurrentMergeScheduler | A MergeScheduler that runs each merge using a separate thread. |
DirectoryReader | DirectoryReader is an implementation of CompositeReader that can read indexes in a Directory. |
DocsAndPositionsEnum | Iterates through documents and term freqs, like DocsEnum, and also through positions. |
DocsEnum | Iterates through the documents and term freqs. |
DocTermOrds | This class enables fast access to multiple term ords for a specified field across all docIDs. |
FieldInfo | Access to the Field Info file that describes document fields and whether or not they are indexed. |
FieldInfos | Collection of FieldInfos (accessible by number or by name). |
FieldInvertState | This class tracks the number and position / offset parameters of terms being added to the index. |
Fields | Flex API for access to fields and terms. |
FilterAtomicReader | A FilterAtomicReader contains another AtomicReader, which it uses as its basic source of data, possibly transforming the data along the way or providing additional functionality. |
FilterAtomicReader.FilterDocsAndPositionsEnum | Base class for filtering DocsAndPositionsEnum implementations. |
FilterAtomicReader.FilterDocsEnum | Base class for filtering DocsEnum implementations. |
FilterAtomicReader.FilterFields | Base class for filtering Fields implementations. |
FilterAtomicReader.FilterTerms | Base class for filtering Terms implementations. |
FilterAtomicReader.FilterTermsEnum | Base class for filtering TermsEnum implementations. |
FilteredTermsEnum | Abstract class for enumerating a subset of all terms. |
IndexCommit | Expert: represents a single commit into an index as seen by the IndexDeletionPolicy or IndexReader. |
IndexFileNames | This class contains useful constants representing filenames and extensions used by Lucene, as well as convenience methods for querying whether a file name matches an extension (matchesExtension) and for generating file names from a segment name, generation and extension (fileNameFromGeneration, segmentFileName). |
IndexReader | IndexReader is an abstract class, providing an interface for accessing an index. |
IndexReaderContext | A struct-like class that represents a hierarchical relationship between IndexReader instances. |
IndexUpgrader | This is an easy-to-use tool that upgrades all segments of an index from previous Lucene versions to the current segment file format. |
IndexWriter | An IndexWriter creates and maintains an index. |
IndexWriter.IndexReaderWarmer | If DirectoryReader.open(IndexWriter,boolean) has been called (i.e., this writer is in near real-time mode), then after a merge completes, this class can be invoked to warm the reader on the newly merged segment, before the merge commits. |
IndexWriterConfig | Holds all the configuration that is used to create an IndexWriter. |
KeepOnlyLastCommitDeletionPolicy | This IndexDeletionPolicy implementation keeps only the most recent commit and immediately removes all prior commits after a new commit is done. |
LiveIndexWriterConfig | Holds all the configuration used by IndexWriter, with a few setters for settings that can be changed on an IndexWriter instance "live". |
LogByteSizeMergePolicy | This is a LogMergePolicy that measures the size of a segment as the total byte size of the segment's files. |
LogDocMergePolicy | This is a LogMergePolicy that measures the size of a segment as the number of documents (not taking deletions into account). |
LogMergePolicy | This class implements a MergePolicy that tries to merge segments into levels of exponentially increasing size, where each level has fewer segments than the value of the merge factor. |
MergePolicy | Expert: a MergePolicy determines the sequence of primitive merge operations. |
MergePolicy.MergeSpecification | A MergeSpecification instance provides the information necessary to perform multiple merges. |
MergePolicy.OneMerge | OneMerge provides the information necessary to perform an individual primitive merge operation, resulting in a single new segment. |
MergeScheduler | Expert: IndexWriter uses an instance of this abstract class to execute the merges selected by a MergePolicy. |
MergeState | Holds common state used during segment merging. |
MergeState.CheckAbort | Class for recording units of work when merging segments. |
MergeState.DocMap | Remaps docids around deletes during merge. |
MultiDocsAndPositionsEnum | Exposes flex API, merged from flex API of sub-segments. |
MultiDocsAndPositionsEnum.EnumWithSlice | Holds a DocsAndPositionsEnum along with the corresponding ReaderSlice. |
MultiDocsEnum | |
MultiDocsEnum.EnumWithSlice | Holds a DocsEnum along with the corresponding ReaderSlice. |
MultiDocValues | A wrapper for CompositeReader providing access to DocValues. |
MultiDocValues.MultiSortedDocValues | Implements SortedDocValues over n subs, using an OrdinalMap. |
MultiDocValues.MultiSortedSetDocValues | Implements SortedSetDocValues over n subs, using an OrdinalMap. |
MultiDocValues.OrdinalMap | Maps per-segment ordinals to/from global ordinal space. |
MultiFields | Exposes flex API, merged from flex API of sub-segments. |
MultiReader | A CompositeReader which reads multiple indexes, appending their content. |
MultiTerms | Exposes flex API, merged from flex API of sub-segments. |
MultiTermsEnum | |
NoDeletionPolicy | An IndexDeletionPolicy which keeps all index commits around, never deleting them. |
NoMergePolicy | A MergePolicy which never returns merges to execute (hence its name). |
NoMergeScheduler | A MergeScheduler which never executes any merges. |
NumericDocValues | A per-document numeric value. |
OrdTermState | An ordinal-based TermState. |
ParallelAtomicReader | An AtomicReader which reads multiple, parallel indexes. |
ParallelCompositeReader | A CompositeReader which reads multiple, parallel indexes. |
PersistentSnapshotDeletionPolicy | A SnapshotDeletionPolicy which adds a persistence layer so that snapshots can be maintained across the life of an application. |
ReaderManager | Utility class to safely share DirectoryReader instances across multiple threads, while periodically reopening. |
ReaderSlice | Subreader slice from a parent composite reader. |
ReaderUtil | Common util methods for dealing with IndexReaders and IndexReaderContexts. |
SegmentInfo | Information about a segment such as its name, directory, and files related to the segment. |
SegmentInfoPerCommit | Embeds a [read-only] SegmentInfo and adds per-commit fields. |
SegmentInfos | A collection of SegmentInfo objects with methods for operating on those segments in relation to the file system. |
SegmentInfos.FindSegmentsFile | Utility class for executing code that needs to do something with the current segments file. |
SegmentReader | IndexReader implementation over a single segment. |
SegmentReadState | Holder class for common parameters used during read. |
SegmentWriteState | Holder class for common parameters used during write. |
SerialMergeScheduler | A MergeScheduler that simply does each merge sequentially, using the current thread. |
SingleTermsEnum | Subclass of FilteredTermsEnum for enumerating a single term. |
SingletonSortedSetDocValues | Exposes a multi-valued view over a single-valued instance. |
SlowCompositeReaderWrapper | This class forces a composite reader (e.g. a MultiReader or DirectoryReader) to emulate an atomic reader. |
SnapshotDeletionPolicy | An IndexDeletionPolicy that wraps around any other IndexDeletionPolicy and adds the ability to hold and later release snapshots of an index. |
SortedDocValues | A per-document byte[] with presorted values. |
SortedDocValuesTermsEnum | Implements a TermsEnum wrapping a provided SortedDocValues. |
SortedSetDocValues | A per-document set of presorted byte[] values. |
SortedSetDocValuesTermsEnum | Implements a TermsEnum wrapping a provided SortedSetDocValues. |
StoredFieldVisitor | Expert: provides a low-level means of accessing the stored field values in an index. |
Term | A Term represents a word from text. |
TermContext | |
Terms | Access to the terms in a specific field. |
TermsEnum | Iterator to seek (TermsEnum.seekCeil(BytesRef), TermsEnum.seekExact(BytesRef,boolean)) or step through (BytesRefIterator.next()) terms to obtain frequency information (TermsEnum.docFreq()), DocsEnum or DocsAndPositionsEnum for the current term (TermsEnum.docs(org.apache.lucene.util.Bits, org.apache.lucene.index.DocsEnum)). |
TermState | Encapsulates all required internal state to position the associated TermsEnum without re-seeking. |
TieredMergePolicy | Merges segments of approximately equal size, subject to an allowed number of segments per tier. |
TieredMergePolicy.MergeScore | Holds score and explanation for a single candidate merge. |
TwoPhaseCommitTool | A utility for executing 2-phase commit on several objects. |
UpgradeIndexMergePolicy | This MergePolicy is used for upgrading all existing segments of an index when calling IndexWriter.forceMerge(int). |
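
The level-based strategy described for LogMergePolicy in the table above can be illustrated with a small, self-contained sketch. It is a deliberate simplification, not Lucene's implementation: the class `LogLevelSketch` and its methods are hypothetical, a segment's level is assumed to be floor(log base mergeFactor of its size), and, unlike the real policy (which merges adjacent segments), it groups segments by level regardless of their position in the index.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical, simplified model of level-based merging (not Lucene's LogMergePolicy).
final class LogLevelSketch {

  // Level = floor(log_mergeFactor(size)), computed with integer division to
  // avoid floating-point rounding; sizes below mergeFactor are level 0.
  static int level(long size, int mergeFactor) {
    if (mergeFactor < 2) {
      throw new IllegalArgumentException("mergeFactor must be >= 2");
    }
    int level = 0;
    long remaining = size;
    while (remaining >= mergeFactor) {
      remaining /= mergeFactor;
      level++;
    }
    return level;
  }

  // Groups segment indexes by level; every level holding at least
  // mergeFactor segments yields one merge of its first mergeFactor segments.
  static List<List<Integer>> findMerges(long[] sizes, int mergeFactor) {
    Map<Integer, List<Integer>> byLevel = new TreeMap<>();
    for (int i = 0; i < sizes.length; i++) {
      byLevel.computeIfAbsent(level(sizes[i], mergeFactor), k -> new ArrayList<>()).add(i);
    }
    List<List<Integer>> merges = new ArrayList<>();
    for (List<Integer> segments : byLevel.values()) {
      if (segments.size() >= mergeFactor) {
        merges.add(new ArrayList<>(segments.subList(0, mergeFactor)));
      }
    }
    return merges;
  }

  public static void main(String[] args) {
    // Four small segments (level 0) and one larger segment (level 2), mergeFactor 4.
    long[] sizes = {1, 2, 3, 1, 20};
    System.out.println(findMerges(sizes, 4));
  }
}
```

Sizes are abstract units here; per the table, LogByteSizeMergePolicy and LogDocMergePolicy differ in whether "size" means bytes or document count.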

Enum | Description |
---|---|
FieldInfo.DocValuesType | DocValues types. |
FieldInfo.IndexOptions | Controls how much information is stored in the postings lists. |
FilteredTermsEnum.AcceptStatus | Return value indicating whether a term should be accepted or whether the iteration should END. |
IndexWriterConfig.OpenMode | Specifies the open mode for IndexWriter. |
MergePolicy.MergeTrigger | MergeTrigger is passed to MergePolicy.findMerges(MergeTrigger, SegmentInfos) to indicate the event that triggered the merge. |
StoredFieldVisitor.Status | Enumeration of possible return values for StoredFieldVisitor.needsField(org.apache.lucene.index.FieldInfo). |
TermsEnum.SeekStatus | Represents returned result from TermsEnum.seekCeil(org.apache.lucene.util.BytesRef, boolean). |
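
Because FieldInfo.IndexOptions describes increasing amounts of postings data, "how much information is stored" can be checked with an ordinal comparison. The sketch below assumes the Lucene 4.x constant names DOCS_ONLY, DOCS_AND_FREQS, DOCS_AND_FREQS_AND_POSITIONS and DOCS_AND_FREQS_AND_POSITIONS_AND_OFFSETS, declared in that order; `IndexOptionsSketch` is a hypothetical standalone mirror, not the real enum, and its helper methods are illustrative.

```java
// Hypothetical mirror of FieldInfo.IndexOptions (Lucene 4.x constant names).
// Declaration order matters: each constant stores everything the previous
// one does plus one more kind of postings data.
enum IndexOptionsSketch {
  DOCS_ONLY,
  DOCS_AND_FREQS,
  DOCS_AND_FREQS_AND_POSITIONS,
  DOCS_AND_FREQS_AND_POSITIONS_AND_OFFSETS;

  // Term frequencies are stored from DOCS_AND_FREQS upward.
  boolean hasFreqs() {
    return compareTo(DOCS_AND_FREQS) >= 0;
  }

  // Positions are stored from DOCS_AND_FREQS_AND_POSITIONS upward.
  boolean hasPositions() {
    return compareTo(DOCS_AND_FREQS_AND_POSITIONS) >= 0;
  }

  // Offsets are stored only by the last constant.
  boolean hasOffsets() {
    return compareTo(DOCS_AND_FREQS_AND_POSITIONS_AND_OFFSETS) >= 0;
  }
}

final class IndexOptionsDemo {
  public static void main(String[] args) {
    for (IndexOptionsSketch o : IndexOptionsSketch.values()) {
      System.out.println(o + " freqs=" + o.hasFreqs()
          + " positions=" + o.hasPositions() + " offsets=" + o.hasOffsets());
    }
  }
}
```

This ordering is why frequency-based statistics, such as totalTermFreq(), are unavailable for a field indexed with DOCS_ONLY.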

Exception | Description |
---|---|
CorruptIndexException | This exception is thrown when Lucene detects an inconsistency in the index. |
IndexFormatTooNewException | This exception is thrown when Lucene detects an index that is newer than this Lucene version. |
IndexFormatTooOldException | This exception is thrown when Lucene detects an index that is too old for this Lucene version. |
IndexNotFoundException | Signals that no index was found in the Directory. |
MergePolicy.MergeAbortedException | Thrown when a merge was explicitly aborted because IndexWriter.close(boolean) was called with false. |
MergePolicy.MergeException | Exception thrown if there are any problems while executing a merge. |
TwoPhaseCommitTool.CommitFailException | Thrown by TwoPhaseCommitTool.execute(TwoPhaseCommit...) when an object fails to commit(). |
TwoPhaseCommitTool.PrepareCommitFailException | Thrown by TwoPhaseCommitTool.execute(TwoPhaseCommit...) when an object fails to prepareCommit(). |
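
TwoPhaseCommitTool and its two exceptions follow the usual two-phase protocol: prepare every participant first, commit only if all prepares succeed, and roll back otherwise. The sketch below follows that protocol under stated assumptions; `CommitParticipant`, `TwoPhaseCommitSketch` and `RecordingParticipant` are hypothetical stand-ins for TwoPhaseCommit and TwoPhaseCommitTool, and the best-effort rollback of every participant on any failure is this sketch's choice.

```java
import java.util.List;

// Hypothetical participant with the three operations a 2-phase commit needs
// (a stand-in, not Lucene's TwoPhaseCommit interface).
interface CommitParticipant {
  void prepareCommit() throws Exception;
  void commit() throws Exception;
  void rollback() throws Exception;
}

final class TwoPhaseCommitSketch {
  static final class PrepareCommitFailed extends Exception {
    PrepareCommitFailed(Throwable cause) { super(cause); }
  }

  static final class CommitFailed extends Exception {
    CommitFailed(Throwable cause) { super(cause); }
  }

  // Phase 1 prepares every participant; phase 2 commits them only if all
  // prepares succeeded. Any failure triggers a rollback of all participants.
  static void execute(CommitParticipant... participants)
      throws PrepareCommitFailed, CommitFailed {
    for (CommitParticipant p : participants) {
      try {
        p.prepareCommit();
      } catch (Exception e) {
        rollbackAll(participants);
        throw new PrepareCommitFailed(e);
      }
    }
    for (CommitParticipant p : participants) {
      try {
        p.commit();
      } catch (Exception e) {
        rollbackAll(participants);
        throw new CommitFailed(e);
      }
    }
  }

  private static void rollbackAll(CommitParticipant[] participants) {
    for (CommitParticipant p : participants) {
      try {
        p.rollback();
      } catch (Exception ignored) {
        // Best effort: keep rolling back the remaining participants.
      }
    }
  }
}

// Test double that records each call and can be told to fail in prepareCommit().
final class RecordingParticipant implements CommitParticipant {
  private final String name;
  private final List<String> log;
  private final boolean failOnPrepare;

  RecordingParticipant(String name, List<String> log, boolean failOnPrepare) {
    this.name = name;
    this.log = log;
    this.failOnPrepare = failOnPrepare;
  }

  @Override public void prepareCommit() throws Exception {
    log.add(name + ":prepare");
    if (failOnPrepare) throw new Exception(name + " cannot prepare");
  }

  @Override public void commit() { log.add(name + ":commit"); }

  @Override public void rollback() { log.add(name + ":rollback"); }
}
```

Note that a participant which already committed in phase 2 cannot generally be undone by rollback(), which is why a commit-phase failure is reported separately from a prepare-phase failure.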

Fields is the initial entry point into the postings APIs. It can be obtained in several ways:

```java
// access indexed fields for an index segment (may be null if the reader has no postings)
Fields fields = reader.fields();
// access term vector fields for a specified document (null if term vectors were not indexed)
Fields vectorFields = reader.getTermVectors(docid);
```

Fields implements Java's Iterable interface, so it's easy to enumerate the list of fields:

```java
// enumerate list of fields
for (String field : fields) {
  // access the terms for this field
  Terms terms = fields.terms(field);
}
```
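
The Fields, Terms and postings hierarchy can be pictured as nested sorted maps. The toy model below is hypothetical and illustrative only: `ToyFields` splits text on whitespace instead of running an Analyzer, records only document IDs, assumes documents are added in increasing docID order, and uses a TreeMap so that iteration order is deterministic, which is not a statement about Lucene's own field order.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical in-memory model: field name -> term -> ascending docIDs.
// It mimics the shape of Fields/Terms, not Lucene's implementation.
final class ToyFields implements Iterable<String> {
  private final Map<String, TreeMap<String, List<Integer>>> fields = new TreeMap<>();

  // Indexes one document; each entry maps a field name to its raw text, which
  // is split on whitespace as a stand-in for real analysis. Documents must be
  // added in increasing docId order.
  void addDocument(int docId, Map<String, String> doc) {
    for (Map.Entry<String, String> e : doc.entrySet()) {
      TreeMap<String, List<Integer>> terms =
          fields.computeIfAbsent(e.getKey(), k -> new TreeMap<>());
      for (String token : e.getValue().split("\\s+")) {
        if (token.isEmpty()) continue;
        List<Integer> docs = terms.computeIfAbsent(token, k -> new ArrayList<>());
        // Record each document once per term, even if the token repeats.
        if (docs.isEmpty() || docs.get(docs.size() - 1) != docId) {
          docs.add(docId);
        }
      }
    }
  }

  // Analogue of Fields.terms(field): null if the field was never indexed.
  Map<String, List<Integer>> terms(String field) {
    return fields.get(field);
  }

  // Analogue of Fields.size(): number of indexed fields.
  int size() {
    return fields.size();
  }

  // Iterating yields field names, like iterating over Fields.
  @Override
  public Iterator<String> iterator() {
    return fields.keySet().iterator();
  }

  public static void main(String[] args) {
    ToyFields index = new ToyFields();
    index.addDocument(0, Map.of("title", "lucene index", "body", "an inverted index"));
    index.addDocument(1, Map.of("title", "search"));
    for (String field : index) {
      System.out.println(field + " -> " + index.terms(field));
    }
  }
}
```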

Terms represents the collection of terms within a field, and exposes some metadata and statistics as well as an API for enumeration:

```java
// metadata about the field
System.out.println("positions? " + terms.hasPositions());
System.out.println("offsets? " + terms.hasOffsets());
System.out.println("payloads? " + terms.hasPayloads());
// iterate through terms; next() returns null once the terms are exhausted
TermsEnum termsEnum = terms.iterator(null);
BytesRef term = null;
while ((term = termsEnum.next()) != null) {
  doSomethingWith(termsEnum.term()); // placeholder for application code
}
```
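
The enumeration loop above visits terms in sorted order and stops when next() returns null. The sketch below mirrors that contract under one stated assumption: that, as with Lucene 4.x BytesRef terms, the order is unsigned UTF-8 byte order (equivalent to Unicode code point order), which can differ from Java's UTF-16 `String.compareTo` for characters outside the Basic Multilingual Plane. `ToyTermsEnum` is hypothetical and assumes its input terms are distinct.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;
import java.util.Comparator;

// Hypothetical term iterator mirroring the next()/term() contract:
// next() advances and returns the new term, or null once exhausted.
final class ToyTermsEnum {
  // Compares the UTF-8 encodings byte by byte, treating bytes as unsigned.
  static final Comparator<String> UTF8_ORDER = (a, b) -> {
    byte[] x = a.getBytes(StandardCharsets.UTF_8);
    byte[] y = b.getBytes(StandardCharsets.UTF_8);
    int n = Math.min(x.length, y.length);
    for (int i = 0; i < n; i++) {
      int cmp = Integer.compare(x[i] & 0xff, y[i] & 0xff);
      if (cmp != 0) return cmp;
    }
    return Integer.compare(x.length, y.length);
  };

  private final String[] terms;
  private int upto = -1;
  private String current;

  // Assumes the given terms are distinct; they are sorted in UTF-8 byte order.
  ToyTermsEnum(String... unsortedTerms) {
    terms = unsortedTerms.clone();
    Arrays.sort(terms, UTF8_ORDER);
  }

  String next() {
    upto++;
    current = upto < terms.length ? terms[upto] : null;
    return current;
  }

  String term() {
    return current;
  }

  public static void main(String[] args) {
    ToyTermsEnum termsEnum = new ToyTermsEnum("pear", "apple", "fig");
    String term;
    while ((term = termsEnum.next()) != null) {
      System.out.println(term);
    }
  }
}
```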

TermsEnum provides an iterator over the list of terms within a field, some statistics about the term, and methods to access the term's documents and positions:

```java
// seek to a specific term (the second argument is useCache)
boolean found = termsEnum.seekExact(new BytesRef("foobar"), true);
if (found) {
  // get the document frequency
  System.out.println(termsEnum.docFreq());
  // enumerate through documents
  DocsEnum docs = termsEnum.docs(null, null);
  // enumerate through documents and positions (null if positions were not indexed)
  DocsAndPositionsEnum docsAndPositions = termsEnum.docsAndPositions(null, null);
}
```
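
seekExact only reports whether a term exists, whereas seekCeil (listed in the class table) positions on the smallest term greater than or equal to the target and reports which case occurred through TermsEnum.SeekStatus. A TreeSet-based sketch, assuming the SeekStatus constant names END, FOUND and NOT_FOUND; `ToySeekableTerms` is hypothetical, and Java String order stands in for Lucene's byte order.

```java
import java.util.TreeSet;

// Hypothetical sorted term dictionary supporting both kinds of seek.
final class ToySeekableTerms {
  // Mirrors the constant names of TermsEnum.SeekStatus.
  enum SeekStatus { END, FOUND, NOT_FOUND }

  private final TreeSet<String> terms = new TreeSet<>();
  private String current;

  ToySeekableTerms(String... values) {
    for (String v : values) {
      terms.add(v);
    }
  }

  // Positions on the smallest term >= target; END if there is none.
  SeekStatus seekCeil(String target) {
    current = terms.ceiling(target);
    if (current == null) return SeekStatus.END;
    return current.equals(target) ? SeekStatus.FOUND : SeekStatus.NOT_FOUND;
  }

  // True only if the exact term exists; the enum is unpositioned otherwise.
  boolean seekExact(String target) {
    boolean found = terms.contains(target);
    current = found ? target : null;
    return found;
  }

  String term() {
    return current;
  }

  public static void main(String[] args) {
    ToySeekableTerms te = new ToySeekableTerms("bar", "foobar", "zoo");
    System.out.println(te.seekExact("foobar"));
    System.out.println(te.seekCeil("foo") + " " + te.term());
  }
}
```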

DocsEnum is an extension of DocIdSetIterator that iterates over the list of documents for a term, along with the term frequency within that document:

```java
int docid;
while ((docid = docsEnum.nextDoc()) != DocIdSetIterator.NO_MORE_DOCS) {
  System.out.println(docid);
  System.out.println(docsEnum.freq());
}
```
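
The nextDoc() loop relies on a sentinel: DocIdSetIterator.NO_MORE_DOCS is Integer.MAX_VALUE, so a loop can compare doc IDs directly without a separate hasNext() check. A minimal array-backed sketch; `ToyDocsEnum` is hypothetical and models only docID(), nextDoc(), advance(target) and freq().

```java
// Hypothetical array-backed postings iterator (not Lucene code).
final class ToyDocsEnum {
  // Same value as DocIdSetIterator.NO_MORE_DOCS.
  static final int NO_MORE_DOCS = Integer.MAX_VALUE;

  private final int[] docs;   // ascending docIDs containing the term
  private final int[] freqs;  // term frequency in each of those docs
  private int upto = -1;
  private int doc = -1;       // -1 means "not yet positioned"

  ToyDocsEnum(int[] docs, int[] freqs) {
    if (docs.length != freqs.length) {
      throw new IllegalArgumentException("docs and freqs must have the same length");
    }
    this.docs = docs;
    this.freqs = freqs;
  }

  int docID() {
    return doc;
  }

  int nextDoc() {
    upto++;
    doc = upto < docs.length ? docs[upto] : NO_MORE_DOCS;
    return doc;
  }

  // Moves to the first document beyond the current one whose ID is >= target.
  int advance(int target) {
    int d;
    do {
      d = nextDoc();
    } while (d < target);
    return d;
  }

  // Only valid while positioned on a real document.
  int freq() {
    return freqs[upto];
  }

  public static void main(String[] args) {
    ToyDocsEnum docsEnum = new ToyDocsEnum(new int[] {3, 7, 42}, new int[] {1, 4, 2});
    int docid;
    while ((docid = docsEnum.nextDoc()) != NO_MORE_DOCS) {
      System.out.println(docid + " freq=" + docsEnum.freq());
    }
  }
}
```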

DocsAndPositionsEnum is an extension of DocsEnum that additionally allows iteration over the positions at which a term occurred within the document, and any additional per-position information (offsets and payload):

```java
int docid;
while ((docid = docsAndPositionsEnum.nextDoc()) != DocIdSetIterator.NO_MORE_DOCS) {
  System.out.println(docid);
  int freq = docsAndPositionsEnum.freq();
  // nextPosition() may be called exactly freq() times per document
  for (int i = 0; i < freq; i++) {
    System.out.println(docsAndPositionsEnum.nextPosition());
    // offsets are -1 if they were not indexed
    System.out.println(docsAndPositionsEnum.startOffset());
    System.out.println(docsAndPositionsEnum.endOffset());
    // null if there is no payload at this position
    System.out.println(docsAndPositionsEnum.getPayload());
  }
}
```
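
The freq()-bounded inner loop above is the central contract: for each document, nextPosition() is called exactly freq() times. A hypothetical array-backed sketch (`ToyPositionsEnum`, not Lucene code) that omits offsets and payloads and stores one ascending position array per document:

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical postings-with-positions iterator (not Lucene code).
final class ToyPositionsEnum {
  static final int NO_MORE_DOCS = Integer.MAX_VALUE;

  private final int[] docs;          // ascending docIDs
  private final int[][] positions;   // positions[i]: ascending positions in docs[i]
  private int upto = -1;
  private int posUpto;

  ToyPositionsEnum(int[] docs, int[][] positions) {
    this.docs = docs;
    this.positions = positions;
  }

  int nextDoc() {
    upto++;
    posUpto = 0;  // reset the position cursor for the new document
    return upto < docs.length ? docs[upto] : NO_MORE_DOCS;
  }

  // Number of positions available in the current document.
  int freq() {
    return positions[upto].length;
  }

  // Valid to call at most freq() times per document.
  int nextPosition() {
    return positions[upto][posUpto++];
  }

  // Reads every position of the current document, following the freq() contract.
  static List<Integer> readPositions(ToyPositionsEnum e) {
    List<Integer> out = new ArrayList<>();
    int freq = e.freq();
    for (int i = 0; i < freq; i++) {
      out.add(e.nextPosition());
    }
    return out;
  }

  public static void main(String[] args) {
    ToyPositionsEnum e = new ToyPositionsEnum(new int[] {0, 5}, new int[][] {{2}, {0, 9, 14}});
    int docid;
    while ((docid = e.nextDoc()) != NO_MORE_DOCS) {
      System.out.println(docid + " " + readPositions(e));
    }
  }
}
```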

- TermsEnum.docFreq(): Returns the number of documents that contain at least one occurrence of the term. This statistic is always available for an indexed term. Note that it also counts deleted documents; when segments are merged, the statistic is updated as those deleted documents are merged away.
- TermsEnum.totalTermFreq(): Returns the number of occurrences of this term across all documents. This statistic is unavailable (returns -1) if term frequencies were omitted from the index (DOCS_ONLY) for the field. Like docFreq(), it also counts occurrences that appear in deleted documents.
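
Both term statistics can be computed directly from the documents that contain the term. In this hypothetical sketch, `ToyTermStats` treats a tiny segment as one token list per document plus a deleted flag per document; the `freqsIndexed` parameter stands in for a field not indexed with DOCS_ONLY, and `liveDocFreq` is an illustrative contrast, not a Lucene statistic.

```java
import java.util.List;

// Hypothetical statistics over a tiny in-memory "segment": docs.get(i) is the
// token list of document i for one field.
final class ToyTermStats {
  // Like TermsEnum.docFreq(): documents containing the term at least once,
  // including deleted documents that have not been merged away yet.
  static int docFreq(List<List<String>> docs, String term) {
    int count = 0;
    for (List<String> doc : docs) {
      if (doc.contains(term)) count++;
    }
    return count;
  }

  // Like TermsEnum.totalTermFreq(): all occurrences across all documents
  // (deleted ones included), or -1 when frequencies were not indexed.
  static long totalTermFreq(List<List<String>> docs, String term, boolean freqsIndexed) {
    if (!freqsIndexed) return -1;
    long total = 0;
    for (List<String> doc : docs) {
      for (String token : doc) {
        if (token.equals(term)) total++;
      }
    }
    return total;
  }

  // Illustrative contrast only: docFreq restricted to live (non-deleted) documents.
  static int liveDocFreq(List<List<String>> docs, boolean[] deleted, String term) {
    int count = 0;
    for (int i = 0; i < docs.size(); i++) {
      if (!deleted[i] && docs.get(i).contains(term)) count++;
    }
    return count;
  }
}
```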

- Terms.size(): Returns the number of unique terms in the field. This statistic may be unavailable (returns -1) for some Terms implementations such as MultiTerms, where it cannot be efficiently computed. Note that this count also includes terms that appear only in deleted documents: when segments are merged, such terms are also merged away and the statistic is then updated.
- Terms.getDocCount(): Returns the number of documents that contain at least one occurrence of any term for this field. This can be thought of as a field-level docFreq(). Like docFreq(), it also counts deleted documents.
- Terms.getSumDocFreq(): Returns the number of postings (term-document mappings in the inverted index) for the field. This can be thought of as the sum of TermsEnum.docFreq() across all terms in the field, and like docFreq() it also counts postings that appear in deleted documents.
- Terms.getSumTotalTermFreq(): Returns the number of tokens for the field. This can be thought of as the sum of TermsEnum.totalTermFreq() across all terms in the field. Like totalTermFreq(), it also counts occurrences that appear in deleted documents, and it is unavailable (returns -1) if term frequencies were omitted from the index (DOCS_ONLY) for the field.
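
The four field-level statistics can be derived from the same kind of toy data. In this hypothetical sketch, `ToyFieldStats` receives one token list per document for a single field (an empty list when the document has no value), counts deleted documents like the statistics above, and ignores the case where a statistic is unavailable (-1).

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical field-level statistics for one field of a tiny segment.
final class ToyFieldStats {
  final long size;              // like Terms.size(): unique terms
  final int docCount;           // like Terms.getDocCount(): docs with the field
  final long sumDocFreq;        // like Terms.getSumDocFreq(): total postings
  final long sumTotalTermFreq;  // like Terms.getSumTotalTermFreq(): total tokens

  ToyFieldStats(List<List<String>> docs) {
    Map<String, Integer> docFreq = new HashMap<>();
    int withField = 0;
    long tokens = 0;
    for (List<String> doc : docs) {
      if (!doc.isEmpty()) withField++;
      tokens += doc.size();
      // Each distinct term contributes exactly one posting for this document.
      Set<String> distinct = new HashSet<>(doc);
      for (String term : distinct) {
        docFreq.merge(term, 1, Integer::sum);
      }
    }
    long postings = 0;
    for (int df : docFreq.values()) {
      postings += df;  // sumDocFreq is the sum of per-term docFreq
    }
    this.size = docFreq.size();
    this.docCount = withField;
    this.sumDocFreq = postings;
    this.sumTotalTermFreq = tokens;
  }
}
```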

- IndexReader.maxDoc(): Returns the number of documents (including deleted documents) in the index.
- IndexReader.numDocs(): Returns the number of live documents (excluding deleted documents) in the index.
- IndexReader.numDeletedDocs(): Returns the number of deleted documents in the index.
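
The three reader-level counts are related by numDeletedDocs() = maxDoc() - numDocs(). A sketch, assuming deletions are tracked as a bit set in which a set bit marks a live document (the convention of AtomicReader.getLiveDocs() in Lucene 4.x); `ToyReaderCounts` is hypothetical.

```java
import java.util.BitSet;

// Hypothetical reader-level counts derived from a "live docs" bit set,
// where a set bit means the document is live (not deleted).
final class ToyReaderCounts {
  private final int maxDoc;
  private final BitSet liveDocs;

  ToyReaderCounts(int maxDoc, BitSet liveDocs) {
    this.maxDoc = maxDoc;
    this.liveDocs = liveDocs;
  }

  // Every document in the segment, deleted or not (docIDs 0..maxDoc-1).
  int maxDoc() {
    return maxDoc;
  }

  // Live documents only; bits at or beyond maxDoc are ignored.
  int numDocs() {
    return liveDocs.get(0, maxDoc).cardinality();
  }

  // Holds by construction: deleted = all - live.
  int numDeletedDocs() {
    return maxDoc - numDocs();
  }
}
```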

- Fields.size(): Returns the number of indexed fields.
- Fields.getUniqueTermCount(): Returns the number of indexed terms, the sum of Terms.size() across all fields.

Document statistics are available during the indexing process for an indexed field: typically a Similarity implementation will store some of these values (possibly in a lossy way) into the normalization value for the document in its Similarity.computeNorm(org.apache.lucene.index.FieldInvertState) method.
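
"Possibly in a lossy way" means that a per-document value, such as a length-based factor, is squeezed into very few bits. The hypothetical 8-bit linear quantization of 1/sqrt(length) below shows the effect: different field lengths can collapse onto the same stored value. `ToyNormEncoding` is an illustrative scheme only; it is not the encoding any Lucene Similarity uses.

```java
// Hypothetical lossy norm encoding: maps 1/sqrt(length), which lies in (0, 1]
// for positive lengths, onto a single unsigned byte value 0..255.
// Not Lucene's encoding.
final class ToyNormEncoding {
  static int encode(int length) {
    if (length <= 0) return 0;  // sketch convention for empty fields
    double norm = 1.0 / Math.sqrt(length);
    return (int) Math.round(norm * 255);  // 256 buckets: precision is lost here
  }

  static double decode(int encoded) {
    return encoded / 255.0;
  }

  public static void main(String[] args) {
    for (int length : new int[] {1, 101, 102, 9000, 10000}) {
      int b = encode(length);
      System.out.println(length + " -> " + b + " -> " + decode(b));
    }
  }
}
```

With only 256 buckets, long fields of quite different lengths (for example 9000 and 10000 tokens) become indistinguishable after decoding, which is the trade-off the word "lossy" refers to.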

- FieldInvertState.getLength(): Returns the number of tokens for this field in the document. Note that this is just the number of times that TokenStream.incrementToken() returned true, and is unrelated to the values in PositionIncrementAttribute.
- FieldInvertState.getNumOverlap(): Returns the number of tokens for this field in the document that had a position increment of zero. This can be used to compute a document length that discounts artificial tokens such as synonyms.
- FieldInvertState.getPosition(): Returns the accumulated position value for this field in the document, computed from the values of PositionIncrementAttribute and including the Analyzer.getPositionIncrementGap(java.lang.String) gaps across multivalued fields.
- FieldInvertState.getOffset(): Returns the total character offset value for this field in the document, computed from the values of OffsetAttribute returned by TokenStream.end(), and including the Analyzer.getOffsetGap(java.lang.String) gaps across multivalued fields.
- FieldInvertState.getUniqueTermCount(): Returns the number of unique terms encountered for this field in the document.
- FieldInvertState.getMaxTermFrequency(): Returns the maximum frequency across all unique terms encountered for this field in the document.
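
Several of these per-document statistics can be accumulated in a single pass over the analyzed tokens. A sketch, assuming each token arrives as a (term, position increment) pair from one field value; `ToyInvertState` is hypothetical, omits offsets and multivalued gaps, and simply sums increments for its position value, which is this sketch's convention rather than Lucene's exact bookkeeping.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical one-pass accumulation of per-field, per-document statistics
// in the spirit of FieldInvertState (not Lucene's implementation).
final class ToyInvertState {
  int length;            // tokens seen (each incrementToken() that returned true)
  int numOverlap;        // tokens whose position increment was 0
  int position;          // running sum of position increments (sketch convention)
  int maxTermFrequency;  // highest within-document frequency of any term
  private final Map<String, Integer> termFreqs = new HashMap<>();

  void addToken(String term, int positionIncrement) {
    length++;
    if (positionIncrement == 0) numOverlap++;
    position += positionIncrement;
    int freq = termFreqs.merge(term, 1, Integer::sum);
    maxTermFrequency = Math.max(maxTermFrequency, freq);
  }

  int uniqueTermCount() {
    return termFreqs.size();
  }

  // A length that discounts stacked tokens such as injected synonyms.
  int lengthWithoutOverlaps() {
    return length - numOverlap;
  }

  public static void main(String[] args) {
    ToyInvertState state = new ToyInvertState();
    state.addToken("fast", 1);
    state.addToken("quick", 0);  // synonym stacked on "fast"
    state.addToken("car", 1);
    state.addToken("fast", 1);
    System.out.println("length=" + state.length + " overlap=" + state.numOverlap
        + " unique=" + state.uniqueTermCount() + " maxTf=" + state.maxTermFrequency);
  }
}
```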

Additional user-supplied statistics can be added to the document as DocValues fields and accessed via AtomicReader.getNumericDocValues(java.lang.String).
Copyright © 2000-2013 Apache Software Foundation. All Rights Reserved.