All Classes and Interfaces
Class
Description
Abstract parent class for analysis factories TokenizerFactory, TokenFilterFactory and CharFilterFactory.
AbstractKnnCollector is the default implementation for a knn collector used for gathering kNN results and providing topDocs from the gathered neighbors.
Base implementation for PagedMutable and PagedGrowableWriter.
An object whose RAM usage can be computed.
Helper methods for constructing nested resource descriptions and debugging RAM usage.
This class acts as the base class for the implementations of the first normalization of the
informative content in the DFR framework.
Model of the information gain based on the ratio of two Bernoulli processes.
Model of the information gain based on Laplace's law of succession.
This exception is thrown when there is an attempt to access something that has already been
closed.
Helper class for loading named SPIs from classpath (e.g.
An Analyzer builds TokenStreams, which analyze text.
Strategy defining how TokenStreamComponents are reused per call to Analyzer.tokenStream(String, java.io.Reader).
This class encapsulates the outer components of a token stream.
Extension to Analyzer suitable for Analyzers which wrap other Analyzers.
Methods for manipulating arrays.
Comparator for a fixed number of bytes.
Base interface for attributes.
An AttributeFactory creates instances of AttributeImpls.
Expert: AttributeFactory returning an instance of the given clazz for the attributes it implements.
Base class for Attributes that can be added to an AttributeSource.
This interface is used to reflect contents of AttributeSource or AttributeImpl.
An AttributeSource contains a list of different AttributeImpls, and methods to add and get them.
This class holds the state of an AttributeSource.
Construction of basic automata.
Represents an automaton and all its states and transitions.
Records new states and transitions and then Automaton.Builder.finish() creates the Automaton.
Automaton provider for RegExp.toAutomaton(AutomatonProvider,int).
A Query that will match terms against a finite-state machine.
A FilteredTermsEnum that enumerates terms based upon what is accepted by a DFA.
Converts an Automaton into a TokenStream.
Axiomatic approaches for IR.
F1EXP is defined as Sum(tf(term_doc_freq)*ln(docLen)*IDF(term)), where IDF(t) = pow((N+1)/df(t), k), N = total num of docs, df = doc freq.
F1LOG is defined as Sum(tf(term_doc_freq)*ln(docLen)*IDF(term)), where IDF(t) = ln((N+1)/df(t)), N = total num of docs, df = doc freq.
F2EXP is defined as Sum(tfln(term_doc_freq, docLen)*IDF(term)), where IDF(t) = pow((N+1)/df(t), k), N = total num of docs, df = doc freq.
F2LOG is defined as Sum(tfln(term_doc_freq, docLen)*IDF(term)), where IDF(t) = ln((N+1)/df(t)), N = total num of docs, df = doc freq.
F3EXP is defined as Sum(tf(term_doc_freq)*IDF(term)-gamma(docLen, queryLen)), where IDF(t) = pow((N+1)/df(t), k), N = total num of docs, df = doc freq, gamma(docLen, queryLen) = (docLen-queryLen)*queryLen*s/avdl. NOTE: the gamma function of this similarity creates negative scores.
F3LOG is defined as Sum(tf(term_doc_freq)*IDF(term)-gamma(docLen, queryLen)), where IDF(t) = ln((N+1)/df(t)), N = total num of docs, df = doc freq, gamma(docLen, queryLen) = (docLen-queryLen)*queryLen*s/avdl. NOTE: the gamma function of this similarity creates negative scores.
Base class for implementing CompositeReaders based on an array of sub-readers.
Base implementation for a concrete Directory that uses a LockFactory for locking.
A base TermsEnum that adds default implementations for BaseTermsEnum.attributes(), BaseTermsEnum.termState(), BaseTermsEnum.seekExact(BytesRef) and BaseTermsEnum.seekExact(BytesRef, TermState). In some cases, the default implementation may be slow and consume huge memory, so subclasses SHOULD have their own implementation if possible.
This class acts as the base class for the specific basic model implementations in the DFR framework.
Geometric as limiting form of the Bose-Einstein model.
An approximation of the I(ne) model.
The basic tf-idf model of randomness.
Tf-idf model of randomness, based on a mixture of Poisson and inverse document frequency.
Stores all statistics commonly used by ranking methods.
A per-document numeric value.
Field that stores a per-document BytesRef value.
An indexed binary field for fast range filters.
A binary representation of a range that wraps a BinaryDocValues field
Bit mixing utilities.
Interface for Bitset-like structures.
Bits impl of the specified length with all bits set.
Bits impl of the specified length with no bits set.
Base implementation for a bit set.
A DocIdSetIterator which iterates over set bits in a bit set.
A variety of high efficiency bit twiddling routines and encoders for primitives.
Basic parameters for indexing points on the BKD tree.
Offline Radix selector for BKD tree.
Sliced reference to points in a PointWriter.
Handles reading a block KD-tree in byte[] space previously written with BKDWriter.
Recursively builds a block KD-tree to assign all incoming points in N-dim space to smaller and smaller N-dim rectangles (cells) until the number of points in a given rectangle is <= config.maxPointsInLeafNode.
A Query that blends index statistics across multiple terms.
A Builder for BlendedTermQuery.
A BlendedTermQuery.RewriteMethod that creates a DisjunctionMaxQuery out of the sub queries.
A BlendedTermQuery.RewriteMethod defines how queries for individual terms should be merged.
Reader for sequences of longs written with BlockPackedWriter.
A writer for large sequences of longs.
Holds all state required for PostingsReaderBase to produce a PostingsEnum without re-seeking the terms dict.
BM25 Similarity.
A clause in a BooleanQuery.
Specifies how clauses are to occur in matching documents.
A Query that matches documents matching boolean combinations of other queries, e.g.
A builder for boolean queries.
Deprecated.
Simple similarity that gives terms a score that is equal to their query boost.
Add this Attribute to a TermsEnum returned by MultiTermQuery.getTermsEnum(Terms,AttributeSource) and update the boost on each returned term.
Implementation class for BoostAttribute.
A Query wrapper that allows giving a boost to the wrapped query.
Wraps another Checksum with an internal buffer to speed up checksum calculations.
Simple implementation of ChecksumIndexInput that wraps another input and delegates calls.
Base implementation class for buffered IndexInput.
Buffers up pending vector value(s) per doc, then flushes when segment flushes.
This class is used to score a range of documents at once, and is returned by Weight.bulkScorer(org.apache.lucene.index.LeafReaderContext).
DataInput backed by a byte array.
DataOutput backed by a byte array.
This class enables the allocation of fixed-size buffers and their management as part of a buffer
array.
Abstract class for allocating and freeing byte blocks.
A simple ByteBlockPool.Allocator that never recycles.
A simple ByteBlockPool.Allocator that never recycles, but tracks how much total RAM is in use.
Base IndexInput implementation that uses an array of ByteBuffers to represent a file.
A DataOutput storing data in a list of ByteBuffers.
An implementation of a ByteBuffer allocation and recycling policy.
A ByteBuffer-based Directory implementation that can be used to store index files on the heap.
An IndexOutput writing to a ByteBuffersDataOutput.
Automaton representation for matching UTF-8 byte[].
An FST Outputs implementation where each output is a sequence of bytes.
Represents byte[], as a slice (offset + length) into an existing byte[].
A simple append only random-access BytesRef array that stores full copies of the appended bytes in a ByteBlockPool.
An extension of BytesRefIterator that allows retrieving the index of the current element.
Used to iterate the elements of an array in a given order.
Represents a logical list of BytesRef backed by a ByteBlockPool.
A builder for BytesRef instances.
Specialized BytesRef comparator that StringSorter has optimizations for.
Enumerates all input (BytesRef) + output pairs in an FST.
Holds a single input (BytesRef) + output pair.
BytesRefHash is a special purpose hash-map like data-structure optimized for BytesRef instances.
Manages allocation of the per-term addresses.
A simple BytesRefHash.BytesStartArray that tracks memory allocation using a private Counter instance.
A simple iterator interface for BytesRef iteration.
This attribute can be used if you have the raw term bytes to be indexed.
Implementation class for BytesTermAttribute.
This class provides access to per-document byte vector values indexed as KnnByteVectorField.
Caches all docs, and optionally also scores, coming from a search, and is then able to replay them to another collector.
This class can be used if the token attributes of a TokenStream are intended to be consumed more
than once.
Automaton representation for matching char[].
Utility class to write tokenizers or token filters.
A simple IO buffer to use with CharacterUtils.fill(CharacterBuffer, Reader).
A simple class that stores key Strings as char[]'s in a hash table.
A simple class that stores Strings as char[]'s in a hash table.
Subclasses of CharFilter can be chained to filter a Reader. They can be used as Reader with additional offset correction.
Abstract parent class for analysis factories that create CharFilter instances.
An FST Outputs implementation where each output is a sequence of characters.
Represents char[], as a slice (offset + length) into an existing char[].
A builder for CharsRef instances.
The term text of a Token.
Default implementation of CharTermAttribute.
Like IntConsumer, but may throw checked exceptions.
Basic tool and API to check the health of an index and write a new segments file that removes reference to problematic segments.
The marker RuntimeException used by CheckIndex APIs when index integrity failure is detected.
Run-time configuration options for CheckIndex commands.
Returned from CheckIndex.checkIndex() detailing the health and status of the index.
Status from testing DocValues
Status from testing field infos.
Status from testing field norms.
Status from testing index sort
Status from testing livedocs
Status from testing PointValues
Holds the status of each segment in the index.
Status from testing soft deletes
Status from testing stored fields.
Status from testing term index.
Status from testing term vectors.
Status from testing vector values
Walks the entire N-dimensional points space, verifying that all points fall within the last
cell's boundaries.
Extension of IndexInput, computing checksum as it goes.
Represents a circle on the earth's surface.
Expert: Historical scoring implementation.
Helper class used by ServiceLoader to investigate parent/child relationships of ClassLoaders.
Simple ResourceLoader that uses ClassLoader.getResourceAsStream(String) and Class.forName(String,boolean,ClassLoader) to open resources and classes, respectively.
A supplier that creates RandomVectorScorer from an ordinal.
Java's builtin ThreadLocal has a serious flaw: it can take an arbitrarily long amount of time to dereference the things you had stored in it, even once the ThreadLocal instance itself is no longer referenced.
Encodes/decodes an inverted index segment.
LeafReader implemented by codec APIs.
Utility class for reading and writing versioned headers.
Contains statistics for a collection (field).
Throw this exception in LeafCollector.collect(int) to prematurely terminate collection of the current leaf.
Methods for manipulating (sorting) and creating collections.
Expert: Collectors are primarily meant to be used to gather raw results from a search, and
implement sorting or custom result filtering, collation, etc.
A manager of collectors.
Class containing some useful methods used by command line tools
This class accumulates the (freq, norm) pairs that may produce competitive scores.
Immutable class holding compiled details for a given Automaton.
Automata are compiled into different internal forms for the most efficient execution depending
upon the language they accept.
2D Geometry object that supports spatial relationships with bounding boxes, triangles and points.
Used by withinTriangle to check the within relationship between a triangle and the query shape
(e.g.
Instances of this reader type can only be used to get stored fields from the underlying
LeafReaders, but it is not possible to directly retrieve postings.
IndexReaderContext for CompositeReader instance.
A read-only Directory that consists of a view over a compound file.
Encodes/decodes compound files
A compression mode.
A data compressor.
This merger merges graph in a concurrent manner, by using HnswConcurrentMergeBuilder
A MergeScheduler that runs each merge using a separate thread.
Access to ConcurrentMergeScheduler internals exposed to the test framework.
Helper methods for building conjunction iterators
Some useful constants.
A query that wraps another query and simply returns a constant score equal to 1 for every
document that matches the query.
We return this as our BulkScorer so that if the CSQ wraps a query with its own optimized top-level scorer (e.g.
A constant-scoring Scorer.
A Weight that has a constant score equal to the boost of the wrapped query.
Utility class that runs a thread to manage periodic reopens of a ReferenceManager, with methods to wait for specific index changes to become visible.
This exception is thrown when Lucene detects an inconsistency in the index.
Simple counter class
Deprecated.
Visibility of this class will be reduced in a future release.
Abstract base class for performing read operations of Lucene's low-level data types.
Abstract base class for performing write operations of Lucene's low-level data types.
Provides support for converting dates to strings and vice-versa.
Specifies the time granularity.
A decompressor.
A compression mode that trades speed for compression ratio.
An analyzer wrapper that doesn't allow wrapping components or readers.
Implements the Divergence from Independence (DFI) model based on Chi-square statistics
(i.e., standardized Chi-squared distance from independence in term frequency tf).
Implements the divergence from randomness (DFR) framework introduced in Gianni Amati and
Cornelis Joost Van Rijsbergen.
Retrieves an instance previously written by DirectMonotonicWriter.
In-memory metadata that needs to be kept around for DirectMonotonicReader to read data from disk.
Write monotonically-increasing sequences of integers.
A Directory provides an abstraction layer for storing a list of files.
DirectoryReader is an implementation of CompositeReader that can read indexes in a Directory.
Retrieves an instance previously written by DirectWriter
Class for writing packed integers to be directly read from Directory.
A priority queue of DocIdSetIterators that orders by current doc ID.
Wrapper used in DisiPriorityQueue.
A DocIdSetIterator which is a disjunction of the approximations of the provided iterators.
A query that generates the union of documents produced by its subqueries, and that scores each document with the maximum score for that document as produced by any subquery, plus a tie breaking increment for any additional matching subqueries.
The probabilistic distribution used to model term occurrence in information-based models.
Log-logistic distribution.
The smoothed power-law (SPL) distribution for the information-based framework that is described
in the original paper.
A DocIdSetIterator like BitSetIterator but has a doc base in order to avoid storing previous 0s.
Comparator that sorts by asc _doc
Utility class to help merging documents from sub-readers according to either simple concatenated
(unsorted) order, or by a specified index-time sort, skipping deleted documents and remapping
non-deleted documents.
Represents one sub-reader being merged
A DocIdSet contains a set of doc ids.
A builder of DocIdSets.
Utility class to efficiently add many docs in one go.
This abstract class defines methods to iterate over a set of non-decreasing doc ids.
A stream of doc IDs.
Accumulator for documents that have a value for a field.
Documents are the unit of indexing and search.
A StoredFieldVisitor that creates a Document from stored fields.
This class contains utility methods and constants for DocValues
Abstract API that consumes numeric, binary and sorted docvalues.
Deprecated.
Use FieldExistsQuery instead.
Encodes/decodes per-document values.
Abstract API that produces numeric, binary, sorted, sortedset, and sortednumeric docvalues.
Rewrites MultiTermQueries into a filter, using DocValues for term enumeration.
DocValues types.
Comparator based on Double.compare(double, double) for numHits.
Syntactic sugar for encoding doubles as NumericDocValues via Double.doubleToRawLongBits(double).
Field that stores a per-document double value for scoring, sorting or value retrieval and index the field for fast range filters.
An indexed double field for fast range filters.
An indexed Double Range field.
DocValues field for DoubleRange.
Per-segment, per-document double values, which can be calculated at search-time
Base class for producing DoubleValues
Abstract base class implementing a DocValuesProducer that has no doc values.
Expert: Find exact phrases
The ExitableDirectoryReader wraps a real index DirectoryReader and allows for a QueryTimeout implementation object to be checked periodically to see if the thread should exit or not.
Wrapper class for another FilterAtomicReader.
Wrapper class for a SubReaderWrapper that is used by the ExitableDirectoryReader.
Wrapper class for another Terms implementation that is used by ExitableFields.
Wrapper class for TermsEnum that is used by ExitableTerms for implementing an exitable
enumeration of terms.
Exception that is thrown to prematurely terminate a term enumeration.
Expert: Describes the score computation for document and query.
Field that can be used to store static scoring factors into documents.
Expert: directly create a field for a document.
Specifies whether and how a field should be stored.
Expert: a FieldComparator compares hits so as to determine their sort order when collecting the top results with TopFieldCollector.
Sorts by descending relevance.
Sorts by field's natural Term sort order.
Provides a FieldComparator for custom field sorting.
Expert: A ScoreDoc which also contains information about how to sort the referenced document.
A Query that matches documents that contain either a KnnFloatVectorField, KnnByteVectorField or a field that indexes norms or doc values.
Access to the Field Info file that describes document fields and whether or not they are indexed.
Collection of FieldInfos (accessible by number or by name).
Encodes/decodes FieldInfos
This class tracks the number and position / offset parameters of terms being added to the index.
BlockTree's implementation of Terms.
Provides a Terms index for fields that have it, and lists which fields do.
Abstract API that consumes terms, doc, freq, prox, offset and payloads postings.
Efficient index format for block-based Codecs.
Abstract API that produces terms, doc, freq, prox, offset and payloads postings.
Describes the properties of a field.
Expert: A hit queue for sorting hits by terms in more than one field.
Extension of ScoreDoc to also store the FieldComparator slot.
This class provides the ability to track the reference counts of a set of index files and delete them when their counts decrease to 0.
Types of messages this file deleter will broadcast REF: messages about reference FILE: messages
about file
Tracks the reference count for a single index file:
Expert: A Directory instance that switches files between two other Directory instances.
Delegates all methods to a wrapped BinaryDocValues.
A codec that forwards all its method calls to another codec.
A FilterCodecReader contains another CodecReader, which it uses as its basic source of data, possibly transforming the data along the way or providing additional functionality.
Collector delegator.
Directory implementation that delegates calls to another directory.
A FilterDirectoryReader wraps another DirectoryReader, allowing implementations to transform or
extend it.
A DelegatingCacheHelper is a CacheHelper specialization for implementing long-lived caching
behaviour for FilterDirectoryReader subclasses.
Factory class passed to FilterDirectoryReader constructor that allows subclasses to wrap the
filtered DirectoryReader's subreaders.
Abstract decorator class of a DocIdSetIterator implementation that provides on-demand
filter/validation mechanism on an underlying DocIdSetIterator.
Abstract class for enumerating a subset of all terms.
Return value, if term should be accepted or the iteration should END.
IndexInput implementation that delegates calls to another directory.
IndexOutput implementation that delegates calls to another directory.
Abstract base class for TokenFilters that may remove tokens.
An Iterator implementation that filters elements with a boolean predicate.
LeafCollector delegator.
A FilterLeafReader contains another LeafReader, which it uses as its basic source of data, possibly transforming the data along the way or providing additional functionality.
Base class for filtering Fields implementations.
Base class for filtering PostingsEnum implementations.
Base class for filtering Terms implementations.
Base class for filtering TermsEnum implementations.
A MatchesIterator that delegates all calls to another MatchesIterator
A wrapper for MergePolicy instances.
Delegates all methods to a wrapped NumericDocValues.
Filter a Scorable, intercepting methods and optionally changing their return values
A FilterScorer contains another Scorer, which it uses as its basic source of data, possibly transforming the data along the way or providing additional functionality.
Delegates all methods to a wrapped SortedDocValues.
Delegates all methods to a wrapped SortedNumericDocValues.
Delegates all methods to a wrapped SortedSetDocValues.
Delegates all methods to a wrapped FloatVectorValues.
A FilterWeight contains another Weight and implements all abstract methods by calling the contained weight's method.
Iterates all accepted strings.
BitSet of fixed length (numBits), backed by accessible (FixedBitSet.getBits()) long[], accessed with an int index, implementing Bits and DocIdSet.
This attribute can be used to pass different flags down the Tokenizer chain, e.g.
Default implementation of FlagsAttribute.
Vectors' writer for a field
Encodes/decodes per-document vectors
Reads vectors from an index.
Vectors' writer for a field that allows additional indexing logic to be implemented by the caller
Comparator based on Float.compare(float, float) for numHits.
Syntactic sugar for encoding floats as NumericDocValues via Float.floatToRawIntBits(float).
Field that stores a per-document float value for scoring, sorting or value retrieval and index the field for fast range filters.
An indexed float field for fast range filters.
An indexed Float Range field.
DocValues field for FloatRange.
This class provides access to per-document floating point vector values indexed as KnnFloatVectorField.
A FlushInfo provides information required for a FLUSH context.
Utility class to encode/decode increasing sequences of 128 integers.
A ring buffer that tracks the frequency of the integers that it contains.
Base class for Directory implementations that store index files in the file system.
Base class for file system based locking implementation.
Represents a finite state machine (FST), using a compact byte[] format.
Represents a single arc.
Reads bytes stored in an FST.
Represent the FST metadata
Specifies allowed range of each int input label for this FST.
Builds a minimal FST (maps an IntsRef term to an arbitrary output) from pre-sorted terms with
outputs.
Fluent-style constructor for FST FSTCompiler.
Abstraction for reading bytes necessary for FST.
A type of FSTReader which needs data to be initialized before use
Implements the fuzzy search query.
Subclass of TermsEnum for enumerating all terms that are similar to the specified filter term.
Thrown to indicate that there was an issue creating a fuzzy query for a given term.
reusable geopoint encoding methods
A predicate that checks whether a given point is within a component2D geometry.
A predicate that checks whether a given point is within a distance of another point.
Base class for LatLonGeometry and XYGeometry
Basic reusable geo-spatial utility methods
Used to define the orientation of 3 points: -1 = Clockwise, 0 = Colinear, 1 = Counter-clockwise
An abstract TokenFilter that exposes its input stream as a graph
Consumes a TokenStream and creates an Automaton where the transition labels are terms from the TermToBytesRefAttribute.
Decode integers using group-varint.
Encode integers using group-varint.
Implements PackedInts.Mutable, but grows the bit count of the underlying packed ints on-demand.
Utility class to read buffered points from in-heap arrays.
Utility class to write new points into in-heap arrays.
Expert: Priority queue containing hit docs
Interface for builder building the OnHeapHnswGraph
A graph builder that manages multiple workers, it only supports adding the whole graph all at
once.
Hierarchical Navigable Small World graph.
NodesIterator that accepts nodes as an integer array.
Nodes iterator based on set representation of nodes.
Iterator over the graph nodes on a certain level, Iterator also provides the size – the total
number of nodes to be iterated over.
Builder for HNSW graph.
A restricted, specialized knnCollector that can be used when building a graph.
Abstraction of merging multiple graphs into one on-heap graph
An interface that provides an HNSW graph.
Searches an HNSW graph to find nearest neighbors to a query vector.
Provides a framework for the family of information-based models, as described in Stéphane
Clinchant and Eric Gaussier.
Annotation to not test a class or constructor with TestRandomChains integration test.
Per-document scoring factors.
Information about upcoming impacts, ie.
DocIdSetIterator that skips non-competitive docs thanks to the indexed impacts.
Extension of PostingsEnum which also provides information about upcoming impacts.
Source of Impacts.
This selects the biggest Hnsw graph from the provided merge state and initializes a new HnswGraphBuilder with that graph as a starting point.
Computes the measure of divergence from independence for DFI scoring functions.
Normalized chi-squared measure of distance from independence
Saturated measure of distance from independence
Standardized measure of distance from independence
Represents a single field for indexing.
Describes the properties of a field.
Expert: represents a single commit into an index as seen by the IndexDeletionPolicy or IndexReader.
Expert: policy for deletion of stale index commits.
Disk-based implementation of a DocIdSetIterator which can return the index of the current document, i.e.
This class contains useful constants representing filenames and extensions used by lucene, as well as convenience methods for querying whether a file name matches an extension (matchesExtension), as well as generating file names from a segment name, generation and extension (fileNameFromGeneration, segmentFileName).
This exception is thrown when Lucene detects an index that is newer than this Lucene version.
This exception is thrown when Lucene detects an index that is too old for this Lucene version
Abstract base class for input from a file in a Directory.
Signals that no index was found in the Directory.
Controls how much information is stored in the postings lists.
A query that uses either an index structure (points or terms) or doc values in order to run a
query, depending which one is more efficient.
A DataOutput for appending data to a file in a Directory.
Access to org.apache.lucene.index package internals exposed to the test framework.
Public type exposing FieldInfo internal builders.
IndexReader is an abstract class, providing an interface for accessing a point-in-time view of an index.
A utility class that gives hooks in order to help build a cache based on the data that is
contained in this index.
A cache key identifying a resource that is being cached on.
A listener that is called when a resource gets closed.
A struct like class that represents a hierarchical relationship between IndexReader instances.
Implements search over a single IndexReader.
A class holding a subset of the IndexSearcher's leaf contexts to be executed within a single thread.
Thrown when an attempt is made to add more than IndexSearcher.TooManyClauses.getMaxClauseCount() clauses.
Thrown when a client attempts to execute a Query that has more than IndexSearcher.TooManyClauses.getMaxClauseCount() total clauses cumulatively in all of its children.
Handles how documents should be sorted in an index, both within a segment and between segments.
Used for sorting documents across segments
A comparator of doc IDs, used for sorting documents within a segment
Sorts documents based on double values from a NumericDocValues instance
Sorts documents based on float values from a NumericDocValues instance
Sorts documents based on integer values from a NumericDocValues instance
Sorts documents based on long values from a NumericDocValues instance
Provide a NumericDocValues instance for a LeafReader
Provide a SortedDocValues instance for a LeafReader
Sorts documents based on terms from a SortedDocValues instance
A range query that can take advantage of the fact that the index is sorted to speed up execution.
This is an easy-to-use tool that upgrades all segments of an index from previous Lucene versions
to the current segment file format.
An IndexWriter creates and maintains an index.
DocStats for this index
If DirectoryReader.open(IndexWriter) has been called (ie, this writer is in near real-time mode), then after a merge completes, this class can be invoked to warm the reader on the newly merged segment, before the merge commits.
Access to IndexWriter internals exposed to the test framework.
Holds all the configuration that is used to create an IndexWriter.
Specifies the open mode for IndexWriter.
A callback event listener for recording key events that happened inside IndexWriter
A Query that matches documents matching combinations of subqueries.
Combines scores of subscorers.
The Weight for IndriAndQuery, used to normalize, score and explain these queries.
Bayesian smoothing using Dirichlet priors as implemented in the Indri Search engine
(http://www.lemurproject.org/indri.php).
Models p(w|C) as the number of occurrences of the term in the collection, divided by the total number of tokens + 1.
The Indri implementation of a disjunction scorer which stores the subscorers for the child queries.
A Basic abstract query that all IndriQueries can extend to implement toString, equals,
getClauses, and iterator.
The Indri parent scorer that stores the boost so that IndriScorers can use the boost outside of
the term.
An indexed 128-bit InetAddress field.
An indexed InetAddress Range Field
Debugging API for Lucene classes such as IndexWriter and SegmentInfos.
This creates a graph builder that is initialized with the provided HnswGraph.
Sorter implementation based on the merge-sort algorithm that merges in place (no extra memory will be allocated).
A DataInput wrapping a plain InputStream.
A pool for int blocks similar to ByteBlockPool
Abstract class for allocating and freeing int blocks.
A simple IntBlockPool.Allocator that never recycles.
Comparator based on Integer.compare(int, int) for numHits.
Field that stores a per-document int value for scoring, sorting or value retrieval and index the field for fast range filters.
A hash map of int to int, implemented using open addressing with linear probing for collision resolution.
Simplifies the implementation of iterators a bit.
BufferAllocationException forked from HPPC
An indexed int field for fast range filters.
An indexed Integer Range field.
DocValues field for IntRange.
Adaptive selection algorithm based on the introspective quick select algorithm.
An FST Outputs implementation where each output is a sequence of ints.
Represents int[], as a slice (offset + length) into an existing int[].
A builder for IntsRef instances.
Enumerates all input (IntsRef) + output pairs in an FST.
Holds a single input (IntsRef) + output pair.
Native int to int function
Describes how an IndexableField should be inverted for indexing terms and postings.
An IO operation with a single input that may throw an IOException.
IOContext holds additional details on the merge/search context.
Context is an enumerator which specifies the context in which the Directory is being used.
A Function that may throw an IOException
This is a result supplier that is allowed to throw an IOException.
Utilities for dealing with Closeables.
Deprecated, for removal: This API element is subject to removal in a future version.
was replaced by IOConsumer.
Deprecated, for removal: This API element is subject to removal in a future version.
was replaced by IOFunction.
InfoStream implementation that logs every message using Java Utils Logging (JUL) with the
supplied log level.
This IndexDeletionPolicy implementation keeps only the most recent commit and
immediately removes all prior commits after a new commit is done.
This attribute can be used to mark a token as a keyword.
Default implementation of KeywordAttribute.
Field that indexes a per-document String or BytesRef into an inverted index for fast
filtering, stores values in a columnar fashion using DocValuesType.SORTED_SET doc values
for sorting and faceting, and optionally stores values as stored fields for top-hits retrieval.
A field that contains a single byte numeric vector (or none) for each document.
Uses KnnVectorsReader.search(String, byte[], KnnCollector, Bits) to perform nearest
neighbour search.
KnnCollector is a knn collector used for gathering kNN results and providing topDocs from the
gathered neighbors
Vectors' writer for a field
A field that contains a single floating-point numeric vector (or none) for each document.
Uses KnnVectorsReader.search(String, float[], KnnCollector, Bits) to perform nearest
neighbour search.
Deprecated.
Use KnnFloatVectorField instead.
Deprecated.
Use FieldExistsQuery instead.
Deprecated.
Use KnnFloatVectorQuery instead.
Encodes/decodes per-document vector and any associated indexing structures required to support
nearest-neighbor search
Reads vectors from an index.
Writes vectors to an index.
View over multiple vector values supporting iterator-style access via DocIdMerger.
The lambda (λw) parameter in information-based models.
Computes lambda as (docFreq+1) / (numberOfDocuments+1).
Computes lambda as (totalTermFreq+1) / (numberOfDocuments+1).
A per-document location field.
Lat/Lon Geometry object.
An indexed location field.
A geo shape utility class for indexing and searching GIS geometries whose vertices are latitude,
longitude values (in decimal degrees).
A concrete implementation of ShapeDocValues for storing binary doc value representation
of LatLonShape geometries in a LatLonShapeDocValuesField
Concrete implementation of a ShapeDocValuesField for geographic geometries.
Collector decouples the score from the collected doc: the score computation is skipped entirely
if it's not needed.
Expert: comparator that gets instantiated on each leaf from a top-level FieldComparator instance.
Provides read-only metadata about a leaf.
LeafReader is an abstract class, providing an interface for accessing an index.
IndexReaderContext for LeafReader instances.
Similarity.SimScorer on a specific LeafReader.
Class to construct DFAs that match a word within some edit distance.
FiniteStringsIterator which limits the number of iterated accepted strings.
Represents a line on the earth's surface.
Format for live/deleted documents
Tracks live field values across NRT reader reopens.
Holds all the configuration used by IndexWriter with few setters for settings that can be
changed on an IndexWriter instance "live".
Bayesian smoothing using Dirichlet priors.
Language model based on the Jelinek-Mercer smoothing method.
Abstract superclass for language modeling Similarities.
A strategy for computing the collection language model.
Models p(w|C) as the number of occurrences of the term in the collection, divided by
the total number of tokens + 1.
Stores the collection distribution of the current term.
An interprocess mutex lock.
Base class for Locking implementation.
This exception is thrown when the write.lock could not be acquired.
This exception is thrown when the write.lock could not be released.
Simple standalone tool that forever acquires and releases a lock using a specific LockFactory.
This class makes a best-effort check that a provided Lock is valid before any destructive
filesystem operation.
Simple standalone server that must be running when you use VerifyingLockFactory.
This is a LogMergePolicy that measures size of a segment as the total byte size of the
segment's files.
This is a LogMergePolicy that measures size of a segment as the number of documents (not
taking deletions into account).
This class implements a MergePolicy that tries to merge segments into levels of
exponentially increasing size, where each level has fewer segments than the value of the merge
factor.
BitSet of fixed length (numBits), backed by accessible (LongBitSet.getBits()) long[], accessed with a
long index.
Comparator based on Long.compare(long, long) for numHits.
Field that stores a per-document long value for scoring, sorting or value retrieval
and indexes the field for fast range filters.
A min heap that stores longs; a primitive priority queue that like all priority queues maintains
a partial ordering of its elements such that the least element can always be found in constant
time.
An indexed long field for fast range filters.
An indexed Long Range field.
DocValues field for LongRange.
Represents long[], as a slice (offset + length) into an existing long[].
Per-segment, per-document long values, which can be calculated at search-time
Abstraction over an array of longs.
Base class for producing LongValues
A ConstantLongValuesSource that always returns a constant value
Utility class that can efficiently compress arrays that mostly contain characters in the
[0x1F,0x3F) or [0x5F,0x7F) ranges, which notably include all digits, lowercase characters, '.',
'-' and '_'.
Normalizes token text to lower case.
A QueryCache that evicts queries using a LRU (least-recently-used) eviction policy in
order to remain under a given maximum size and number of bytes used.
Cache of doc ids with a count.
A LSB Radix sorter for unsigned int values.
A block-based terms index and dictionary that assigns terms to variable length blocks according
to how they share prefixes.
Block-based terms index and dictionary writer.
Lucene 9.0 compound file format
A StoredFieldsFormat that compresses documents in chunks in order to improve the
compression ratio.
A TermVectorsFormat that compresses chunks of documents together in order to improve the
compression ratio.
Lucene 9.0 DocValues format.
Lucene 9.0 live docs format
Lucene 9.0 Score normalization format.
Lucene 9.0 point format, which encodes dimensional values in a block KD-tree structure for fast
1D range and N dimensional shape intersection filtering.
Reads point values previously written with Lucene90PointsWriter
Writes dimensional values
Lucene 9.0 stored fields format.
Configuration option for stored fields.
Lucene 9.0 term vectors format.
Lucene 9.0 Field Infos format.
Implements the Lucene 9.9 index format
Configuration option for the codec.
Lucene 9.9 flat vector format, which encodes numeric vector values
Reads vectors from the index segments.
Writes vector values to index segments.
Lucene 9.9 vector format, which encodes numeric vector values into an associated graph connecting
the documents having values.
Lucene 9.9 vector format, which encodes numeric vector values into an associated graph connecting
the documents having values.
Reads vectors from the index segments along with index data structures supporting KNN search.
Writes vector values and knn graphs to index segments.
Lucene 9.9 postings format, which encodes postings in packed integer blocks for fast decode.
Holds all state required for Lucene99PostingsReader to produce a PostingsEnum
without re-seeking the terms dict.
Concrete class that reads docId(maybe frq,pos,offset,payloads) list with postings format.
Concrete class that writes docId(maybe frq,pos,offset,payloads) list with postings format.
Format supporting vector quantization, storage, and retrieval
Reads Scalar Quantized vectors from the index segments along with index data structures.
Writes quantized vector values and metadata to index segments.
Lucene 9.9 Segment info format.
Implements the skip list reader for block postings format that stores positions and payloads.
Write skip lists with multiple levels, and support skip within block ints.
LZ4 compression and decompression routines.
Simple lossy LZ4.HashTable that only stores the last occurrence for each hash on
2^14 bytes of memory.
A higher-precision LZ4.HashTable.
A compression mode that compromises on the compression ratio to provide fast compression and
decompression.
Helper class for keeping Lists of Objects associated with keys.
A Fields implementation that merges multiple Fields into one, and maps around deleted
documents.
A query that matches all documents.
Reports the positions and optionally offsets of all matching terms in a query for a single
document
An iterator over match positions (and optionally offsets) for a single document and field
Contains static functions that aid the implementation of Matches and MatchesIterator
interfaces.
Computes which segments have identical field name to number mappings, which allows stored fields
and term vectors in this codec to be bulk-merged.
A query that matches no documents.
Math static utility methods.
Add this Attribute to a fresh AttributeSource before calling
MultiTermQuery.getTermsEnum(Terms,AttributeSource).
Implementation class for MaxNonCompetitiveBoostAttribute.
Compute maximum scores based on Impacts and keep them in a cache in order not to run
expensive similarity score computations multiple times on the same data.
Provides a merged sorted view from several sorted iterators.
A MergeInfo provides information required for a MERGE context.
Expert: a MergePolicy determines the sequence of primitive merge operations.
Thrown when a merge was explicitly aborted because IndexWriter.abortMerges() was called.
This interface represents the current context of the merge selection process.
Exception thrown if there are any problems while executing a merge.
A MergeSpecification instance provides the information necessary to perform multiple merges.
OneMerge provides the information necessary to perform an individual primitive merge operation,
resulting in a single new segment.
Progress and state for an executing merge.
Reason for pausing the merge thread.
This is the RateLimiter that IndexWriter assigns to each running merge, to give
MergeSchedulers ionice like control.
Expert: IndexWriter uses an instance implementing this interface to execute the merges
selected by a MergePolicy.
Provides access to new merges and executes the actual merge
Holds common state used during segment merging.
A map of doc IDs.
MergeTrigger is passed to
MergePolicy.findMerges(MergeTrigger, SegmentInfos, MergePolicy.MergeContext)
to indicate the event that triggered the merge.
Docs iterator that starts iterating from a configurable minimum document
Operations for minimizing automata.
File-based Directory implementation that uses mmap for reading, and FSDirectory.FSIndexOutput
for writing.
Simple ResourceLoader that uses Module.getResourceAsStream(String)
and Class.forName(Module,String) to open resources and classes, respectively.
Provides random access to a stream written with MonotonicBlockPackedWriter.
A writer for large monotonically increasing sequences of positive longs.
Radix sorter for variable-length strings.
Concatenates multiple Bits together, on every lookup.
A CollectorManager implementation which wraps a set of CollectorManagers, as MultiCollector
does for Collector.
A wrapper for CompositeIndexReader providing access to DocValues.
Implements SortedDocValues over n subs, using an OrdinalMap
Implements MultiSortedSetDocValues over n subs, using an OrdinalMap
Provides a single Fields term index view over an IndexReader.
Utility methods for working with an IndexReader as if it were a LeafReader.
This abstract class reads skip lists with multiple levels.
This abstract class writes skip lists with multiple levels.
A generalized version of PhraseQuery, with the possibility of adding more than one term
at the same position that are treated as a disjunction (OR).
A builder for multi-phrase queries
Slower version of UnionPostingsEnum that delegates offsets and positions, for use by
MatchesIterator
Takes the logical union of multiple PostingsEnum iterators.
Exposes PostingsEnum, merged from PostingsEnum API of sub-segments.
Holds a PostingsEnum along with the corresponding ReaderSlice.
A CompositeReader which reads multiple indexes, appending their content.
A Multiset is a set that allows for duplicate elements.
Implements the CombSUM method for combining evidence from multiple similarity values described
in: Joseph A.
An abstract Query that matches documents containing a subset of terms provided by a
FilteredTermsEnum enumeration.
Abstract class that defines how the query is rewritten.
A rewrite method that first translates each term into a BooleanClause.Occur.SHOULD clause
in a BooleanQuery, but adjusts the frequencies used for scoring to be blended across the terms,
otherwise the rarest term typically ranks highest (often not useful, e.g. in the set of expanded
terms in a FuzzyQuery).
A rewrite method that first translates each term into a BooleanClause.Occur.SHOULD clause
in a BooleanQuery, but the scores are only computed as the boost.
A rewrite method that first translates each term into a BooleanClause.Occur.SHOULD clause
in a BooleanQuery, and keeps the scores as computed by the query.
Exposes flex API, merged from flex API of sub-segments.
One leaf PointValues.PointTree whose order of points can be changed.
Utility APIs for sorting and partitioning buffered points.
Base class for all mutable values.
MutableValue implementation of type boolean.
MutableValue implementation of type Date.
MutableValue implementation of type double.
MutableValue implementation of type float.
MutableValue implementation of type int.
MutableValue implementation of type long.
MutableValue implementation of type String.
Utility class to help extract the set of sub queries that have matched from a larger query.
Helper class for loading named SPIs from classpath (e.g.
Interface to support NamedSPILoader.lookup(String) by name.
A default ThreadFactory implementation that accepts the name prefix of the created
threads as a constructor argument.
Implements LockFactory using native OS file locks.
NeighborArray encodes the neighbors of a node and their mutual scores in the HNSW graph as a pair
of growable arrays.
NeighborQueue uses a LongHeap to store lists of arcs in an HNSW graph, represented as a
neighbor node id with an associated score packed together as a sortable long, which is sorted
primarily by score.
This is a PhraseQuery which is optimized for n-gram phrase query.
An FSDirectory implementation that uses java.nio's FileChannel's positional read, which
allows multiple threads to read from the same file without synchronizing.
An IndexDeletionPolicy which keeps all index commits around, never deleting them.
Use this LockFactory to disable locking entirely.
A MergePolicy which never returns merges to execute.
A MergeScheduler which never executes any merges.
A null FST Outputs implementation; use this if you just want to build an FSA.
This class acts as the base class for the implementations of the term frequency normalization
methods in the DFR framework.
Implementation used when there is no normalization.
Normalization model that assumes a uniform distribution of the term frequency.
Normalization model in which the term frequency is inversely related to the length.
Dirichlet Priors normalization
Pareto-Zipf Normalization
Abstract API that consumes normalization values.
Deprecated.
Use FieldExistsQuery instead.
Encodes/decodes per-document score normalization values.
Abstract API that produces field normalization values
Wraps a RAM-resident directory around any provided delegate directory, to be used during NRT
search.
Abstract numeric comparator for comparing numeric values.
A per-document numeric value.
Field that stores a per-document long value for scoring, sorting or value retrieval.
Helper APIs to encode numeric values as sortable bytes and vice-versa.
Read the vector values from the index input.
Dense vector values that are stored off-heap.
Read the vector values from the index input.
Dense vector values that are stored off-heap.
Provides off heap storage of finite state machine (FST), using underlying index input instead of
byte store on heap
Reads points from disk in a fixed-width format, previously written with OfflinePointWriter.
Writes points to disk in a fixed-width format.
On-disk sorting of byte arrays.
A bit more descriptive unit for constructors.
Utility class to read length-prefixed byte[] entries from an input.
Utility class to emit length-prefixed byte[] entries to an output stream for sorting.
The start and end character offset of a Token.
Default implementation of OffsetAttribute.
A wrapping merge policy that wraps the MergePolicy.OneMerge objects returned by the wrapped
merge policy.
Provides storage of finite state machine (FST), using byte array or byte store allocated on heap.
An HnswGraph where all nodes and connections are held in memory.
Automata operations.
Maps per-segment ordinals to/from global ordinal space, using a compact packed-ints
representation.
Wraps a provided KnnCollector object, translating the provided vectorId ordinal to a documentId
An ordinal based TermState
Configuration for DirectMonotonicReader and IndexedDISI for reading sparse vectors.
Represents the outputs for an FST, providing the basic algebra required for building and
traversing the FST.
A DataOutput wrapping a plain OutputStream.
Implementation class for buffered IndexOutput that writes to an OutputStream.
A DataInput wrapper to read unaligned, variable-length packed integers.
A DataOutput wrapper to write unaligned, variable-length packed integers.
Simplistic compression for array of unsigned long values.
A decoder for packed integers.
An encoder for packed integers.
A format to write packed ints.
Simple class that holds a format and a number of bits per value.
A packed integer array that can be modified.
A PackedInts.Reader which has all its values equal to 0 (bitsPerValue = 0).
A read-only random access array of positive integers.
Run-once iterator interface, to decode previously saved PackedInts.
A write-once Writer.
Utility class to compress integers into a LongValues instance.
A Builder for a PackedLongValues instance.
Default implementation of the common attributes used by Lucene: CharTermAttribute,
TypeAttribute, PositionIncrementAttribute, PositionLengthAttribute, OffsetAttribute,
TermFrequencyAttribute
Represents a logical byte[] as a series of pages.
Provides methods to read BytesRefs from a frozen PagedBytes.
A PagedMutable.
An FST Outputs implementation, holding two other outputs.
Holds a single pair of two outputs.
A CompositeReader which reads multiple, parallel indexes.
A LeafReader which reads multiple, parallel indexes.
The payload of a Token.
Default implementation of PayloadAttribute.
Enables per field docvalues support.
Enables per field numeric vector support.
VectorReader that can wrap multiple delegate readers, selected by field.
Enables per field postings support.
Provides the ability to use a different Similarity for different fields.
A SnapshotDeletionPolicy which adds a persistence layer so that snapshots can be
maintained across the life of an application.
Base class for exact and sloppy phrase matching
A Query that matches documents containing a particular sequence of terms.
A builder for phrase queries.
Term postings and position information for phrase matching
Expert: Weight class for phrase matching
Represents a point on the earth's surface.
Abstract query class to find all documents whose single or multi-dimensional point values,
previously indexed with e.g.
Iterator of encoded point values.
Abstract class for range queries against single or multidimensional points such as IntPoint.
One pass iterator through all points previously written with a PointWriter, abstracting
away whether points are read from (offline) disk or simple arrays in heap.
Encodes/decodes indexed points.
Abstract API to visit point values.
Abstract API to write points
Represents a dimensional point value written in the BKD tree.
Access to indexed numeric values.
We recurse the PointValues.PointTree, using a provided instance of this to guide the recursion.
Basic operations to read the KD-tree.
Used by PointValues.intersect(org.apache.lucene.index.PointValues.IntersectVisitor)
to check how each recursive cell corresponds to the query.
Appends many points, and then at the end provides a PointReader to iterate those points.
Represents a closed polygon on the earth's surface.
Determines the position of this token relative to the previous Token in a TokenStream, used in
phrase searching.
Default implementation of PositionIncrementAttribute.
Determines how many positions this token spans.
Default implementation of PositionLengthAttribute.
An FST Outputs implementation where each output is a non-negative long value.
Iterates through the postings.
Encodes/decodes terms, postings, and proximity data.
The core terms dictionaries (BlockTermsReader, BlockTreeTermsReader) interact with a single
instance of this class to manage creation of PostingsEnum instances.
Class that plugs into term dictionaries, such as Lucene90BlockTreeTermsWriter, and
handles writing postings.
Prefix codes term instances (prefixes are shared).
Builds a PrefixCodedTerms: call add repeatedly, then finish.
An iterator over the list of terms stored in a PrefixCodedTerms.
A Query that matches documents containing terms with a specified prefix.
InfoStream implementation over a PrintStream such as System.out.
A priority queue maintains a partial ordering of its elements such that the least element can
always be found in constant time.
Controls how a LeafFieldComparator skips documents
Extension of PostingsWriterBase, adding a push API for writing each element of the
postings.
The abstract base class for queries.
Creates queries from the Analyzer chain.
Wraps a term and boost
A cache for queries.
A policy defining which filters should be cached.
A Rescorer that uses a provided Query to assign scores to the first-pass hits.
Query timeout abstraction that controls whether a query should continue or be stopped.
An implementation of QueryTimeout that can be used by the ExitableDirectoryReader
class to time out and exit out when a query takes a long time to rewrite.
Allows recursion through a query tree
Radix selector.
Estimates the size (memory representation) of Java objects.
Random Access Index API.
Provides random access to vectors by dense ordinal.
A RandomVectorScorer for scoring random nodes in batches against an abstract query.
Creates a default scorer for random access vectors.
A supplier that creates RandomVectorScorer from an ordinal.
RandomVectorScorerSupplier for bytes vector
RandomVectorScorerSupplier for Float vector
Query class for searching RangeField types by a defined PointValues.Relation.
Used by RangeFieldQuery to check how each internal or leaf node relates to the query.
Abstract base class to rate limit IO.
Simple class to rate limit IO.
Utility class to safely share DirectoryReader instances across multiple threads, while
periodically reopening.
Subreader slice from a parent composite reader.
Common util methods for dealing with IndexReaders and IndexReaderContexts.
Represents a lat/lon rectangle.
A ByteBlockPool.Allocator implementation that recycles unused byte blocks in a buffer and
reuses them in subsequent calls to RecyclingByteBlockAllocator.getByteBlock().
An IntBlockPool.Allocator implementation that recycles unused int blocks in a buffer and reuses them in
subsequent calls to RecyclingIntBlockAllocator.getIntBlock().
Manages reference counting for a given object.
Utility class to safely share instances of a certain type across multiple threads, while
periodically refreshing them.
Use to receive notification when a refresh has finished.
Regular Expression extension to Automaton.
The type of expression represented by a RegExp node.
A fast regular expression query based on the org.apache.lucene.util.automaton package.
Re-scores the topN results (TopDocs) from an original query.
Abstraction for loading resources (streams, files, and classes).
Interface for a component that needs to be initialized by an implementation of ResourceLoader.
DocIdSet implementation inspired from http://roaringbitmap.org/
A builder of RoaringDocIdSets.
Acts like forever growing T[], but internally uses a circular buffer to reuse instances of T.
Implement to reset an instance
Finite-state automaton with fast run operation.
An ExecutorService that executes tasks immediately in the calling thread during submit.
Calculates and adjusts the scores correctly for quantized vectors given the scalar quantization
parameters
Calculates dot product on quantized vectors, applying the appropriate corrections
Calculates euclidean distance on quantized vectors, applying the appropriate corrections
Calculates max inner product on quantized vectors, applying the appropriate corrections
Will scalar quantize float vectors into `int8` byte values.
Allows access to the score of a Query
A child Scorer and its relationship to its parent.
A Scorer which wraps another scorer and caches the score of the current document.
Holds one hit in TopDocs.
Different modes of search.
Expert: Common scoring functionality for different types of queries.
A supplier of Scorer.
Base rewrite method that translates each term into a query, and keeps the scores as computed by
the query.
Factory class used by SearcherManager to create new IndexSearchers.
Keeps track of current plus old IndexSearchers, closing the old ones once they have timed out.
Simple pruner that drops any searcher older by more than the specified seconds, than the newest
searcher.
Utility class to safely share IndexSearcher instances across multiple threads, while
periodically reopening.
Interface defining whether or not an object can be cached against a LeafReader
Embeds a [read-only] SegmentInfo and adds per-commit fields.
Information about a segment such as its name, directory, and files related to the segment.
Expert: Controls the format of the SegmentInfo (segment metadata file).
A collection of segmentInfo objects with methods for operating on those segments in relation to
the file system.
Utility class for executing code that needs to do something with the current segments file.
IndexReader implementation over a single segment.
Access to SegmentReader internals exposed to the test framework.
Holder class for common parameters used during read.
Holder class for common parameters used during write.
An implementation of a selection algorithm, ie.
This attribute tracks what sentence a given token belongs to as well as potentially other
sentence specific attributes.
Default implementation of SentenceAttribute.
A native int hash-based set where one value is reserved to mean "EMPTY" internally.
A MergeScheduler that simply does each merge sequentially, using the current thread.
A convenient class which offers a semi-immutable object wrapper implementation which allows one
to set the value of an object exactly once, and retrieve it many times.
Thrown when SetOnce.set(Object) is called more than once.
A doc values field for LatLonShape and XYShape that uses ShapeDocValues
as the underlying binary doc value format.
A base shape utility class used for both LatLon (spherical) and XY (cartesian) shape fields.
Represents an encoded triangle using ShapeField.decodeTriangle(byte[], DecodedTriangle).
type of triangle
Query Relation Types
polygons are decomposed into tessellated triangles using Tessellator; these triangles are encoded
and inserted as separate indexed POINT fields
Similarity defines the components of Lucene scoring.
Stores the weight for a query across the indexed collection.
A subclass of Similarity that provides a simplified API for its descendants.
Base Collector implementation that is used to collect all contexts.
Base FieldComparator implementation that is used for all contexts.
A very simple merged segment warmer that just ensures data structures are initialized.
Parses shape geometry represented in WKT format
Enumerated type for Shapes
Implements LockFactory for a single in-process instance, meaning all locking will take
place through this one instance.
Subclass of FilteredTermsEnum for enumerating a single term.
Directory that wraps another, and that sleeps and retries if obtaining the lock fails.
Math functions that trade off accuracy for speed.
Find all slop-valid position-combinations (matches) encountered while traversing/hopping the
PhrasePositions.
Wraps arbitrary readers for merging.
ImpactsEnum that doesn't index impacts but implements the API in a legal way.
Floating point numbers smaller than 32 bits.
An IndexDeletionPolicy that wraps any other IndexDeletionPolicy and adds the
ability to hold and later release snapshots of an index.
This reader filters out documents that have a doc values value in the given field and treats these
documents as soft deleted.
This MergePolicy allows soft deleted documents to be carried over across merges.
Encapsulates sort criteria for returned hits.
A per-document byte[] with presorted values.
Field that stores a per-document BytesRef value, indexed for sorting.
A list of per-document numeric values, sorted according to Long.compare(long, long).
Field that stores per-document long values for scoring, sorting or value
retrieval.
Selects a value from the document's list to use as the representative value
Type of selection to perform.
SortField for SortedNumericDocValues.
A SortFieldProvider for this sort field
A multi-valued version of SortedDocValues.
Field that stores a set of per-document BytesRef values, indexed for
faceting, grouping, joining.
Selects a value from the document's set to use as the representative value
Type of selection to perform.
SortField for SortedSetDocValues.
A SortFieldProvider for this sort
Sorts documents of a given index by returning a permutation on the document IDs.
Base class for sorting algorithms implementations.
A permutation of doc IDs.
Stores information about how to sort documents by terms in an individual field.
A SortFieldProvider for field sorts
Specifies the type of the terms to be sorted, or special types such as CUSTOM
Reads/Writes a named SortField from a segment info file, used to record index sorts
A CodecReader which supports sorting documents by a given Sort.
A Rescorer that re-sorts according to a provided Sort.
A bit set that only stores longs that have at least one bit which is set.
Stable radix sorter for variable-length strings.
A MergeSorter taking advantage of temporary storage.
Filters StandardTokenizer with LowerCaseFilter and StopFilter, using a
configurable list of stop words.
Default implementation of DirectoryReader.
A grammar-based tokenizer constructed with JFlex.
Factory for StandardTokenizer.
This class implements Word Break rules from the Unicode Text Segmentation
algorithm, as specified in
Unicode Standard Annex #29.
Pair of states.
BlockTree statistics for a single field returned by FieldReader.getStats().
Removes stop words from a token stream.
Base class for Analyzers that need to make use of stopword sets.
A field whose value is stored so that IndexSearcher.storedFields() and IndexReader.storedFields() will return the field and its value.
API for reading stored fields.
Controls the format of stored fields
Codec API for reading stored fields.
Codec API for writing stored fields:
For every document, StoredFieldsWriter.startDocument() is called, informing the Codec that a new document has started.
Expert: provides a low-level means of accessing the stored field values in an index.
Enumeration of possible return values for StoredFieldVisitor.needsField(org.apache.lucene.index.FieldInfo).
Abstraction around a stored value.
Type of a StoredValue.
A field that is indexed but not tokenized: the entire String value is indexed as a single token.
Methods for manipulating strings.
A BytesRef sorter that tries to use an efficient radix sorter if StringSorter.cmp is a BytesRefComparator, and otherwise falls back to StringSorter.fallbackSorter(java.util.Comparator<org.apache.lucene.util.BytesRef>).
Annotation to suppress forbidden-apis errors inside a whole class, a method, or a field.
A query that treats multiple terms as synonyms.
A builder for SynonymQuery.
Executor wrapper responsible for the execution of concurrent tasks.
A Term represents a word from text.
Word2Vec unit composed of a term and its associated vector
Sets the custom term frequency of a term within one document.
Default implementation of TermFrequencyAttribute.
Specialization for a disjunction over many terms that, by default, behaves like a ConstantScoreQuery over a BooleanQuery containing only BooleanClause.Occur.SHOULD clauses.
Sorts by field's natural Term sort order, using ordinals.
A Query that matches documents containing a term.
A Query that matches documents within a range of terms.
Access to the terms in a specific field.
Expert: A Scorer for documents matching a Term.
Iterator to seek (TermsEnum.seekCeil(BytesRef), TermsEnum.seekExact(BytesRef)) or step through (BytesRefIterator.next()) terms to obtain frequency information (TermsEnum.docFreq()) or the PostingsEnum for the current term (TermsEnum.postings(org.apache.lucene.index.PostingsEnum)).
Represents returned result from TermsEnum.seekCeil(org.apache.lucene.util.BytesRef).
Encapsulates all required internal state to position the associated TermsEnum without re-seeking.
Contains statistics for a specific term
Holder for per-term statistics.
This attribute is requested by TermsHashPerField to index the contents.
API for reading term vectors.
Controls the format of term vectors
Codec API for reading term vectors:
Codec API for writing term vectors:
For every document, TermVectorsWriter.startDocument(int) is called, informing the Codec how many fields will be written.
Computes a triangular mesh tessellation for a given polygon.
Implementation of this interface will receive calls with internal data at each step of the triangulation algorithm.
Circular Doubly-linked list used for polygon coordinates
Triangle in the tessellated mesh
A set of static methods returning accessors for internal, package-private functionality in Lucene.
A field that is indexed and tokenized, without term vectors.
Implementation of Similarity with the Vector Space Model.
Thrown by Lucene on detecting that Thread.interrupt() had been called.
Merges segments of approximately equal size, subject to an allowed number of segments per tier.
Holds score and explanation for a single candidate merge.
The TimeLimitingCollector is used to time out search requests that take longer than the maximum allowed search time limit.
Thrown when elapsed search time exceeds allowed search time.
Thread used to time out search requests.
A TokenFilter is a TokenStream whose input is another TokenStream.
Abstract parent class for analysis factories that create TokenFilter instances.
A Tokenizer is a TokenStream whose input is a Reader.
Abstract parent class for analysis factories that create Tokenizer instances.
Consumes a TokenStream and creates an Automaton where the transition labels are UTF8 bytes (or Unicode code points if unicodeArcs is true) from the TermToBytesRefAttribute.
This exception is thrown when determinizing an automaton would require too much work.
Represents hits returned by IndexSearcher.search(Query,int).
A base class for all collectors that return a TopDocs output.
Represents hits returned by IndexSearcher.search(Query,int,Sort).
TopKnnCollector is a specific KnnCollector.
Scorable leaf collector
Base rewrite method for collecting only the top terms via a priority queue.
Helper methods to ease implementing Object.toString().
Just counts the total number of hits.
Collector manager based on TotalHitCountCollector that allows users to parallelize counting the number of hits, expected to be used mostly wrapped in MultiCollectorManager.
Description of the total number of hits of a query.
How the TotalHits.value should be interpreted.
A delegating Directory that records which files were written to and deleted.
Holds one transition from an Automaton.
An interface for implementations that support 2-phase commit.
A utility for executing 2-phase commit on several objects.
Thrown by TwoPhaseCommitTool.execute(TwoPhaseCommit...) when an object fails to commit().
Thrown by TwoPhaseCommitTool.execute(TwoPhaseCommit...) when an object fails to prepareCommit().
Returned by Scorer.twoPhaseIterator() to expose an approximation of a DocIdSetIterator.
A Token's lexical type.
Default implementation of TypeAttribute.
Class to encode Java's UTF16 char[] into UTF8 byte[] without always allocating a new byte[] as String.getBytes(StandardCharsets.UTF_8) does.
Holds a codepoint along with the number of bytes required to represent it in UTF8
An object with this interface is a wrapper around another object (e.g., a filter with a delegate).
This MergePolicy is used for upgrading all existing segments of an index when calling IndexWriter.forceMerge(int).
A QueryCachingPolicy that tracks usage statistics of recently-used filters in order to decide on which filters are worth caching.
Converts UTF-32 automata to the equivalent UTF-8 representation.
Static helper methods.
Represents a path in TopNSearcher.
Holds a single input (IntsRef) + output, returned by shortestPaths().
Utility class to find top N shortest paths from start point(s).
Holds the results for a top N search using Util.TopNSearcher
The numeric datatype of the vector values.
A provider of vectorization implementations.
Vector similarity function; used in search to return top K most similar vectors to a target vector.
Utilities for computations with numeric arrays, especially algebraic operations like vector dot products.
Interface for implementations of VectorUtil support.
Deprecated. Use FloatVectorValues instead.
A LockFactory that wraps another LockFactory and verifies that each lock obtain/release is "correct" (never results in two processes holding the lock at the same time).
Used by certain classes to match version compatibility across releases of Lucene.
A utility for keeping backwards compatibility on previously abstract methods (or similar replacements).
Implements a combination of WeakHashMap and IdentityHashMap.
Expert: Calculate query weights and build query scorers.
Just wraps a Scorer and performs top scoring using it.
Implements the wildcard search query.
Loader for text files that represent a list of stopwords.
Represents a circle on the XY plane.
A per-document location field.
XYGeometry query for XYDocValuesField.
Reusable cartesian geometry encoding methods
Cartesian Geometry object.
Represents a line in cartesian space.
Represents a point on the earth's surface.
An indexed XY position field.
Represents a polygon in cartesian space.
Represents an x/y cartesian rectangle.
A cartesian shape utility class for indexing and searching geometries whose vertices are unitless x, y values.
A concrete implementation of ShapeDocValues for storing the binary doc value representation of XYShape geometries in an XYShapeDocValuesField
Concrete implementation of a ShapeDocValuesField for cartesian geometries.
IndexSearcher.TooManyClauses