public final class IndexWriterConfig extends Object implements Cloneable
Holds all the configuration of IndexWriter. You should instantiate this class, call the setters to set your configuration, then pass it to IndexWriter. Note that IndexWriter makes a private clone; if you need to subsequently change settings use IndexWriter.getConfig().
All setter methods return IndexWriterConfig to allow chaining settings conveniently, for example:
IndexWriterConfig conf = new IndexWriterConfig(matchVersion, analyzer);
conf.setter1().setter2();
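The chaining and private-clone behaviour described above can be illustrated with a small stand-alone model. This is a sketch, not Lucene code: ConfigModel and WriterModel are hypothetical stand-ins for IndexWriterConfig and IndexWriter, and the defaults used are illustrative only.

```java
/** Hypothetical stand-in for IndexWriterConfig: every setter returns this for chaining. */
final class ConfigModel implements Cloneable {
    private double ramBufferSizeMB = 16.0;  // illustrative default
    private boolean readerPooling = false;  // illustrative default

    ConfigModel setRAMBufferSizeMB(double mb) { this.ramBufferSizeMB = mb; return this; }
    ConfigModel setReaderPooling(boolean pooling) { this.readerPooling = pooling; return this; }
    double getRAMBufferSizeMB() { return ramBufferSizeMB; }
    boolean getReaderPooling() { return readerPooling; }

    @Override
    public ConfigModel clone() {
        try {
            return (ConfigModel) super.clone();
        } catch (CloneNotSupportedException e) {
            throw new AssertionError(e); // cannot happen: the class is Cloneable
        }
    }
}

/** Hypothetical stand-in for IndexWriter: it keeps a private clone of the config. */
final class WriterModel {
    private final ConfigModel config;

    WriterModel(ConfigModel conf) { this.config = conf.clone(); }
    ConfigModel getConfig() { return config; }
}

final class ChainingDemo {
    public static void main(String[] args) {
        ConfigModel conf = new ConfigModel().setRAMBufferSizeMB(32.0).setReaderPooling(true);
        WriterModel writer = new WriterModel(conf);

        conf.setRAMBufferSizeMB(64.0); // too late: the writer holds its own copy
        System.out.println(writer.getConfig().getRAMBufferSizeMB()); // prints 32.0

        writer.getConfig().setRAMBufferSizeMB(48.0); // later changes go through getConfig()
        System.out.println(writer.getConfig().getRAMBufferSizeMB()); // prints 48.0
    }
}
```

In this model, changing the original object after the writer has been constructed has no effect on the writer, which is why later adjustments go through getConfig().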
Modifier and Type | Class and Description |
---|---|
static class | IndexWriterConfig.OpenMode - Specifies the open mode for IndexWriter. IndexWriterConfig.OpenMode.CREATE creates a new index or overwrites an existing one. |
Modifier and Type | Field and Description |
---|---|
static int | DEFAULT_MAX_BUFFERED_DELETE_TERMS - Disabled by default (because IndexWriter flushes by RAM usage by default). |
static int | DEFAULT_MAX_BUFFERED_DOCS - Disabled by default (because IndexWriter flushes by RAM usage by default). |
static int | DEFAULT_MAX_THREAD_STATES - The maximum number of simultaneous threads that may be indexing documents at once in IndexWriter; if more than this many threads arrive they will wait for others to finish. |
static double | DEFAULT_RAM_BUFFER_SIZE_MB - Default value is 16 MB (which means flush when buffered docs consume approximately 16 MB RAM). |
static boolean | DEFAULT_READER_POOLING - Default setting for setReaderPooling(boolean). |
static int | DEFAULT_READER_TERMS_INDEX_DIVISOR - Default value is 1. |
static int | DEFAULT_TERM_INDEX_INTERVAL - Default value is 128. |
static int | DISABLE_AUTO_FLUSH - Denotes a flush trigger is disabled. |
static long | WRITE_LOCK_TIMEOUT - Default value for the write lock timeout (1,000 ms). |
Constructor and Description |
---|
IndexWriterConfig(Version matchVersion, Analyzer analyzer) |
Modifier and Type | Method and Description |
---|---|
Object | clone() |
Analyzer | getAnalyzer() - Returns the default analyzer to use for indexing documents. |
static long | getDefaultWriteLockTimeout() - Returns the default write lock timeout for newly instantiated IndexWriterConfigs. |
IndexCommit | getIndexCommit() - Returns the IndexCommit as specified in setIndexCommit(IndexCommit) or the default, null, which specifies to open the latest index commit point. |
IndexDeletionPolicy | getIndexDeletionPolicy() - Returns the IndexDeletionPolicy specified in setIndexDeletionPolicy(IndexDeletionPolicy) or the default KeepOnlyLastCommitDeletionPolicy. |
int | getMaxBufferedDeleteTerms() - Returns the number of buffered deleted terms that will trigger a flush if enabled. |
int | getMaxBufferedDocs() - Returns the number of buffered added documents that will trigger a flush if enabled. |
int | getMaxThreadStates() - Returns the max number of simultaneous threads that may be indexing documents at once in IndexWriter. |
IndexWriter.IndexReaderWarmer | getMergedSegmentWarmer() - Returns the current merged segment warmer. |
MergePolicy | getMergePolicy() - Returns the current MergePolicy in use by this writer. |
MergeScheduler | getMergeScheduler() - Returns the MergeScheduler that was set by setMergeScheduler(MergeScheduler). |
IndexWriterConfig.OpenMode | getOpenMode() - Returns the IndexWriterConfig.OpenMode set by setOpenMode(OpenMode). |
double | getRAMBufferSizeMB() - Returns the value set by setRAMBufferSizeMB(double) if enabled. |
boolean | getReaderPooling() - Returns true if IndexWriter should pool readers even if IndexWriter.getReader() has not been called. |
int | getReaderTermsIndexDivisor() |
Similarity | getSimilarity() - Expert: returns the Similarity implementation used by this IndexWriter. |
int | getTermIndexInterval() - Returns the interval between indexed terms. |
long | getWriteLockTimeout() - Returns allowed timeout when acquiring the write lock. |
static void | setDefaultWriteLockTimeout(long writeLockTimeout) - Sets the default (for any instance) maximum time to wait for a write lock (in milliseconds). |
IndexWriterConfig | setIndexCommit(IndexCommit commit) - Expert: allows opening a certain commit point. |
IndexWriterConfig | setIndexDeletionPolicy(IndexDeletionPolicy delPolicy) - Expert: allows an optional IndexDeletionPolicy implementation to be specified. |
IndexWriterConfig | setMaxBufferedDeleteTerms(int maxBufferedDeleteTerms) - Determines the minimal number of delete terms required before the buffered in-memory delete terms are applied and flushed. |
IndexWriterConfig | setMaxBufferedDocs(int maxBufferedDocs) - Determines the minimal number of documents required before the buffered in-memory documents are flushed as a new Segment. |
IndexWriterConfig | setMaxThreadStates(int maxThreadStates) - Sets the max number of simultaneous threads that may be indexing documents at once in IndexWriter. |
IndexWriterConfig | setMergedSegmentWarmer(IndexWriter.IndexReaderWarmer mergeSegmentWarmer) - Set the merged segment warmer. |
IndexWriterConfig | setMergePolicy(MergePolicy mergePolicy) - Expert: MergePolicy is invoked whenever there are changes to the segments in the index. |
IndexWriterConfig | setMergeScheduler(MergeScheduler mergeScheduler) - Expert: sets the merge scheduler used by this writer. |
IndexWriterConfig | setOpenMode(IndexWriterConfig.OpenMode openMode) - Specifies the IndexWriterConfig.OpenMode of the index. |
IndexWriterConfig | setRAMBufferSizeMB(double ramBufferSizeMB) - Determines the amount of RAM that may be used for buffering added documents and deletions before they are flushed to the Directory. |
IndexWriterConfig | setReaderPooling(boolean readerPooling) - By default, IndexWriter does not pool the SegmentReaders it must open for deletions and merging, unless a near-real-time reader has been obtained by calling IndexWriter.getReader(). |
IndexWriterConfig | setReaderTermsIndexDivisor(int divisor) - Sets the termsIndexDivisor passed to any readers that IndexWriter opens, for example when applying deletes or creating a near-real-time reader in IndexWriter.getReader(). |
IndexWriterConfig | setSimilarity(Similarity similarity) - Expert: set the Similarity implementation used by this IndexWriter. |
IndexWriterConfig | setTermIndexInterval(int interval) - Expert: set the interval between indexed terms. |
IndexWriterConfig | setWriteLockTimeout(long writeLockTimeout) - Sets the maximum time to wait for a write lock (in milliseconds) for this instance. |
String | toString() |

public static final int DEFAULT_TERM_INDEX_INTERVAL
Default value is 128. Change using setTermIndexInterval(int).

public static final int DISABLE_AUTO_FLUSH
Denotes a flush trigger is disabled.

public static final int DEFAULT_MAX_BUFFERED_DELETE_TERMS
Disabled by default (because IndexWriter flushes by RAM usage by default).

public static final int DEFAULT_MAX_BUFFERED_DOCS
Disabled by default (because IndexWriter flushes by RAM usage by default).

public static final double DEFAULT_RAM_BUFFER_SIZE_MB
Default value is 16 MB (which means flush when buffered docs consume approximately 16 MB RAM).

public static long WRITE_LOCK_TIMEOUT
Default value for the write lock timeout (1,000 ms).
See Also: setDefaultWriteLockTimeout(long)

public static final int DEFAULT_MAX_THREAD_STATES
The maximum number of simultaneous threads that may be indexing documents at once in IndexWriter; if more than this many threads arrive they will wait for others to finish.

public static final boolean DEFAULT_READER_POOLING
Default setting for setReaderPooling(boolean).

public static final int DEFAULT_READER_TERMS_INDEX_DIVISOR
Default value is 1. Change using setReaderTermsIndexDivisor(int).

public IndexWriterConfig(Version matchVersion, Analyzer analyzer)
Creates a new config with defaults that match the specified Version as well as the default Analyzer. If matchVersion is >= Version.LUCENE_32, TieredMergePolicy is used for merging; else LogByteSizeMergePolicy.
Note that TieredMergePolicy is free to select non-contiguous merges, which means docIDs may not remain monotonic over time. If this is a problem you should switch to LogByteSizeMergePolicy or LogDocMergePolicy.

public static void setDefaultWriteLockTimeout(long writeLockTimeout)
Sets the default (for any instance) maximum time to wait for a write lock (in milliseconds).

public static long getDefaultWriteLockTimeout()
Returns the default write lock timeout for newly instantiated IndexWriterConfigs.
See Also: setDefaultWriteLockTimeout(long)

public Analyzer getAnalyzer()
Returns the default analyzer to use for indexing documents.

public IndexWriterConfig setOpenMode(IndexWriterConfig.OpenMode openMode)
Specifies the IndexWriterConfig.OpenMode of the index.
Only takes effect when IndexWriter is first created.

public IndexWriterConfig.OpenMode getOpenMode()
Returns the IndexWriterConfig.OpenMode set by setOpenMode(OpenMode).

public IndexWriterConfig setIndexDeletionPolicy(IndexDeletionPolicy delPolicy)
Expert: allows an optional IndexDeletionPolicy implementation to be specified. You can use this to control when prior commits are deleted from the index. The default policy is KeepOnlyLastCommitDeletionPolicy, which removes all prior commits as soon as a new commit is done (this matches behavior before 2.2). Creating your own policy can allow you to explicitly keep previous "point in time" commits alive in the index for some time, to allow readers to refresh to the new commit without having the old commit deleted out from under them. This is necessary on filesystems like NFS that do not support "delete on last close" semantics, which Lucene's "point in time" search normally relies on.
NOTE: the deletion policy cannot be null. If null is passed, the deletion policy will be set to the default.
Only takes effect when IndexWriter is first created.
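To make the idea of keeping earlier "point in time" commits alive more concrete, the sketch below models a policy that retains the most recent N commits and reports the older ones as deletable. It is a simplified stand-alone model, not an IndexDeletionPolicy implementation: commits are represented as plain strings, and KeepLastCommitsModel, commitsToDelete, and the segments_N names are hypothetical.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

final class KeepLastCommitsModel {
    /**
     * Given commits ordered oldest to newest, returns the ones that a
     * "keep the last N" policy would delete.
     */
    static List<String> commitsToDelete(List<String> commits, int keep) {
        if (keep < 1) {
            throw new IllegalArgumentException("keep must be >= 1");
        }
        // Everything before the cut point is older than the N newest commits.
        int cut = Math.max(0, commits.size() - keep);
        return new ArrayList<String>(commits.subList(0, cut));
    }

    public static void main(String[] args) {
        List<String> commits = Arrays.asList("segments_1", "segments_2", "segments_3");
        System.out.println(commitsToDelete(commits, 1)); // prints [segments_1, segments_2]
        System.out.println(commitsToDelete(commits, 2)); // prints [segments_1]
    }
}
```

With keep = 1 the model mirrors the documented default of removing all prior commits; a larger value leaves older commits in place for readers that have not yet refreshed.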
public IndexDeletionPolicy getIndexDeletionPolicy()
Returns the IndexDeletionPolicy specified in setIndexDeletionPolicy(IndexDeletionPolicy) or the default KeepOnlyLastCommitDeletionPolicy.

public IndexWriterConfig setIndexCommit(IndexCommit commit)
Expert: allows opening a certain commit point.
Only takes effect when IndexWriter is first created.

public IndexCommit getIndexCommit()
Returns the IndexCommit as specified in setIndexCommit(IndexCommit) or the default, null, which specifies to open the latest index commit point.

public IndexWriterConfig setSimilarity(Similarity similarity)
Expert: set the Similarity implementation used by this IndexWriter.
NOTE: the similarity cannot be null. If null is passed, the similarity will be set to the default.

public Similarity getSimilarity()
Expert: returns the Similarity implementation used by this IndexWriter. This defaults to the current value of Similarity.getDefault().

public IndexWriterConfig setTermIndexInterval(int interval)
Expert: set the interval between indexed terms.
This parameter determines the amount of computation required per query term, regardless of the number of documents that contain that term. In particular, it is the maximum number of other terms that must be scanned before a term is located and its frequency and position information may be processed. In a large index with user-entered query terms, query processing time is likely to be dominated not by term lookup but rather by the processing of frequency and positional data. In a small index or when many uncommon query terms are generated (e.g., by wildcard queries) term lookup may become a dominant cost.
In particular, numUniqueTerms/interval terms are read into memory by an IndexReader, and, on average, interval/2 terms must be scanned for each random term access.
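The two estimates above are simple arithmetic, and working them through for a few intervals shows the trade-off between memory and lookup cost. The snippet below is only a back-of-the-envelope calculator; TermIndexEstimate and the figure of 10 million unique terms are made up for illustration.

```java
final class TermIndexEstimate {
    /** Roughly how many terms a reader holds in memory: numUniqueTerms / interval. */
    static long termsInMemory(long numUniqueTerms, int interval) {
        return numUniqueTerms / interval;
    }

    /** Average number of terms scanned per random term access: interval / 2. */
    static double averageTermsScanned(int interval) {
        return interval / 2.0;
    }

    public static void main(String[] args) {
        long numUniqueTerms = 10000000L; // hypothetical index size
        for (int interval : new int[] {32, 128, 512}) {
            System.out.println("interval=" + interval
                + " termsInMemory=" + termsInMemory(numUniqueTerms, interval)
                + " avgScanned=" + averageTermsScanned(interval));
        }
    }
}
```

For this hypothetical index, an interval of 128 gives 78,125 in-memory terms and an average scan of 64 terms per lookup; quadrupling the interval to 512 cuts the in-memory terms to about 19,531 while raising the average scan to 256.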
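public int getTermIndexInterval()
Returns the interval between indexed terms.
See Also: setTermIndexInterval(int)

public IndexWriterConfig setMergeScheduler(MergeScheduler mergeScheduler)
Expert: sets the merge scheduler used by this writer. The default is ConcurrentMergeScheduler.
NOTE: the merge scheduler cannot be null. If null is passed, the merge scheduler will be set to the default.
Only takes effect when IndexWriter is first created.

public MergeScheduler getMergeScheduler()
Returns the MergeScheduler that was set by setMergeScheduler(MergeScheduler).

public IndexWriterConfig setWriteLockTimeout(long writeLockTimeout)
Sets the maximum time to wait for a write lock (in milliseconds) for this instance. The default for all instances can be changed with setDefaultWriteLockTimeout(long).
Only takes effect when IndexWriter is first created.

public long getWriteLockTimeout()
Returns allowed timeout when acquiring the write lock.
See Also: setWriteLockTimeout(long)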
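The relationship between the static default and the per-instance timeout can be sketched with a small model. It assumes that each config copies the static default when it is constructed, which is how the phrase "for newly instantiated IndexWriterConfigs" reads; TimeoutConfigModel is a hypothetical stand-in, not the Lucene class.

```java
final class TimeoutConfigModel {
    // Shared default: 1,000 ms, as documented for WRITE_LOCK_TIMEOUT.
    private static long defaultWriteLockTimeout = 1000L;

    // Captured once per instance at construction time (an assumption of this model).
    private long writeLockTimeout = defaultWriteLockTimeout;

    static void setDefaultWriteLockTimeout(long millis) { defaultWriteLockTimeout = millis; }
    static long getDefaultWriteLockTimeout() { return defaultWriteLockTimeout; }

    TimeoutConfigModel setWriteLockTimeout(long millis) {
        this.writeLockTimeout = millis;
        return this;
    }

    long getWriteLockTimeout() { return writeLockTimeout; }

    public static void main(String[] args) {
        TimeoutConfigModel before = new TimeoutConfigModel();
        setDefaultWriteLockTimeout(5000L);
        TimeoutConfigModel after = new TimeoutConfigModel();

        System.out.println(before.getWriteLockTimeout()); // prints 1000
        System.out.println(after.getWriteLockTimeout());  // prints 5000
        System.out.println(after.setWriteLockTimeout(250L).getWriteLockTimeout()); // prints 250
    }
}
```

Under that assumption, changing the static default affects only instances created afterwards, while setWriteLockTimeout overrides the value for a single instance.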
public IndexWriterConfig setMaxBufferedDeleteTerms(int maxBufferedDeleteTerms)
Determines the minimal number of delete terms required before the buffered in-memory delete terms are applied and flushed.
Disabled by default (writer flushes by RAM usage).
Takes effect immediately, but only the next time a document is added, updated or deleted.
Throws: IllegalArgumentException - if maxBufferedDeleteTerms is enabled but smaller than 1

public int getMaxBufferedDeleteTerms()
Returns the number of buffered deleted terms that will trigger a flush if enabled.
See Also: setMaxBufferedDeleteTerms(int)

public IndexWriterConfig setRAMBufferSizeMB(double ramBufferSizeMB)
Determines the amount of RAM that may be used for buffering added documents and deletions before they are flushed to the Directory.
When this is set, the writer will flush whenever buffered documents and deletions use this much RAM. Pass in DISABLE_AUTO_FLUSH to prevent triggering a flush due to RAM usage. Note that if flushing by document count is also enabled, then the flush will be triggered by whichever comes first.
NOTE: the accounting of RAM usage for pending deletions is only approximate. Specifically, if you delete by Query, Lucene currently has no way to measure the RAM usage of individual Queries, so the accounting will under-estimate and you should compensate by either calling commit() periodically yourself, or by using setMaxBufferedDeleteTerms(int) to flush by count instead of RAM usage (each buffered delete Query counts as one).
NOTE: because IndexWriter uses ints when managing its internal storage, the absolute maximum value for this setting is somewhat less than 2048 MB. The precise limit depends on various factors, such as how large your documents are, how many fields have norms, etc., so it's best to set this value comfortably under 2048.
The default value is DEFAULT_RAM_BUFFER_SIZE_MB.
Takes effect immediately, but only the next time a document is added, updated or deleted.
Throws: IllegalArgumentException - if ramBufferSize is enabled but non-positive, or it disables ramBufferSize when maxBufferedDocs is already disabled

public double getRAMBufferSizeMB()
Returns the value set by setRAMBufferSizeMB(double) if enabled.

public IndexWriterConfig setMaxBufferedDocs(int maxBufferedDocs)
Determines the minimal number of documents required before the buffered in-memory documents are flushed as a new Segment.
When this is set, the writer will flush every maxBufferedDocs added documents. Pass in DISABLE_AUTO_FLUSH to prevent triggering a flush due to number of buffered documents. Note that if flushing by RAM usage is also enabled, then the flush will be triggered by whichever comes first.
Disabled by default (writer flushes by RAM usage).
Takes effect immediately, but only the next time a document is added, updated or deleted.
Throws: IllegalArgumentException - if maxBufferedDocs is enabled but smaller than 2, or it disables maxBufferedDocs when ramBufferSize is already disabled
See Also: setRAMBufferSizeMB(double)
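The interaction between the two flush triggers, and the argument checks described for setRAMBufferSizeMB and setMaxBufferedDocs, can be summarised in a small model. This is a sketch of the documented rules only, not Lucene's implementation: FlushSettingsModel and shouldFlush are hypothetical, and DISABLE_AUTO_FLUSH is assumed to be -1 here.

```java
final class FlushSettingsModel {
    static final int DISABLE_AUTO_FLUSH = -1; // assumed value for this sketch

    private double ramBufferSizeMB = 16.0;            // flushing by RAM is on by default
    private int maxBufferedDocs = DISABLE_AUTO_FLUSH; // flushing by doc count is off by default

    FlushSettingsModel setRAMBufferSizeMB(double mb) {
        if (mb != DISABLE_AUTO_FLUSH && mb <= 0.0) {
            throw new IllegalArgumentException("ramBufferSize must be positive when enabled");
        }
        if (mb == DISABLE_AUTO_FLUSH && maxBufferedDocs == DISABLE_AUTO_FLUSH) {
            throw new IllegalArgumentException("at least one flush trigger must stay enabled");
        }
        ramBufferSizeMB = mb;
        return this;
    }

    FlushSettingsModel setMaxBufferedDocs(int docs) {
        if (docs != DISABLE_AUTO_FLUSH && docs < 2) {
            throw new IllegalArgumentException("maxBufferedDocs must be at least 2 when enabled");
        }
        if (docs == DISABLE_AUTO_FLUSH && ramBufferSizeMB == DISABLE_AUTO_FLUSH) {
            throw new IllegalArgumentException("at least one flush trigger must stay enabled");
        }
        maxBufferedDocs = docs;
        return this;
    }

    /** True when either enabled trigger has been reached, whichever comes first. */
    boolean shouldFlush(int bufferedDocs, double usedMB) {
        boolean byDocs = maxBufferedDocs != DISABLE_AUTO_FLUSH && bufferedDocs >= maxBufferedDocs;
        boolean byRam = ramBufferSizeMB != DISABLE_AUTO_FLUSH && usedMB >= ramBufferSizeMB;
        return byDocs || byRam;
    }

    public static void main(String[] args) {
        FlushSettingsModel both = new FlushSettingsModel().setMaxBufferedDocs(1000);
        System.out.println(both.shouldFlush(1000, 2.5)); // true: doc count reached first
        System.out.println(both.shouldFlush(40, 16.0));  // true: RAM limit reached first
        System.out.println(both.shouldFlush(40, 2.5));   // false: neither limit reached
    }
}
```

Because at least one trigger must remain enabled in this model, switching to flush-by-count only requires calling setMaxBufferedDocs first and disabling the RAM trigger afterwards.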
public int getMaxBufferedDocs()
Returns the number of buffered added documents that will trigger a flush if enabled.
See Also: setMaxBufferedDocs(int)

public IndexWriterConfig setMergedSegmentWarmer(IndexWriter.IndexReaderWarmer mergeSegmentWarmer)
Set the merged segment warmer. See IndexWriter.IndexReaderWarmer.
Takes effect on the next merge.

public IndexWriter.IndexReaderWarmer getMergedSegmentWarmer()
Returns the current merged segment warmer. See IndexWriter.IndexReaderWarmer.

public IndexWriterConfig setMergePolicy(MergePolicy mergePolicy)
Expert: MergePolicy is invoked whenever there are changes to the segments in the index. Its role is to select which merges to do, if any, and return a MergePolicy.MergeSpecification describing the merges. It also selects merges to do for forceMerge. The default depends on the matchVersion passed to the constructor: TieredMergePolicy for Version.LUCENE_32 or later, otherwise LogByteSizeMergePolicy.
Only takes effect when IndexWriter is first created.
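Which merge policy is in effect when setMergePolicy is never called follows the rule given in the constructor documentation: TieredMergePolicy when matchVersion is >= Version.LUCENE_32, else LogByteSizeMergePolicy. The sketch below models only that selection rule; DefaultMergePolicyModel and its small Version enum are hypothetical, and policies are represented by their class names as strings.

```java
final class DefaultMergePolicyModel {
    /** Hypothetical subset of versions, declared oldest to newest. */
    enum Version { LUCENE_30, LUCENE_31, LUCENE_32, LUCENE_33 }

    /** Returns the name of the policy the documented rule would select. */
    static String defaultMergePolicy(Version matchVersion) {
        // "matchVersion >= LUCENE_32" is expressed through enum declaration order.
        return matchVersion.compareTo(Version.LUCENE_32) >= 0
            ? "TieredMergePolicy"
            : "LogByteSizeMergePolicy";
    }

    public static void main(String[] args) {
        for (Version v : Version.values()) {
            System.out.println(v + " -> " + defaultMergePolicy(v));
        }
    }
}
```

An application that relies on docIDs staying monotonic would set LogByteSizeMergePolicy or LogDocMergePolicy explicitly instead of depending on this version-based default.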
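public MergePolicy getMergePolicy()
Returns the current MergePolicy in use by this writer.
See Also: setMergePolicy(MergePolicy)

public IndexWriterConfig setMaxThreadStates(int maxThreadStates)
Sets the max number of simultaneous threads that may be indexing documents at once in IndexWriter. If an invalid value is passed, maxThreadStates will be set to DEFAULT_MAX_THREAD_STATES.
Only takes effect when IndexWriter is first created.

public int getMaxThreadStates()
Returns the max number of simultaneous threads that may be indexing documents at once in IndexWriter.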
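The effect of a thread-state limit, where threads beyond the limit wait for others to finish, resembles a counting semaphore. The sketch below demonstrates that analogy with java.util.concurrent.Semaphore; it is not how IndexWriter is implemented, and ThreadStateLimitModel and the simulated 5 ms of work are hypothetical.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Semaphore;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

final class ThreadStateLimitModel {
    /** Runs the given number of threads and returns the highest concurrency observed. */
    static int maxObservedConcurrency(int maxThreadStates, int threads) {
        final Semaphore states = new Semaphore(maxThreadStates);
        final AtomicInteger active = new AtomicInteger();
        final AtomicInteger maxSeen = new AtomicInteger();
        ExecutorService pool = Executors.newFixedThreadPool(threads);

        for (int i = 0; i < threads; i++) {
            pool.execute(() -> {
                try {
                    states.acquire(); // wait here until a state is free
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                    return;
                }
                try {
                    maxSeen.accumulateAndGet(active.incrementAndGet(), Math::max);
                    Thread.sleep(5); // simulated indexing work
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                } finally {
                    active.decrementAndGet();
                    states.release(); // let a waiting thread proceed
                }
            });
        }

        pool.shutdown();
        try {
            pool.awaitTermination(10, TimeUnit.SECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return maxSeen.get();
    }

    public static void main(String[] args) {
        // Eight threads compete for two states; the observed maximum never exceeds 2.
        System.out.println(maxObservedConcurrency(2, 8));
    }
}
```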
public IndexWriterConfig setReaderPooling(boolean readerPooling)
By default, IndexWriter does not pool the SegmentReaders it must open for deletions and merging, unless a near-real-time reader has been obtained by calling IndexWriter.getReader(). This method lets you enable pooling without getting a near-real-time reader. NOTE: if you set this to false, IndexWriter will still pool readers once IndexWriter.getReader() is called.
Only takes effect when IndexWriter is first created.

public boolean getReaderPooling()
Returns true if IndexWriter should pool readers even if IndexWriter.getReader() has not been called.

public IndexWriterConfig setReaderTermsIndexDivisor(int divisor)
Sets the termsIndexDivisor passed to any readers that IndexWriter opens, for example when applying deletes or creating a near-real-time reader in IndexWriter.getReader(). If you pass -1, the terms index won't be loaded by the readers. This is only useful in advanced situations when you will only .next() through all terms; attempts to seek will hit an exception.
Takes effect immediately, but only applies to readers opened after this call.

public int getReaderTermsIndexDivisor()
See Also: setReaderTermsIndexDivisor(int)