public final class IndexWriterConfig extends LiveIndexWriterConfig
Holds all the configuration that is used to create an IndexWriter.
Once IndexWriter has been created with this object, changes to this
object will not affect the IndexWriter instance. For that, use
LiveIndexWriterConfig that is returned from IndexWriter.getConfig().
All setter methods return IndexWriterConfig to allow chaining
settings conveniently, for example:
IndexWriterConfig conf = new IndexWriterConfig(analyzer);
conf.setter1().setter2();
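As a slightly fuller sketch of the chaining pattern, the following builds a config with real setters and then adjusts a live setting through IndexWriter.getConfig(). The class name ConfigChainingExample, the chosen values, and the use of an in-memory ByteBuffersDirectory are assumptions made for illustration only.

```java
import java.io.IOException;
import java.io.UncheckedIOException;

import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.index.IndexWriter;
import org.apache.lucene.index.IndexWriterConfig;
import org.apache.lucene.index.LiveIndexWriterConfig;
import org.apache.lucene.store.ByteBuffersDirectory;
import org.apache.lucene.store.Directory;

// Hypothetical example class; not part of Lucene.
public class ConfigChainingExample {

  /** Builds a config with chained setters; the values are arbitrary. */
  static IndexWriterConfig buildConfig() {
    return new IndexWriterConfig(new StandardAnalyzer())
        .setOpenMode(IndexWriterConfig.OpenMode.CREATE)
        .setRAMBufferSizeMB(64.0)
        .setUseCompoundFile(false);
  }

  /** Opens a writer, then changes a live setting through getConfig(). */
  static double adjustLiveSetting() {
    try (Directory dir = new ByteBuffersDirectory();
         IndexWriter writer = new IndexWriter(dir, buildConfig())) {
      LiveIndexWriterConfig live = writer.getConfig();
      live.setRAMBufferSizeMB(32.0); // live setting, applied to the running writer
      return writer.getConfig().getRAMBufferSizeMB();
    } catch (IOException e) {
      throw new UncheckedIOException(e);
    }
  }

  public static void main(String[] args) {
    System.out.println("RAM buffer after live change: " + adjustLiveSetting());
  }
}
```

Settings documented as taking effect only when IndexWriter is first created (open mode, codec, similarity and so on) have to be chosen before the writer is constructed.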
See Also: IndexWriter.getConfig()

| Modifier and Type | Class and Description |
|---|---|
| static class | IndexWriterConfig.OpenMode: Specifies the open mode for IndexWriter. |

| Modifier and Type | Field and Description |
|---|---|
| static boolean | DEFAULT_COMMIT_ON_CLOSE: Default value for whether calls to IndexWriter.close() include a commit. |
| static int | DEFAULT_MAX_BUFFERED_DELETE_TERMS: Disabled by default (because IndexWriter flushes by RAM usage by default). |
| static int | DEFAULT_MAX_BUFFERED_DOCS: Disabled by default (because IndexWriter flushes by RAM usage by default). |
| static double | DEFAULT_RAM_BUFFER_SIZE_MB: Default value is 16 MB (which means flush when buffered docs consume approximately 16 MB RAM). |
| static int | DEFAULT_RAM_PER_THREAD_HARD_LIMIT_MB: Default value is 1945. |
| static boolean | DEFAULT_READER_POOLING: Default setting (true) for setReaderPooling(boolean). |
| static boolean | DEFAULT_USE_COMPOUND_FILE_SYSTEM: Default value for compound file system for newly written segments (set to true). |
| static int | DISABLE_AUTO_FLUSH: Denotes a flush trigger is disabled. |

Fields inherited from class LiveIndexWriterConfig: checkPendingFlushOnUpdate, codec, commit, commitOnClose, createdVersionMajor, delPolicy, flushPolicy, indexerThreadPool, indexingChain, indexSort, indexSortFields, infoStream, mergePolicy, mergeScheduler, openMode, perThreadHardLimitMB, readerAttributes, readerPooling, similarity, softDeletesField, useCompoundFile

| Constructor and Description |
|---|
| IndexWriterConfig(): Creates a new config, using StandardAnalyzer as the analyzer. |
| IndexWriterConfig(Analyzer analyzer): Creates a new config with the provided Analyzer. |

| Modifier and Type | Method and Description |
|---|---|
| Analyzer | getAnalyzer(): Returns the default analyzer to use for indexing documents. |
| Codec | getCodec(): Returns the current Codec. |
| IndexCommit | getIndexCommit(): Returns the IndexCommit as specified in setIndexCommit(IndexCommit) or the default, null, which specifies to open the latest index commit point. |
| IndexDeletionPolicy | getIndexDeletionPolicy(): Returns the IndexDeletionPolicy specified in setIndexDeletionPolicy(IndexDeletionPolicy) or the default KeepOnlyLastCommitDeletionPolicy. |
| InfoStream | getInfoStream(): Returns InfoStream used for debugging. |
| int | getMaxBufferedDocs(): Returns the number of buffered added documents that will trigger a flush if enabled. |
| IndexWriter.IndexReaderWarmer | getMergedSegmentWarmer(): Returns the current merged segment warmer. |
| MergePolicy | getMergePolicy(): Returns the current MergePolicy in use by this writer. |
| MergeScheduler | getMergeScheduler(): Returns the MergeScheduler that was set by setMergeScheduler(MergeScheduler). |
| IndexWriterConfig.OpenMode | getOpenMode(): Returns the IndexWriterConfig.OpenMode set by setOpenMode(OpenMode). |
| double | getRAMBufferSizeMB(): Returns the value set by LiveIndexWriterConfig.setRAMBufferSizeMB(double) if enabled. |
| int | getRAMPerThreadHardLimitMB(): Returns the max amount of memory each DocumentsWriterPerThread can consume until forcefully flushed. |
| boolean | getReaderPooling(): Returns true if IndexWriter should pool readers even if DirectoryReader.open(IndexWriter) has not been called. |
| Similarity | getSimilarity(): Expert: returns the Similarity implementation used by this IndexWriter. |
| IndexWriterConfig | setCheckPendingFlushUpdate(boolean checkPendingFlushOnUpdate): Expert: sets if indexing threads check for pending flushes on update in order to help out flushing indexing buffers to disk. |
| IndexWriterConfig | setCodec(Codec codec): Set the Codec. |
| IndexWriterConfig | setCommitOnClose(boolean commitOnClose): Sets whether calls to IndexWriter.close() should first commit before closing. |
| IndexWriterConfig | setIndexCommit(IndexCommit commit): Expert: allows opening a certain commit point. |
| IndexWriterConfig | setIndexCreatedVersionMajor(int indexCreatedVersionMajor): Expert: set the compatibility version to use for this index. |
| IndexWriterConfig | setIndexDeletionPolicy(IndexDeletionPolicy delPolicy): Expert: allows an optional IndexDeletionPolicy implementation to be specified. |
| IndexWriterConfig | setIndexSort(Sort sort): Set the Sort order to use for all (flushed and merged) segments. |
| IndexWriterConfig | setInfoStream(InfoStream infoStream): Information about merges, deletes and a message when maxFieldLength is reached will be printed to this. |
| IndexWriterConfig | setInfoStream(PrintStream printStream): Convenience method that uses PrintStreamInfoStream. |
| IndexWriterConfig | setMaxBufferedDocs(int maxBufferedDocs): Determines the minimal number of documents required before the buffered in-memory documents are flushed as a new Segment. |
| IndexWriterConfig | setMergedSegmentWarmer(IndexWriter.IndexReaderWarmer mergeSegmentWarmer): Set the merged segment warmer. |
| IndexWriterConfig | setMergePolicy(MergePolicy mergePolicy): Expert: MergePolicy is invoked whenever there are changes to the segments in the index. |
| IndexWriterConfig | setMergeScheduler(MergeScheduler mergeScheduler): Expert: sets the merge scheduler used by this writer. |
| IndexWriterConfig | setOpenMode(IndexWriterConfig.OpenMode openMode): Specifies IndexWriterConfig.OpenMode of the index. |
| IndexWriterConfig | setRAMBufferSizeMB(double ramBufferSizeMB): Determines the amount of RAM that may be used for buffering added documents and deletions before they are flushed to the Directory. |
| IndexWriterConfig | setRAMPerThreadHardLimitMB(int perThreadHardLimitMB): Expert: Sets the maximum memory consumption per thread triggering a forced flush if exceeded. |
| IndexWriterConfig | setReaderAttributes(Map<String,String> readerAttributes): Sets the reader attributes used for all readers pulled from the IndexWriter. |
| IndexWriterConfig | setReaderPooling(boolean readerPooling): By default, IndexWriter does not pool the SegmentReaders it must open for deletions and merging, unless a near-real-time reader has been obtained by calling DirectoryReader.open(IndexWriter). |
| IndexWriterConfig | setSimilarity(Similarity similarity): Expert: set the Similarity implementation used by this IndexWriter. |
| IndexWriterConfig | setSoftDeletesField(String softDeletesField): Sets the soft deletes field. |
| IndexWriterConfig | setUseCompoundFile(boolean useCompoundFile): Sets if the IndexWriter should pack newly written segments in a compound file. |
| String | toString() |
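
Methods inherited from class LiveIndexWriterConfig: getCommitOnClose, getIndexCreatedVersionMajor, getIndexSort, getIndexSortFields, getReaderAttributes, getSoftDeletesField, getUseCompoundFile, isCheckPendingFlushOnUpdate

public static final int DISABLE_AUTO_FLUSH
Denotes a flush trigger is disabled.

public static final int DEFAULT_MAX_BUFFERED_DELETE_TERMS
Disabled by default (because IndexWriter flushes by RAM usage by default).

public static final int DEFAULT_MAX_BUFFERED_DOCS
Disabled by default (because IndexWriter flushes by RAM usage by default).

public static final double DEFAULT_RAM_BUFFER_SIZE_MB
Default value is 16 MB (which means flush when buffered docs consume approximately 16 MB RAM).

public static final boolean DEFAULT_READER_POOLING
Default setting (true) for setReaderPooling(boolean).

public static final int DEFAULT_RAM_PER_THREAD_HARD_LIMIT_MB
Default value is 1945. Change using setRAMPerThreadHardLimitMB(int).

public static final boolean DEFAULT_USE_COMPOUND_FILE_SYSTEM
Default value for compound file system for newly written segments (set to true). For batch indexing with very large RAM buffers, use false.

public static final boolean DEFAULT_COMMIT_ON_CLOSE
Default value for whether calls to IndexWriter.close() include a commit.

public IndexWriterConfig()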
Creates a new config, using StandardAnalyzer as the analyzer. By default, TieredMergePolicy is used for merging. Note that TieredMergePolicy is free to select non-contiguous merges, which means docIDs may not remain monotonic over time. If this is a problem, you should switch to LogByteSizeMergePolicy or LogDocMergePolicy.

public IndexWriterConfig(Analyzer analyzer)
Creates a new config with the provided Analyzer. By default, TieredMergePolicy is used for merging. Note that TieredMergePolicy is free to select non-contiguous merges, which means docIDs may not remain monotonic over time. If this is a problem, you should switch to LogByteSizeMergePolicy or LogDocMergePolicy.

public IndexWriterConfig setOpenMode(IndexWriterConfig.OpenMode openMode)
Specifies IndexWriterConfig.OpenMode of the index.
Only takes effect when IndexWriter is first created.
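
A minimal sketch of how the open mode plays out in practice, assuming the OpenMode constants CREATE and APPEND (this page names the enum but does not list its values). The class name OpenModeExample and the field name "id" are hypothetical. Each writer gets a fresh config object.

```java
import java.io.IOException;
import java.io.UncheckedIOException;

import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.document.Document;
import org.apache.lucene.document.Field;
import org.apache.lucene.document.StringField;
import org.apache.lucene.index.DirectoryReader;
import org.apache.lucene.index.IndexWriter;
import org.apache.lucene.index.IndexWriterConfig;
import org.apache.lucene.index.IndexWriterConfig.OpenMode;
import org.apache.lucene.store.ByteBuffersDirectory;
import org.apache.lucene.store.Directory;

// Hypothetical example class; not part of Lucene.
public class OpenModeExample {

  /** Opens a writer in the given mode, adds one document, returns the doc count. */
  private static int addOne(Directory dir, OpenMode mode) throws IOException {
    IndexWriterConfig conf =
        new IndexWriterConfig(new StandardAnalyzer()).setOpenMode(mode);
    try (IndexWriter writer = new IndexWriter(dir, conf)) {
      Document doc = new Document();
      doc.add(new StringField("id", "x", Field.Store.NO));
      writer.addDocument(doc);
    } // close() commits by default
    try (DirectoryReader reader = DirectoryReader.open(dir)) {
      return reader.numDocs();
    }
  }

  /** Returns doc counts after CREATE, then APPEND, then CREATE again. */
  static int[] run() {
    try (Directory dir = new ByteBuffersDirectory()) {
      int afterCreate = addOne(dir, OpenMode.CREATE);   // new index
      int afterAppend = addOne(dir, OpenMode.APPEND);   // adds to existing index
      int afterRecreate = addOne(dir, OpenMode.CREATE); // replaces existing index
      return new int[] {afterCreate, afterAppend, afterRecreate};
    } catch (IOException e) {
      throw new UncheckedIOException(e);
    }
  }

  public static void main(String[] args) {
    System.out.println(java.util.Arrays.toString(run()));
  }
}
```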

public IndexWriterConfig.OpenMode getOpenMode()
Description copied from class: LiveIndexWriterConfig
Returns the IndexWriterConfig.OpenMode set by setOpenMode(OpenMode).
Overrides: getOpenMode in class LiveIndexWriterConfig

public IndexWriterConfig setIndexCreatedVersionMajor(int indexCreatedVersionMajor)
Expert: set the compatibility version to use for this index. Setting the previous major version can be useful because IndexWriter.addIndexes(org.apache.lucene.store.Directory...) only accepts indices that have been written with the same major version as the current index. If the index already exists, then this value is ignored. Default value is the major of the latest version.
NOTE: Changing the creation version reduces backward compatibility guarantees. For instance, an index created with Lucene 8 with a compatibility version of 7 can't be read with Lucene 9, because Lucene only supports reading indices created with the current or previous major release.
Parameters: indexCreatedVersionMajor - the major version to use for compatibility

public IndexWriterConfig setIndexDeletionPolicy(IndexDeletionPolicy delPolicy)
Expert: allows an optional IndexDeletionPolicy implementation to be specified. You can use this to control when prior commits are deleted from the index. The default policy is KeepOnlyLastCommitDeletionPolicy, which removes all prior commits as soon as a new commit is done (this matches behavior before 2.2). Creating your own policy can allow you to explicitly keep previous "point in time" commits alive in the index for some time, to allow readers to refresh to the new commit without having the old commit deleted out from under them. This is necessary on filesystems like NFS that do not support "delete on last close" semantics, which Lucene's "point in time" search normally relies on.
NOTE: the deletion policy must not be null.
Only takes effect when IndexWriter is first created.
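
One way to keep an older commit alive without writing a policy from scratch is Lucene's SnapshotDeletionPolicy, which this page does not mention and which is used here only as an illustration. The sketch below wraps the default policy, pins the first commit, and counts the commits in the directory before and after releasing it. The class name DeletionPolicyExample and the field name "id" are hypothetical.

```java
import java.io.IOException;
import java.io.UncheckedIOException;

import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.document.Document;
import org.apache.lucene.document.Field;
import org.apache.lucene.document.StringField;
import org.apache.lucene.index.DirectoryReader;
import org.apache.lucene.index.IndexCommit;
import org.apache.lucene.index.IndexWriter;
import org.apache.lucene.index.IndexWriterConfig;
import org.apache.lucene.index.KeepOnlyLastCommitDeletionPolicy;
import org.apache.lucene.index.SnapshotDeletionPolicy;
import org.apache.lucene.store.ByteBuffersDirectory;
import org.apache.lucene.store.Directory;

// Hypothetical example class; not part of Lucene.
public class DeletionPolicyExample {

  private static Document doc(String id) {
    Document d = new Document();
    d.add(new StringField("id", id, Field.Store.NO));
    return d;
  }

  /** Returns {commits while one is pinned, commits after it is released}. */
  static int[] commitCounts() {
    SnapshotDeletionPolicy policy =
        new SnapshotDeletionPolicy(new KeepOnlyLastCommitDeletionPolicy());
    IndexWriterConfig conf =
        new IndexWriterConfig(new StandardAnalyzer()).setIndexDeletionPolicy(policy);
    try (Directory dir = new ByteBuffersDirectory();
         IndexWriter writer = new IndexWriter(dir, conf)) {
      writer.addDocument(doc("1"));
      writer.commit();
      IndexCommit pinned = policy.snapshot(); // protect the first commit

      writer.addDocument(doc("2"));
      writer.commit(); // the default policy alone would drop the first commit here
      int whilePinned = DirectoryReader.listCommits(dir).size();

      policy.release(pinned);
      writer.deleteUnusedFiles(); // lets the policy revisit the released commit
      int afterRelease = DirectoryReader.listCommits(dir).size();
      return new int[] {whilePinned, afterRelease};
    } catch (IOException e) {
      throw new UncheckedIOException(e);
    }
  }

  public static void main(String[] args) {
    System.out.println(java.util.Arrays.toString(commitCounts()));
  }
}
```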

public IndexDeletionPolicy getIndexDeletionPolicy()
Description copied from class: LiveIndexWriterConfig
Returns the IndexDeletionPolicy specified in setIndexDeletionPolicy(IndexDeletionPolicy) or the default KeepOnlyLastCommitDeletionPolicy.
Overrides: getIndexDeletionPolicy in class LiveIndexWriterConfig

public IndexWriterConfig setIndexCommit(IndexCommit commit)
Expert: allows opening a certain commit point. The default is null, which opens the latest commit point. This can also be used to open IndexWriter from a near-real-time reader, if you pass the reader's DirectoryReader.getIndexCommit().
Only takes effect when IndexWriter is first created.

public IndexCommit getIndexCommit()
Description copied from class: LiveIndexWriterConfig
Returns the IndexCommit as specified in setIndexCommit(IndexCommit) or the default, null, which specifies to open the latest index commit point.
Overrides: getIndexCommit in class LiveIndexWriterConfig

public IndexWriterConfig setSimilarity(Similarity similarity)
Expert: set the Similarity implementation used by this IndexWriter.
NOTE: the similarity must not be null.
Only takes effect when IndexWriter is first created.

public Similarity getSimilarity()
Description copied from class: LiveIndexWriterConfig
Expert: returns the Similarity implementation used by this IndexWriter.
Overrides: getSimilarity in class LiveIndexWriterConfig

public IndexWriterConfig setMergeScheduler(MergeScheduler mergeScheduler)
Expert: sets the merge scheduler used by this writer. The default is ConcurrentMergeScheduler.
NOTE: the merge scheduler must not be null.
Only takes effect when IndexWriter is first created.

public MergeScheduler getMergeScheduler()
Description copied from class: LiveIndexWriterConfig
Returns the MergeScheduler that was set by setMergeScheduler(MergeScheduler).
Overrides: getMergeScheduler in class LiveIndexWriterConfig

public IndexWriterConfig setCodec(Codec codec)
Set the Codec.
Only takes effect when IndexWriter is first created.

public Codec getCodec()
Description copied from class: LiveIndexWriterConfig
Returns the current Codec.
Overrides: getCodec in class LiveIndexWriterConfig

public MergePolicy getMergePolicy()
Description copied from class: LiveIndexWriterConfig
Returns the current MergePolicy in use by this writer.
Overrides: getMergePolicy in class LiveIndexWriterConfig
See Also: setMergePolicy(MergePolicy)

public IndexWriterConfig setReaderPooling(boolean readerPooling)
By default, IndexWriter does not pool the SegmentReaders it must open for deletions and merging, unless a near-real-time reader has been obtained by calling DirectoryReader.open(IndexWriter). This method lets you enable pooling without getting a near-real-time reader. NOTE: if you set this to false, IndexWriter will still pool readers once DirectoryReader.open(IndexWriter) is called.
Only takes effect when IndexWriter is first created.

public boolean getReaderPooling()
Description copied from class: LiveIndexWriterConfig
Returns true if IndexWriter should pool readers even if DirectoryReader.open(IndexWriter) has not been called.
Overrides: getReaderPooling in class LiveIndexWriterConfig

public IndexWriterConfig setRAMPerThreadHardLimitMB(int perThreadHardLimitMB)
Expert: Sets the maximum memory consumption per thread triggering a forced flush if exceeded. A DocumentsWriterPerThread is forcefully flushed once it exceeds this limit even if the getRAMBufferSizeMB() has not been exceeded. This is a safety limit to prevent a DocumentsWriterPerThread from address space exhaustion due to its internal 32 bit signed integer based memory addressing. The given value must be less than 2GB (2048MB).
See Also: DEFAULT_RAM_PER_THREAD_HARD_LIMIT_MB

public int getRAMPerThreadHardLimitMB()
Description copied from class: LiveIndexWriterConfig
Returns the max amount of memory each DocumentsWriterPerThread can consume until forcefully flushed.
Overrides: getRAMPerThreadHardLimitMB in class LiveIndexWriterConfig
See Also: setRAMPerThreadHardLimitMB(int)

public InfoStream getInfoStream()
Description copied from class: LiveIndexWriterConfig
Returns InfoStream used for debugging.
Overrides: getInfoStream in class LiveIndexWriterConfig
See Also: setInfoStream(InfoStream)

public Analyzer getAnalyzer()
Description copied from class: LiveIndexWriterConfig
Returns the default analyzer to use for indexing documents.
Overrides: getAnalyzer in class LiveIndexWriterConfig

public int getMaxBufferedDocs()
Description copied from class: LiveIndexWriterConfig
Returns the number of buffered added documents that will trigger a flush if enabled.
Overrides: getMaxBufferedDocs in class LiveIndexWriterConfig
See Also: LiveIndexWriterConfig.setMaxBufferedDocs(int)

public IndexWriter.IndexReaderWarmer getMergedSegmentWarmer()
Description copied from class: LiveIndexWriterConfig
Returns the current merged segment warmer. See IndexWriter.IndexReaderWarmer.
Overrides: getMergedSegmentWarmer in class LiveIndexWriterConfig

public double getRAMBufferSizeMB()
Description copied from class: LiveIndexWriterConfig
Returns the value set by LiveIndexWriterConfig.setRAMBufferSizeMB(double) if enabled.
Overrides: getRAMBufferSizeMB in class LiveIndexWriterConfig

public IndexWriterConfig setInfoStream(InfoStream infoStream)
Information about merges, deletes and a message when maxFieldLength is reached will be printed to this. Must not be null, but InfoStream.NO_OUTPUT may be used to suppress output.

public IndexWriterConfig setInfoStream(PrintStream printStream)
Convenience method that uses PrintStreamInfoStream. Must not be null.

public IndexWriterConfig setMergePolicy(MergePolicy mergePolicy)
Description copied from class: LiveIndexWriterConfig
Expert: MergePolicy is invoked whenever there are changes to the segments in the index. Its role is to select which merges to do, if any, and return a MergePolicy.MergeSpecification describing the merges. It also selects merges to do for forceMerge.
Takes effect on subsequent merge selections. Any merges in flight or any merges already registered by the previous MergePolicy are not affected.
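
As a sketch of the switch suggested in the constructor notes (replacing the default TieredMergePolicy when docID order matters), the following installs a LogByteSizeMergePolicy. The class name MergePolicyExample and the merge factor value are illustrative assumptions.

```java
import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.index.IndexWriterConfig;
import org.apache.lucene.index.LogByteSizeMergePolicy;
import org.apache.lucene.index.MergePolicy;

// Hypothetical example class; not part of Lucene.
public class MergePolicyExample {

  /** Returns the merge policy a freshly constructed config starts with. */
  static MergePolicy defaultPolicy() {
    return new IndexWriterConfig(new StandardAnalyzer()).getMergePolicy();
  }

  /** Builds a config whose merge policy is LogByteSizeMergePolicy. */
  static IndexWriterConfig logByteSizeConfig() {
    LogByteSizeMergePolicy policy = new LogByteSizeMergePolicy();
    policy.setMergeFactor(10); // illustrative value
    return new IndexWriterConfig(new StandardAnalyzer()).setMergePolicy(policy);
  }

  public static void main(String[] args) {
    System.out.println("default: " + defaultPolicy().getClass().getSimpleName());
    System.out.println("custom:  "
        + logByteSizeConfig().getMergePolicy().getClass().getSimpleName());
  }
}
```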
Overrides: setMergePolicy in class LiveIndexWriterConfig

public IndexWriterConfig setMaxBufferedDocs(int maxBufferedDocs)
Description copied from class: LiveIndexWriterConfig
Determines the minimal number of documents required before the buffered in-memory documents are flushed as a new Segment.
When this is set, the writer will flush every maxBufferedDocs added documents. Pass in DISABLE_AUTO_FLUSH to prevent triggering a flush due to number of buffered documents. Note that if flushing by RAM usage is also enabled, then the flush will be triggered by whichever comes first.
Disabled by default (writer flushes by RAM usage).
Takes effect immediately, but only the next time a document is added, updated or deleted.
Overrides: setMaxBufferedDocs in class LiveIndexWriterConfig
See Also: LiveIndexWriterConfig.setRAMBufferSizeMB(double)

public IndexWriterConfig setMergedSegmentWarmer(IndexWriter.IndexReaderWarmer mergeSegmentWarmer)
Description copied from class: LiveIndexWriterConfig
Set the merged segment warmer. See IndexWriter.IndexReaderWarmer.
Takes effect on the next merge.
Overrides: setMergedSegmentWarmer in class LiveIndexWriterConfig

public IndexWriterConfig setRAMBufferSizeMB(double ramBufferSizeMB)
Description copied from class: LiveIndexWriterConfig
Determines the amount of RAM that may be used for buffering added documents and deletions before they are flushed to the Directory.
When this is set, the writer will flush whenever buffered documents and deletions use this much RAM. Pass in DISABLE_AUTO_FLUSH to prevent triggering a flush due to RAM usage. Note that if flushing by document count is also enabled, then the flush will be triggered by whichever comes first.
The maximum RAM limit is inherently determined by the JVM's available memory. Yet, an IndexWriter session can consume a significantly larger amount of memory than the given RAM limit, since this limit is just an indicator of when to flush memory-resident documents to the Directory. Flushes are likely to happen concurrently while other threads are adding documents to the writer. For application stability, the available memory in the JVM should be significantly larger than the RAM buffer used for indexing.
NOTE: the accounting of RAM usage for pending deletions is only approximate. Specifically, if you delete by Query, Lucene currently has no way to measure the RAM usage of individual Queries, so the accounting will under-estimate and you should compensate by either calling commit() or refresh() periodically yourself.
NOTE: It's not guaranteed that all memory-resident documents are flushed once this limit is exceeded. Depending on the configured FlushPolicy, only a subset of the buffered documents is flushed, and therefore only part of the RAM buffer is released.
The default value is DEFAULT_RAM_BUFFER_SIZE_MB.
Takes effect immediately, but only the next time a document is added, updated or deleted.
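
The two flush triggers described above can be combined or swapped. The sketch below configures flushing by document count only. It relies on an assumption not stated on this page: to my understanding, the setters reject a configuration in which both triggers are disabled with an IllegalArgumentException, so the document-count trigger is enabled before the RAM trigger is turned off. The class name FlushTriggerExample is hypothetical.

```java
import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.index.IndexWriterConfig;

// Hypothetical example class; not part of Lucene.
public class FlushTriggerExample {

  /** Flush every maxDocs buffered documents and ignore RAM usage. */
  static IndexWriterConfig flushByDocCount(int maxDocs) {
    return new IndexWriterConfig(new StandardAnalyzer())
        .setMaxBufferedDocs(maxDocs) // enable the doc-count trigger first
        .setRAMBufferSizeMB(IndexWriterConfig.DISABLE_AUTO_FLUSH);
  }

  /** Tries to disable the RAM trigger while doc-count flushing is still off. */
  static boolean disablingBothIsRejected() {
    try {
      new IndexWriterConfig(new StandardAnalyzer())
          .setRAMBufferSizeMB(IndexWriterConfig.DISABLE_AUTO_FLUSH);
      return false;
    } catch (IllegalArgumentException expected) {
      return true; // assumed behaviour, see the note above the code
    }
  }

  public static void main(String[] args) {
    IndexWriterConfig conf = flushByDocCount(1000);
    System.out.println("maxBufferedDocs = " + conf.getMaxBufferedDocs());
    System.out.println("both disabled rejected: " + disablingBothIsRejected());
  }
}
```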
Overrides: setRAMBufferSizeMB in class LiveIndexWriterConfig
See Also: setRAMPerThreadHardLimitMB(int)

public IndexWriterConfig setUseCompoundFile(boolean useCompoundFile)
Description copied from class: LiveIndexWriterConfig
Sets if the IndexWriter should pack newly written segments in a compound file. Default is true.
Use false for batch indexing with very large RAM buffer settings.
Note: To control compound file usage during segment merges see MergePolicy.setNoCFSRatio(double) and MergePolicy.setMaxCFSSegmentSizeMB(double). This setting only applies to newly created segments.
Overrides: setUseCompoundFile in class LiveIndexWriterConfig

public IndexWriterConfig setCommitOnClose(boolean commitOnClose)
Sets whether calls to IndexWriter.close() should first commit before closing. Use true to match behavior of Lucene 4.x.

public IndexWriterConfig setIndexSort(Sort sort)
Set the Sort order to use for all (flushed and merged) segments.

public String toString()
Overrides: toString in class LiveIndexWriterConfig

public IndexWriterConfig setCheckPendingFlushUpdate(boolean checkPendingFlushOnUpdate)
Description copied from class: LiveIndexWriterConfig
Expert: sets if indexing threads check for pending flushes on update in order to help out flushing indexing buffers to disk. When this is disabled, threads calling DirectoryReader.openIfChanged(DirectoryReader, IndexWriter) or IndexWriter.flush() will be the only threads writing segments to disk unless flushes are falling behind. If indexing is stalled due to too many pending flushes, indexing threads will help out writing pending segment flushes to disk.
Overrides: setCheckPendingFlushUpdate in class LiveIndexWriterConfig

public IndexWriterConfig setSoftDeletesField(String softDeletesField)
Sets the soft deletes field. A soft-deletes field is a doc-values field that marks a document as soft-deleted if the document has at least one value in that field. A soft-deleted document is treated as if it had been hard-deleted through the IndexWriter API, for example with IndexWriter.deleteDocuments(Term...).
Merges will reclaim soft-deleted as well as hard-deleted documents, and index readers obtained from the IndexWriter will reflect all deleted documents in their live docs. If soft-deletes are used, documents must be indexed via IndexWriter.softUpdateDocument(Term, Iterable, Field...). Deletes are applied via IndexWriter.updateDocValues(Term, Field...).
Soft deletes allow documents to be retained across merges if the merge policy modifies the live docs of a merge reader. SoftDeletesRetentionMergePolicy, for instance, allows specifying an arbitrary query to mark all documents that should survive the merge. This can be used, for example, to keep all document modifications for a certain time interval, or the last N operations if some kind of sequence ID is available in the index.
Currently there is no API support to un-delete a soft-deleted document. To un-delete it, the document must be re-indexed using IndexWriter.softUpdateDocument(Term, Iterable, Field...).
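
A minimal sketch of the soft-update flow described above. The field name "__soft_deletes", the document fields, and the class name SoftDeletesExample are hypothetical. Merging is switched off with NoMergePolicy only so that the soft-deleted document is certain to still be physically present when the counts are read.

```java
import java.io.IOException;
import java.io.UncheckedIOException;

import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.document.Document;
import org.apache.lucene.document.Field;
import org.apache.lucene.document.NumericDocValuesField;
import org.apache.lucene.document.StringField;
import org.apache.lucene.index.DirectoryReader;
import org.apache.lucene.index.IndexWriter;
import org.apache.lucene.index.IndexWriterConfig;
import org.apache.lucene.index.NoMergePolicy;
import org.apache.lucene.index.Term;
import org.apache.lucene.store.ByteBuffersDirectory;
import org.apache.lucene.store.Directory;

// Hypothetical example class; not part of Lucene.
public class SoftDeletesExample {

  static final String SOFT_DELETES_FIELD = "__soft_deletes"; // hypothetical name

  private static Document version(String id, String body) {
    Document doc = new Document();
    doc.add(new StringField("id", id, Field.Store.YES));
    doc.add(new StringField("body", body, Field.Store.YES));
    return doc;
  }

  /** Returns {live docs, total docs including the soft-deleted one}. */
  static int[] liveAndTotalDocs() {
    IndexWriterConfig conf = new IndexWriterConfig(new StandardAnalyzer())
        .setSoftDeletesField(SOFT_DELETES_FIELD)
        .setMergePolicy(NoMergePolicy.INSTANCE); // keep the old version on disk
    try (Directory dir = new ByteBuffersDirectory();
         IndexWriter writer = new IndexWriter(dir, conf)) {
      writer.addDocument(version("1", "first"));
      // Replace document "1": the old version is marked via the soft-deletes field.
      writer.softUpdateDocument(
          new Term("id", "1"),
          version("1", "second"),
          new NumericDocValuesField(SOFT_DELETES_FIELD, 1));
      try (DirectoryReader reader = DirectoryReader.open(writer)) {
        return new int[] {reader.numDocs(), reader.maxDoc()};
      }
    } catch (IOException e) {
      throw new UncheckedIOException(e);
    }
  }

  public static void main(String[] args) {
    System.out.println(java.util.Arrays.toString(liveAndTotalDocs()));
  }
}
```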
The default value for this is null, which disables soft-deletes. If soft-deletes are enabled, documents can still be hard-deleted. Hard-deleted documents will not be considered soft-deleted even if they have a value in the soft-deletes field.

public IndexWriterConfig setReaderAttributes(Map<String,String> readerAttributes)
Sets the reader attributes used for all readers pulled from the IndexWriter.

Copyright © 2000-2020 Apache Software Foundation. All Rights Reserved.