Class LiveIndexWriterConfig

  • Direct Known Subclasses:
    IndexWriterConfig

    public class LiveIndexWriterConfig
    extends Object
    Holds all the configuration used by IndexWriter, with a few setters for settings that can be changed "live" on an IndexWriter instance.
    Since:
    4.0
    • Field Detail

      • createdVersionMajor

        protected int createdVersionMajor
        Compatibility version to use for this index.
      • indexingChain

        protected volatile org.apache.lucene.index.DocumentsWriterPerThread.IndexingChain indexingChain
        DocumentsWriterPerThread.IndexingChain that determines how documents are indexed.
      • codec

        protected volatile Codec codec
        Codec used to write new segments.
      • indexerThreadPool

        protected volatile org.apache.lucene.index.DocumentsWriterPerThreadPool indexerThreadPool
        DocumentsWriterPerThreadPool to control how threads are allocated to DocumentsWriterPerThread.
      • readerPooling

        protected volatile boolean readerPooling
        True if readers should be pooled.
      • flushPolicy

        protected volatile org.apache.lucene.index.FlushPolicy flushPolicy
        FlushPolicy to control when segments are flushed.
      • perThreadHardLimitMB

        protected volatile int perThreadHardLimitMB
        Sets the hard upper bound on RAM usage for a single segment, after which the segment is forced to flush.
      • useCompoundFile

        protected volatile boolean useCompoundFile
        True if segment flushes should use the compound file format.
      • commitOnClose

        protected boolean commitOnClose
        True if calls to IndexWriter.close() should first do a commit.
      • indexSort

        protected Sort indexSort
        The sort order to use to write merged segments.
      • indexSortFields

        protected Set<String> indexSortFields
        The field names involved in the index sort.
      • checkPendingFlushOnUpdate

        protected volatile boolean checkPendingFlushOnUpdate
        True if an indexing thread should check for pending flushes on update in order to help out on a full flush.
      • softDeletesField

        protected String softDeletesField
        The soft deletes field.
      • readerAttributes

        protected Map<String,String> readerAttributes
        The attributes for the NRT readers.
    • Method Detail

      • getAnalyzer

        public Analyzer getAnalyzer()
        Returns the default analyzer to use for indexing documents.
      • setRAMBufferSizeMB

        public LiveIndexWriterConfig setRAMBufferSizeMB(double ramBufferSizeMB)
        Determines the amount of RAM that may be used for buffering added documents and deletions before they are flushed to the Directory. Generally for faster indexing performance it's best to flush by RAM usage instead of document count and use as large a RAM buffer as you can.

        When this is set, the writer will flush whenever buffered documents and deletions use this much RAM. Pass in IndexWriterConfig.DISABLE_AUTO_FLUSH to prevent triggering a flush due to RAM usage. Note that if flushing by document count is also enabled, then the flush will be triggered by whichever comes first.

        The maximum RAM limit is inherently determined by the JVM's available memory. However, an IndexWriter session can consume significantly more memory than the given RAM limit, since this limit is only an indicator of when to flush memory-resident documents to the Directory. Flushes are likely to happen concurrently while other threads are adding documents to the writer. For application stability, the available memory in the JVM should be significantly larger than the RAM buffer used for indexing.

        NOTE: the accounting of RAM usage for pending deletions is only approximate. Specifically, if you delete by Query, Lucene currently has no way to measure the RAM usage of individual Queries, so the accounting will under-estimate and you should compensate by calling either commit() or refresh() periodically yourself.

        NOTE: It's not guaranteed that all memory-resident documents are flushed once this limit is exceeded. Depending on the configured FlushPolicy, only a subset of the buffered documents may be flushed, and therefore only part of the RAM buffer is released.

        The default value is IndexWriterConfig.DEFAULT_RAM_BUFFER_SIZE_MB.

        Takes effect immediately, but only the next time a document is added, updated or deleted.

        Throws:
        IllegalArgumentException - if ramBufferSize is enabled but non-positive, or it disables ramBufferSize when maxBufferedDocs is already disabled
        See Also:
        IndexWriterConfig.setRAMPerThreadHardLimitMB(int)
      • setMaxBufferedDocs

        public LiveIndexWriterConfig setMaxBufferedDocs(int maxBufferedDocs)
        Determines the minimal number of documents required before the buffered in-memory documents are flushed as a new Segment. Large values generally give faster indexing.

        When this is set, the writer will flush every maxBufferedDocs added documents. Pass in IndexWriterConfig.DISABLE_AUTO_FLUSH to prevent triggering a flush due to number of buffered documents. Note that if flushing by RAM usage is also enabled, then the flush will be triggered by whichever comes first.

        Disabled by default (writer flushes by RAM usage).

        Takes effect immediately, but only the next time a document is added, updated or deleted.

        Throws:
        IllegalArgumentException - if maxBufferedDocs is enabled but smaller than 2, or it disables maxBufferedDocs when ramBufferSize is already disabled
        See Also:
        setRAMBufferSizeMB(double)
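
        The interaction between the two flush triggers (RAM usage and document count) and the argument checks listed under "Throws" can be pictured with a small toy model. This is only a sketch, not Lucene code: the class FlushTriggerSketch and its shouldFlush method are hypothetical, the sentinel value -1 and the 16.0 MB starting value are assumptions standing in for IndexWriterConfig.DISABLE_AUTO_FLUSH and IndexWriterConfig.DEFAULT_RAM_BUFFER_SIZE_MB, and the real flush decision is delegated to the configured FlushPolicy.

```java
/** Toy model of the two flush triggers; hypothetical, not part of Lucene. */
final class FlushTriggerSketch {
    // Assumed stand-in for IndexWriterConfig.DISABLE_AUTO_FLUSH.
    static final int DISABLE_AUTO_FLUSH = -1;

    // Assumed starting state: flush by RAM only, document-count trigger disabled.
    private double ramBufferSizeMB = 16.0;
    private int maxBufferedDocs = DISABLE_AUTO_FLUSH;

    FlushTriggerSketch setRAMBufferSizeMB(double mb) {
        // Enabled but non-positive is rejected.
        if (mb != DISABLE_AUTO_FLUSH && mb <= 0.0) {
            throw new IllegalArgumentException("ramBufferSizeMB must be > 0 when enabled");
        }
        // Disabling RAM-based flushing while doc-count flushing is already disabled is rejected.
        if (mb == DISABLE_AUTO_FLUSH && maxBufferedDocs == DISABLE_AUTO_FLUSH) {
            throw new IllegalArgumentException("at least one flush trigger must stay enabled");
        }
        this.ramBufferSizeMB = mb;
        return this;
    }

    FlushTriggerSketch setMaxBufferedDocs(int docs) {
        // Enabled but smaller than 2 is rejected.
        if (docs != DISABLE_AUTO_FLUSH && docs < 2) {
            throw new IllegalArgumentException("maxBufferedDocs must be >= 2 when enabled");
        }
        // Disabling doc-count flushing while RAM-based flushing is already disabled is rejected.
        if (docs == DISABLE_AUTO_FLUSH && ramBufferSizeMB == DISABLE_AUTO_FLUSH) {
            throw new IllegalArgumentException("at least one flush trigger must stay enabled");
        }
        this.maxBufferedDocs = docs;
        return this;
    }

    /** Returns true as soon as either enabled trigger is reached ("whichever comes first"). */
    boolean shouldFlush(int bufferedDocs, double ramUsedMB) {
        boolean byDocs = maxBufferedDocs != DISABLE_AUTO_FLUSH && bufferedDocs >= maxBufferedDocs;
        boolean byRam = ramBufferSizeMB != DISABLE_AUTO_FLUSH && ramUsedMB >= ramBufferSizeMB;
        return byDocs || byRam;
    }

    public static void main(String[] args) {
        FlushTriggerSketch cfg = new FlushTriggerSketch().setMaxBufferedDocs(1000);
        // Both triggers are enabled here, so either one can cause a flush.
        System.out.println(cfg.shouldFlush(1000, 2.0));
        System.out.println(cfg.shouldFlush(10, 32.0));
        System.out.println(cfg.shouldFlush(10, 2.0));
    }
}
```

        In this model, switching to flush-by-count only requires enabling maxBufferedDocs first and disabling the RAM trigger second; doing it in the opposite order from the starting state would leave both triggers disabled and is rejected.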
      • getMaxBufferedDocs

        public int getMaxBufferedDocs()
        Returns the number of buffered added documents that will trigger a flush if enabled.
        See Also:
        setMaxBufferedDocs(int)
      • setMergePolicy

        public LiveIndexWriterConfig setMergePolicy(MergePolicy mergePolicy)
        Expert: MergePolicy is invoked whenever there are changes to the segments in the index. Its role is to select which merges to do, if any, and return a MergePolicy.MergeSpecification describing the merges. It also selects merges to do for forceMerge.

        Takes effect on subsequent merge selections. Any merges in flight or any merges already registered by the previous MergePolicy are not affected.

      • getCodec

        public Codec getCodec()
        Returns the current Codec.
      • getUseCompoundFile

        public boolean getUseCompoundFile()
        Returns true iff the IndexWriter packs newly written segments in a compound file. Default is true.
      • getCommitOnClose

        public boolean getCommitOnClose()
        Returns true if IndexWriter.close() should first commit before closing.
      • getIndexSort

        public Sort getIndexSort()
        Get the index-time Sort order, applied to all (flushed and merged) segments.
      • getIndexSortFields

        public Set<String> getIndexSortFields()
        Returns the field names involved in the index sort.
      • isCheckPendingFlushOnUpdate

        public boolean isCheckPendingFlushOnUpdate()
        Expert: Returns true if indexing threads check for pending flushes on update in order to help out flushing indexing buffers to disk.
        WARNING: This API is experimental and might change in incompatible ways in the next release.
      • setCheckPendingFlushUpdate

        public LiveIndexWriterConfig setCheckPendingFlushUpdate(boolean checkPendingFlushOnUpdate)
        Expert: Sets whether indexing threads check for pending flushes on update in order to help out flushing indexing buffers to disk. When this is disabled, threads calling DirectoryReader.openIfChanged(DirectoryReader, IndexWriter) or IndexWriter.flush() will be the only threads writing segments to disk unless flushes are falling behind. If indexing is stalled due to too many pending flushes, indexing threads will help out writing pending segment flushes to disk.
        WARNING: This API is experimental and might change in incompatible ways in the next release.
      • getReaderAttributes

        public Map<String,String> getReaderAttributes()
        Returns the reader attributes passed to all published readers opened on or within the IndexWriter.