org.apache.lucene.index
Class IndexWriterConfig

java.lang.Object
  extended by org.apache.lucene.index.LiveIndexWriterConfig
      extended by org.apache.lucene.index.IndexWriterConfig
All Implemented Interfaces:
Cloneable

public final class IndexWriterConfig
extends LiveIndexWriterConfig
implements Cloneable

Holds all the configuration that is used to create an IndexWriter. Once an IndexWriter has been created with this object, changes to this object will not affect the IndexWriter instance. To change settings on a live writer, use the LiveIndexWriterConfig returned from IndexWriter.getConfig().

All setter methods return IndexWriterConfig to allow chaining settings conveniently, for example:

 IndexWriterConfig conf = new IndexWriterConfig(matchVersion, analyzer);
 conf.setOpenMode(IndexWriterConfig.OpenMode.CREATE_OR_APPEND).setRAMBufferSizeMB(32.0);
 

Since:
3.1
See Also:
IndexWriter.getConfig()

Nested Class Summary
static class IndexWriterConfig.OpenMode
          Specifies the open mode for IndexWriter.
 
Field Summary
static int DEFAULT_MAX_BUFFERED_DELETE_TERMS
          Disabled by default (because IndexWriter flushes by RAM usage by default).
static int DEFAULT_MAX_BUFFERED_DOCS
          Disabled by default (because IndexWriter flushes by RAM usage by default).
static int DEFAULT_MAX_THREAD_STATES
          The maximum number of simultaneous threads that may be indexing documents at once in IndexWriter; if more than this many threads arrive they will wait for others to finish.
static double DEFAULT_RAM_BUFFER_SIZE_MB
          Default value is 16 MB (which means flush when buffered docs consume approximately 16 MB RAM).
static int DEFAULT_RAM_PER_THREAD_HARD_LIMIT_MB
          Default value is 1945.
static boolean DEFAULT_READER_POOLING
          Default setting for setReaderPooling(boolean).
static int DEFAULT_READER_TERMS_INDEX_DIVISOR
          Default value is 1.
static int DEFAULT_TERM_INDEX_INTERVAL
          Default value is 32.
static boolean DEFAULT_USE_COMPOUND_FILE_SYSTEM
          Default value for compound file system for newly written segments (set to true).
static int DISABLE_AUTO_FLUSH
          Denotes a flush trigger is disabled.
static long WRITE_LOCK_TIMEOUT
          Default value for the write lock timeout (1,000 ms).
 
Fields inherited from class org.apache.lucene.index.LiveIndexWriterConfig
codec, commit, delPolicy, flushPolicy, indexerThreadPool, indexingChain, infoStream, matchVersion, mergePolicy, mergeScheduler, openMode, perThreadHardLimitMB, readerPooling, similarity, useCompoundFile, writeLockTimeout
 
Constructor Summary
IndexWriterConfig(Version matchVersion, Analyzer analyzer)
          Creates a new config with defaults that match the specified Version as well as the default Analyzer.
 
Method Summary
 IndexWriterConfig clone()
           
 Analyzer getAnalyzer()
          Returns the default analyzer to use for indexing documents.
 Codec getCodec()
          Returns the current Codec.
static long getDefaultWriteLockTimeout()
          Returns the default write lock timeout for newly instantiated IndexWriterConfigs.
 IndexCommit getIndexCommit()
          Returns the IndexCommit as specified in setIndexCommit(IndexCommit) or the default, null, which specifies to open the latest index commit point.
 IndexDeletionPolicy getIndexDeletionPolicy()
          Returns the IndexDeletionPolicy specified in setIndexDeletionPolicy(IndexDeletionPolicy) or the default KeepOnlyLastCommitDeletionPolicy.
 InfoStream getInfoStream()
          Returns InfoStream used for debugging.
 int getMaxBufferedDeleteTerms()
          Returns the number of buffered deleted terms that will trigger a flush of all buffered deletes if enabled.
 int getMaxBufferedDocs()
          Returns the number of buffered added documents that will trigger a flush if enabled.
 int getMaxThreadStates()
          Returns the max number of simultaneous threads that may be indexing documents at once in IndexWriter.
 IndexWriter.IndexReaderWarmer getMergedSegmentWarmer()
          Returns the current merged segment warmer.
 MergePolicy getMergePolicy()
          Returns the current MergePolicy in use by this writer.
 MergeScheduler getMergeScheduler()
          Returns the MergeScheduler that was set by setMergeScheduler(MergeScheduler).
 IndexWriterConfig.OpenMode getOpenMode()
          Returns the IndexWriterConfig.OpenMode set by setOpenMode(OpenMode).
 double getRAMBufferSizeMB()
          Returns the value set by LiveIndexWriterConfig.setRAMBufferSizeMB(double) if enabled.
 int getRAMPerThreadHardLimitMB()
          Returns the max amount of memory each DocumentsWriterPerThread can consume until forcefully flushed.
 boolean getReaderPooling()
          Returns true if IndexWriter should pool readers even if DirectoryReader.open(IndexWriter, boolean) has not been called.
 int getReaderTermsIndexDivisor()
          Returns the termInfosIndexDivisor.
 Similarity getSimilarity()
          Expert: returns the Similarity implementation used by this IndexWriter.
 int getTermIndexInterval()
          Returns the interval between indexed terms.
 long getWriteLockTimeout()
          Returns allowed timeout when acquiring the write lock.
 IndexWriterConfig setCodec(Codec codec)
          Set the Codec.
static void setDefaultWriteLockTimeout(long writeLockTimeout)
          Sets the default (for any instance) maximum time to wait for a write lock (in milliseconds).
 IndexWriterConfig setIndexCommit(IndexCommit commit)
          Expert: allows opening a specific commit point.
 IndexWriterConfig setIndexDeletionPolicy(IndexDeletionPolicy delPolicy)
          Expert: allows an optional IndexDeletionPolicy implementation to be specified.
 IndexWriterConfig setInfoStream(InfoStream infoStream)
          Information about merges, deletes and a message when maxFieldLength is reached will be printed to this.
 IndexWriterConfig setInfoStream(PrintStream printStream)
          Convenience method that uses PrintStreamInfoStream.
 IndexWriterConfig setMaxBufferedDeleteTerms(int maxBufferedDeleteTerms)
          Determines the maximum number of delete-by-term operations that will be buffered before both the buffered in-memory delete terms and queries are applied and flushed.
 IndexWriterConfig setMaxBufferedDocs(int maxBufferedDocs)
          Determines the minimal number of documents required before the buffered in-memory documents are flushed as a new Segment.
 IndexWriterConfig setMaxThreadStates(int maxThreadStates)
          Sets the max number of simultaneous threads that may be indexing documents at once in IndexWriter.
 IndexWriterConfig setMergedSegmentWarmer(IndexWriter.IndexReaderWarmer mergeSegmentWarmer)
          Set the merged segment warmer.
 IndexWriterConfig setMergePolicy(MergePolicy mergePolicy)
          Expert: MergePolicy is invoked whenever there are changes to the segments in the index.
 IndexWriterConfig setMergeScheduler(MergeScheduler mergeScheduler)
          Expert: sets the merge scheduler used by this writer.
 IndexWriterConfig setOpenMode(IndexWriterConfig.OpenMode openMode)
          Specifies IndexWriterConfig.OpenMode of the index.
 IndexWriterConfig setRAMBufferSizeMB(double ramBufferSizeMB)
          Determines the amount of RAM that may be used for buffering added documents and deletions before they are flushed to the Directory.
 IndexWriterConfig setRAMPerThreadHardLimitMB(int perThreadHardLimitMB)
          Expert: Sets the maximum memory consumption per thread triggering a forced flush if exceeded.
 IndexWriterConfig setReaderPooling(boolean readerPooling)
          By default, IndexWriter does not pool the SegmentReaders it must open for deletions and merging, unless a near-real-time reader has been obtained by calling DirectoryReader.open(IndexWriter, boolean).
 IndexWriterConfig setReaderTermsIndexDivisor(int divisor)
          Sets the termsIndexDivisor passed to any readers that IndexWriter opens, for example when applying deletes or creating a near-real-time reader in DirectoryReader.open(IndexWriter, boolean).
 IndexWriterConfig setSimilarity(Similarity similarity)
          Expert: set the Similarity implementation used by this IndexWriter.
 IndexWriterConfig setTermIndexInterval(int interval)
          Expert: set the interval between indexed terms.
 IndexWriterConfig setUseCompoundFile(boolean useCompoundFile)
          Sets whether the IndexWriter should pack newly written segments in a compound file.
 IndexWriterConfig setWriteLockTimeout(long writeLockTimeout)
          Sets the maximum time to wait for a write lock (in milliseconds) for this instance.
 String toString()
           
 
Methods inherited from class org.apache.lucene.index.LiveIndexWriterConfig
getUseCompoundFile
 
Methods inherited from class java.lang.Object
equals, finalize, getClass, hashCode, notify, notifyAll, wait, wait, wait
 

Field Detail

DEFAULT_TERM_INDEX_INTERVAL

public static final int DEFAULT_TERM_INDEX_INTERVAL
Default value is 32. Change using setTermIndexInterval(int).

See Also:
Constant Field Values

DISABLE_AUTO_FLUSH

public static final int DISABLE_AUTO_FLUSH
Denotes a flush trigger is disabled.

See Also:
Constant Field Values

DEFAULT_MAX_BUFFERED_DELETE_TERMS

public static final int DEFAULT_MAX_BUFFERED_DELETE_TERMS
Disabled by default (because IndexWriter flushes by RAM usage by default).

See Also:
Constant Field Values

DEFAULT_MAX_BUFFERED_DOCS

public static final int DEFAULT_MAX_BUFFERED_DOCS
Disabled by default (because IndexWriter flushes by RAM usage by default).

See Also:
Constant Field Values

DEFAULT_RAM_BUFFER_SIZE_MB

public static final double DEFAULT_RAM_BUFFER_SIZE_MB
Default value is 16 MB (which means flush when buffered docs consume approximately 16 MB RAM).

See Also:
Constant Field Values

WRITE_LOCK_TIMEOUT

public static long WRITE_LOCK_TIMEOUT
Default value for the write lock timeout (1,000 ms).

See Also:
setDefaultWriteLockTimeout(long)

DEFAULT_READER_POOLING

public static final boolean DEFAULT_READER_POOLING
Default setting for setReaderPooling(boolean).

See Also:
Constant Field Values

DEFAULT_READER_TERMS_INDEX_DIVISOR

public static final int DEFAULT_READER_TERMS_INDEX_DIVISOR
Default value is 1. Change using setReaderTermsIndexDivisor(int).

See Also:
Constant Field Values

DEFAULT_RAM_PER_THREAD_HARD_LIMIT_MB

public static final int DEFAULT_RAM_PER_THREAD_HARD_LIMIT_MB
Default value is 1945. Change using setRAMPerThreadHardLimitMB(int).

See Also:
Constant Field Values

DEFAULT_MAX_THREAD_STATES

public static final int DEFAULT_MAX_THREAD_STATES
The maximum number of simultaneous threads that may be indexing documents at once in IndexWriter; if more than this many threads arrive they will wait for others to finish. Default value is 8.

See Also:
Constant Field Values

DEFAULT_USE_COMPOUND_FILE_SYSTEM

public static final boolean DEFAULT_USE_COMPOUND_FILE_SYSTEM
Default value for compound file system for newly written segments (set to true). For batch indexing with very large RAM buffers, use false.

See Also:
Constant Field Values
Constructor Detail

IndexWriterConfig

public IndexWriterConfig(Version matchVersion,
                         Analyzer analyzer)
Creates a new config with defaults that match the specified Version as well as the default Analyzer. If matchVersion is >= Version.LUCENE_32, TieredMergePolicy is used for merging; otherwise LogByteSizeMergePolicy is used. Note that TieredMergePolicy is free to select non-contiguous merges, which means docIDs may not remain monotonic over time. If this is a problem, you should switch to LogByteSizeMergePolicy or LogDocMergePolicy.

Method Detail

setDefaultWriteLockTimeout

public static void setDefaultWriteLockTimeout(long writeLockTimeout)
Sets the default (for any instance) maximum time to wait for a write lock (in milliseconds).


getDefaultWriteLockTimeout

public static long getDefaultWriteLockTimeout()
Returns the default write lock timeout for newly instantiated IndexWriterConfigs.

See Also:
setDefaultWriteLockTimeout(long)

clone

public IndexWriterConfig clone()
Overrides:
clone in class Object

setOpenMode

public IndexWriterConfig setOpenMode(IndexWriterConfig.OpenMode openMode)
Specifies IndexWriterConfig.OpenMode of the index.

Only takes effect when IndexWriter is first created.


getOpenMode

public IndexWriterConfig.OpenMode getOpenMode()
Description copied from class: LiveIndexWriterConfig
Returns the IndexWriterConfig.OpenMode set by setOpenMode(OpenMode).

Overrides:
getOpenMode in class LiveIndexWriterConfig

setIndexDeletionPolicy

public IndexWriterConfig setIndexDeletionPolicy(IndexDeletionPolicy delPolicy)
Expert: allows an optional IndexDeletionPolicy implementation to be specified. You can use this to control when prior commits are deleted from the index. The default policy is KeepOnlyLastCommitDeletionPolicy which removes all prior commits as soon as a new commit is done (this matches behavior before 2.2). Creating your own policy can allow you to explicitly keep previous "point in time" commits alive in the index for some time, to allow readers to refresh to the new commit without having the old commit deleted out from under them. This is necessary on filesystems like NFS that do not support "delete on last close" semantics, which Lucene's "point in time" search normally relies on.

NOTE: the deletion policy cannot be null.

Only takes effect when IndexWriter is first created.
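
As a rough illustration of the "keep previous commits alive" idea described above, the following stand-alone sketch decides which commits a keep-last-N policy would remove. It deliberately does not use IndexDeletionPolicy or IndexCommit; the class KeepLastNSketch, the method commitsToDelete and the string commit names are hypothetical. With keep = 1 it mirrors the described default behavior of removing all prior commits once a new commit is done.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical model only: this is not the Lucene IndexDeletionPolicy API.
final class KeepLastNSketch {

    /**
     * Given commits ordered oldest first, returns the ones a
     * keep-last-N policy would delete (everything but the newest N).
     */
    static <T> List<T> commitsToDelete(List<T> commitsOldestFirst, int keep) {
        if (keep < 1) {
            throw new IllegalArgumentException("keep must be >= 1");
        }
        int deleteCount = Math.max(0, commitsOldestFirst.size() - keep);
        // Copy the sublist so the result does not depend on the input list.
        return new ArrayList<>(commitsOldestFirst.subList(0, deleteCount));
    }
}
```

A real policy would additionally have to decide when an older commit is no longer needed by any open reader, which this sketch does not model.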


getIndexDeletionPolicy

public IndexDeletionPolicy getIndexDeletionPolicy()
Description copied from class: LiveIndexWriterConfig
Returns the IndexDeletionPolicy specified in setIndexDeletionPolicy(IndexDeletionPolicy) or the default KeepOnlyLastCommitDeletionPolicy.

Overrides:
getIndexDeletionPolicy in class LiveIndexWriterConfig

setIndexCommit

public IndexWriterConfig setIndexCommit(IndexCommit commit)
Expert: allows opening a specific commit point. The default is null, which opens the latest commit point.

Only takes effect when IndexWriter is first created.


getIndexCommit

public IndexCommit getIndexCommit()
Description copied from class: LiveIndexWriterConfig
Returns the IndexCommit as specified in setIndexCommit(IndexCommit) or the default, null, which specifies to open the latest index commit point.

Overrides:
getIndexCommit in class LiveIndexWriterConfig

setSimilarity

public IndexWriterConfig setSimilarity(Similarity similarity)
Expert: set the Similarity implementation used by this IndexWriter.

NOTE: the similarity cannot be null.

Only takes effect when IndexWriter is first created.


getSimilarity

public Similarity getSimilarity()
Description copied from class: LiveIndexWriterConfig
Expert: returns the Similarity implementation used by this IndexWriter.

Overrides:
getSimilarity in class LiveIndexWriterConfig

setMergeScheduler

public IndexWriterConfig setMergeScheduler(MergeScheduler mergeScheduler)
Expert: sets the merge scheduler used by this writer. The default is ConcurrentMergeScheduler.

NOTE: the merge scheduler cannot be null.

Only takes effect when IndexWriter is first created.


getMergeScheduler

public MergeScheduler getMergeScheduler()
Description copied from class: LiveIndexWriterConfig
Returns the MergeScheduler that was set by setMergeScheduler(MergeScheduler).

Overrides:
getMergeScheduler in class LiveIndexWriterConfig

setWriteLockTimeout

public IndexWriterConfig setWriteLockTimeout(long writeLockTimeout)
Sets the maximum time to wait for a write lock (in milliseconds) for this instance. You can change the default value for all instances by calling setDefaultWriteLockTimeout(long).

Only takes effect when IndexWriter is first created.
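
The relationship between the static default and the per-instance timeout can be pictured with a small stand-alone model. It is not Lucene code: the class LockTimeoutSketch and its methods are hypothetical, and it assumes that an instance captures the default at construction time, which is how "the default for newly instantiated IndexWriterConfigs" is read here.

```java
// Hypothetical model only: mirrors the static-default / per-instance split.
final class LockTimeoutSketch {

    // Process-wide default, analogous to WRITE_LOCK_TIMEOUT (1,000 ms).
    private static long defaultTimeoutMs = 1000L;

    // Assumption: each instance copies the default when it is created.
    private long timeoutMs = defaultTimeoutMs;

    /** Changes the default used by instances created afterwards. */
    static void setDefaultTimeoutMs(long ms) {
        defaultTimeoutMs = ms;
    }

    /** Changes the timeout for this instance only; returns this for chaining. */
    LockTimeoutSketch setTimeoutMs(long ms) {
        this.timeoutMs = ms;
        return this;
    }

    long getTimeoutMs() {
        return timeoutMs;
    }
}
```

Under that assumption, changing the default affects only instances created afterwards, while the instance setter overrides it for a single configuration.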


getWriteLockTimeout

public long getWriteLockTimeout()
Description copied from class: LiveIndexWriterConfig
Returns allowed timeout when acquiring the write lock.

Overrides:
getWriteLockTimeout in class LiveIndexWriterConfig
See Also:
setWriteLockTimeout(long)

setMergePolicy

public IndexWriterConfig setMergePolicy(MergePolicy mergePolicy)
Expert: MergePolicy is invoked whenever there are changes to the segments in the index. Its role is to select which merges to do, if any, and return a MergePolicy.MergeSpecification describing the merges. It also selects merges to do for forceMerge.

Only takes effect when IndexWriter is first created.


setCodec

public IndexWriterConfig setCodec(Codec codec)
Set the Codec.

Only takes effect when IndexWriter is first created.


getCodec

public Codec getCodec()
Description copied from class: LiveIndexWriterConfig
Returns the current Codec.

Overrides:
getCodec in class LiveIndexWriterConfig

getMergePolicy

public MergePolicy getMergePolicy()
Description copied from class: LiveIndexWriterConfig
Returns the current MergePolicy in use by this writer.

Overrides:
getMergePolicy in class LiveIndexWriterConfig
See Also:
setMergePolicy(MergePolicy)

setMaxThreadStates

public IndexWriterConfig setMaxThreadStates(int maxThreadStates)
Sets the max number of simultaneous threads that may be indexing documents at once in IndexWriter. Values < 1 are invalid; if one is passed, maxThreadStates will be set to DEFAULT_MAX_THREAD_STATES.

Only takes effect when IndexWriter is first created.


getMaxThreadStates

public int getMaxThreadStates()
Description copied from class: LiveIndexWriterConfig
Returns the max number of simultaneous threads that may be indexing documents at once in IndexWriter.

Overrides:
getMaxThreadStates in class LiveIndexWriterConfig

setReaderPooling

public IndexWriterConfig setReaderPooling(boolean readerPooling)
By default, IndexWriter does not pool the SegmentReaders it must open for deletions and merging, unless a near-real-time reader has been obtained by calling DirectoryReader.open(IndexWriter, boolean). This method lets you enable pooling without getting a near-real-time reader. NOTE: if you set this to false, IndexWriter will still pool readers once DirectoryReader.open(IndexWriter, boolean) is called.

Only takes effect when IndexWriter is first created.


getReaderPooling

public boolean getReaderPooling()
Description copied from class: LiveIndexWriterConfig
Returns true if IndexWriter should pool readers even if DirectoryReader.open(IndexWriter, boolean) has not been called.

Overrides:
getReaderPooling in class LiveIndexWriterConfig

setRAMPerThreadHardLimitMB

public IndexWriterConfig setRAMPerThreadHardLimitMB(int perThreadHardLimitMB)
Expert: Sets the maximum memory consumption per thread triggering a forced flush if exceeded. A DocumentsWriterPerThread is forcefully flushed once it exceeds this limit, even if getRAMBufferSizeMB() has not been exceeded. This is a safety limit to protect a DocumentsWriterPerThread from address space exhaustion due to its internal 32-bit signed integer based memory addressing. The given value must be less than 2 GB (2048 MB).
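
The 2048 MB bound follows from the 32-bit signed addressing mentioned above, and the arithmetic can be checked with a few lines of plain Java. This is only a sketch: the class HardLimitSketch and its methods are hypothetical, and the lower bound of 1 MB in isBelowLimit is an assumption rather than something stated here.

```java
// Hypothetical helper: arithmetic behind the "less than 2048 MB" rule.
final class HardLimitSketch {

    static final long BYTES_PER_MB = 1024L * 1024L;

    /** Size in MB of the space addressable with a signed 32-bit offset. */
    static long addressableLimitMB() {
        // Integer.MAX_VALUE is 2^31 - 1, so offsets cover 2^31 bytes.
        return ((long) Integer.MAX_VALUE + 1) / BYTES_PER_MB;
    }

    /** True if the value is positive (assumed) and strictly below the limit. */
    static boolean isBelowLimit(int perThreadHardLimitMB) {
        return perThreadHardLimitMB > 0
            && perThreadHardLimitMB < addressableLimitMB();
    }
}
```

The documented default of 1945 MB passes this check, leaving some headroom below the 2048 MB ceiling.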

See Also:
DEFAULT_RAM_PER_THREAD_HARD_LIMIT_MB

getRAMPerThreadHardLimitMB

public int getRAMPerThreadHardLimitMB()
Description copied from class: LiveIndexWriterConfig
Returns the max amount of memory each DocumentsWriterPerThread can consume until forcefully flushed.

Overrides:
getRAMPerThreadHardLimitMB in class LiveIndexWriterConfig
See Also:
setRAMPerThreadHardLimitMB(int)

getInfoStream

public InfoStream getInfoStream()
Description copied from class: LiveIndexWriterConfig
Returns InfoStream used for debugging.

Overrides:
getInfoStream in class LiveIndexWriterConfig
See Also:
setInfoStream(InfoStream)

getAnalyzer

public Analyzer getAnalyzer()
Description copied from class: LiveIndexWriterConfig
Returns the default analyzer to use for indexing documents.

Overrides:
getAnalyzer in class LiveIndexWriterConfig

getMaxBufferedDeleteTerms

public int getMaxBufferedDeleteTerms()
Description copied from class: LiveIndexWriterConfig
Returns the number of buffered deleted terms that will trigger a flush of all buffered deletes if enabled.

Overrides:
getMaxBufferedDeleteTerms in class LiveIndexWriterConfig
See Also:
LiveIndexWriterConfig.setMaxBufferedDeleteTerms(int)

getMaxBufferedDocs

public int getMaxBufferedDocs()
Description copied from class: LiveIndexWriterConfig
Returns the number of buffered added documents that will trigger a flush if enabled.

Overrides:
getMaxBufferedDocs in class LiveIndexWriterConfig
See Also:
LiveIndexWriterConfig.setMaxBufferedDocs(int)

getMergedSegmentWarmer

public IndexWriter.IndexReaderWarmer getMergedSegmentWarmer()
Description copied from class: LiveIndexWriterConfig
Returns the current merged segment warmer. See IndexWriter.IndexReaderWarmer.

Overrides:
getMergedSegmentWarmer in class LiveIndexWriterConfig

getRAMBufferSizeMB

public double getRAMBufferSizeMB()
Description copied from class: LiveIndexWriterConfig
Returns the value set by LiveIndexWriterConfig.setRAMBufferSizeMB(double) if enabled.

Overrides:
getRAMBufferSizeMB in class LiveIndexWriterConfig

getReaderTermsIndexDivisor

public int getReaderTermsIndexDivisor()
Description copied from class: LiveIndexWriterConfig
Returns the termInfosIndexDivisor.

Overrides:
getReaderTermsIndexDivisor in class LiveIndexWriterConfig
See Also:
LiveIndexWriterConfig.setReaderTermsIndexDivisor(int)

getTermIndexInterval

public int getTermIndexInterval()
Description copied from class: LiveIndexWriterConfig
Returns the interval between indexed terms.

Overrides:
getTermIndexInterval in class LiveIndexWriterConfig
See Also:
LiveIndexWriterConfig.setTermIndexInterval(int)

setInfoStream

public IndexWriterConfig setInfoStream(InfoStream infoStream)
Information about merges, deletes and a message when maxFieldLength is reached will be printed to this. Must not be null, but InfoStream.NO_OUTPUT may be used to suppress output.


setInfoStream

public IndexWriterConfig setInfoStream(PrintStream printStream)
Convenience method that uses PrintStreamInfoStream. Must not be null.


setMaxBufferedDeleteTerms

public IndexWriterConfig setMaxBufferedDeleteTerms(int maxBufferedDeleteTerms)
Description copied from class: LiveIndexWriterConfig
Determines the maximum number of delete-by-term operations that will be buffered before both the buffered in-memory delete terms and queries are applied and flushed.

Disabled by default (writer flushes by RAM usage).

NOTE: This setting won't trigger a segment flush.

Takes effect immediately, but only the next time a document is added, updated or deleted. Also, if you only delete-by-query, this setting has no effect, i.e. delete queries are buffered until the next segment is flushed.

Overrides:
setMaxBufferedDeleteTerms in class LiveIndexWriterConfig
See Also:
LiveIndexWriterConfig.setRAMBufferSizeMB(double)

setMaxBufferedDocs

public IndexWriterConfig setMaxBufferedDocs(int maxBufferedDocs)
Description copied from class: LiveIndexWriterConfig
Determines the minimal number of documents required before the buffered in-memory documents are flushed as a new Segment. Large values generally give faster indexing.

When this is set, the writer will flush every maxBufferedDocs added documents. Pass in DISABLE_AUTO_FLUSH to prevent triggering a flush due to number of buffered documents. Note that if flushing by RAM usage is also enabled, then the flush will be triggered by whichever comes first.

Disabled by default (writer flushes by RAM usage).

Takes effect immediately, but only the next time a document is added, updated or deleted.

Overrides:
setMaxBufferedDocs in class LiveIndexWriterConfig
See Also:
LiveIndexWriterConfig.setRAMBufferSizeMB(double)

setMergedSegmentWarmer

public IndexWriterConfig setMergedSegmentWarmer(IndexWriter.IndexReaderWarmer mergeSegmentWarmer)
Description copied from class: LiveIndexWriterConfig
Set the merged segment warmer. See IndexWriter.IndexReaderWarmer.

Takes effect on the next merge.

Overrides:
setMergedSegmentWarmer in class LiveIndexWriterConfig

setRAMBufferSizeMB

public IndexWriterConfig setRAMBufferSizeMB(double ramBufferSizeMB)
Description copied from class: LiveIndexWriterConfig
Determines the amount of RAM that may be used for buffering added documents and deletions before they are flushed to the Directory. Generally for faster indexing performance it's best to flush by RAM usage instead of document count and use as large a RAM buffer as you can.

When this is set, the writer will flush whenever buffered documents and deletions use this much RAM. Pass in DISABLE_AUTO_FLUSH to prevent triggering a flush due to RAM usage. Note that if flushing by document count is also enabled, then the flush will be triggered by whichever comes first.
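
The "whichever comes first" rule for the two flush triggers can be pictured with a small stand-alone sketch. It does not call Lucene: the class FlushTriggerSketch, the constant DISABLED (standing in for DISABLE_AUTO_FLUSH) and the method shouldFlush are hypothetical, and the actual decision is made by the configured FlushPolicy.

```java
// Hypothetical model only: the real decision is made by the FlushPolicy.
final class FlushTriggerSketch {

    // Stand-in for DISABLE_AUTO_FLUSH; the value is an assumption of this sketch.
    static final int DISABLED = -1;

    /**
     * Returns true if either enabled trigger has been reached:
     * buffered RAM >= ramBufferSizeMB, or buffered docs >= maxBufferedDocs.
     */
    static boolean shouldFlush(double ramBufferSizeMB, int maxBufferedDocs,
                               double bufferedMB, int bufferedDocs) {
        boolean byRam = ramBufferSizeMB != DISABLED
            && bufferedMB >= ramBufferSizeMB;
        boolean byDocs = maxBufferedDocs != DISABLED
            && bufferedDocs >= maxBufferedDocs;
        return byRam || byDocs;
    }
}
```

For example, with a 16 MB buffer and a 1000-document limit, reaching 1000 documents at 2 MB of buffered data is enough to flush, while disabling the RAM trigger leaves only the document count in play.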

The maximum RAM limit is inherently determined by the JVM's available memory. Yet, an IndexWriter session can consume a significantly larger amount of memory than the given RAM limit, since this limit is just an indicator of when to flush memory-resident documents to the Directory. Flushes are likely to happen concurrently while other threads are adding documents to the writer. For application stability, the available memory in the JVM should be significantly larger than the RAM buffer used for indexing.

NOTE: the accounting of RAM usage for pending deletions is only approximate. Specifically, if you delete by Query, Lucene currently has no way to measure the RAM usage of individual Queries, so the accounting will under-estimate and you should compensate by either calling commit() periodically yourself, or by using LiveIndexWriterConfig.setMaxBufferedDeleteTerms(int) to flush and apply buffered deletes by count instead of RAM usage (for each buffered delete Query a constant number of bytes is used to estimate RAM usage). Note that enabling LiveIndexWriterConfig.setMaxBufferedDeleteTerms(int) will not trigger any segment flushes.

NOTE: It's not guaranteed that all memory-resident documents are flushed once this limit is exceeded. Depending on the configured FlushPolicy, only a subset of the buffered documents may be flushed, and therefore only part of the RAM buffer is released.

The default value is DEFAULT_RAM_BUFFER_SIZE_MB.

Takes effect immediately, but only the next time a document is added, updated or deleted.

Overrides:
setRAMBufferSizeMB in class LiveIndexWriterConfig
See Also:
setRAMPerThreadHardLimitMB(int)

setReaderTermsIndexDivisor

public IndexWriterConfig setReaderTermsIndexDivisor(int divisor)
Description copied from class: LiveIndexWriterConfig
Sets the termsIndexDivisor passed to any readers that IndexWriter opens, for example when applying deletes or creating a near-real-time reader in DirectoryReader.open(IndexWriter, boolean). If you pass -1, the terms index won't be loaded by the readers. This is only useful in advanced situations when you will only .next() through all terms; attempts to seek will hit an exception.

Takes effect immediately, but only applies to readers opened after this call.

NOTE: divisor settings > 1 do not apply to all PostingsFormat implementations, including the default one in this release. It only makes sense for terms indexes that can efficiently re-sample terms at load time.

Overrides:
setReaderTermsIndexDivisor in class LiveIndexWriterConfig

setTermIndexInterval

public IndexWriterConfig setTermIndexInterval(int interval)
Description copied from class: LiveIndexWriterConfig
Expert: set the interval between indexed terms. Large values cause less memory to be used by IndexReader, but slow random-access to terms. Small values cause more memory to be used by an IndexReader, and speed random-access to terms.

This parameter determines the amount of computation required per query term, regardless of the number of documents that contain that term. In particular, it is the maximum number of other terms that must be scanned before a term is located and its frequency and position information may be processed. In a large index with user-entered query terms, query processing time is likely to be dominated not by term lookup but rather by the processing of frequency and positional data. In a small index or when many uncommon query terms are generated (e.g., by wildcard queries) term lookup may become a dominant cost.

In particular, numUniqueTerms/interval terms are read into memory by an IndexReader, and, on average, interval/2 terms must be scanned for each random term access.
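
The two quantities above can be worked through with plain arithmetic. The sketch below is not Lucene code: the class TermIndexEstimate, its methods and the index size of 10,000,000 unique terms are hypothetical, and the figures only apply to term indexes built with a fixed gap between terms.

```java
// Hypothetical helper: worked arithmetic for the interval trade-off.
final class TermIndexEstimate {

    /** Terms read into memory by a reader: numUniqueTerms / interval. */
    static long termsInMemory(long numUniqueTerms, int interval) {
        return numUniqueTerms / interval;
    }

    /** Average number of terms scanned per random access: interval / 2. */
    static double averageTermsScanned(int interval) {
        return interval / 2.0;
    }

    public static void main(String[] args) {
        long numUniqueTerms = 10_000_000L; // hypothetical index size
        for (int interval : new int[] {32, 128}) {
            System.out.println("interval=" + interval
                + " termsInMemory=" + termsInMemory(numUniqueTerms, interval)
                + " avgTermsScanned=" + averageTermsScanned(interval));
        }
    }
}
```

With these inputs, raising the interval from 32 to 128 cuts the in-memory terms from 312,500 to 78,125 while the average scan length grows from 16 to 64 terms.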

Takes effect immediately, but only applies to newly flushed/merged segments.

NOTE: This parameter does not apply to all PostingsFormat implementations, including the default one in this release. It only makes sense for term indexes that are implemented as a fixed gap between terms. For example, Lucene41PostingsFormat implements the term index instead based upon how terms share prefixes. To configure its parameters (the minimum and maximum size for a block), you would instead use Lucene41PostingsFormat.Lucene41PostingsFormat(int, int), which can also be configured on a per-field basis:

 //customize Lucene41PostingsFormat, passing minBlockSize=50, maxBlockSize=100
 final PostingsFormat tweakedPostings = new Lucene41PostingsFormat(50, 100);
 iwc.setCodec(new Lucene45Codec() {
   @Override
   public PostingsFormat getPostingsFormatForField(String field) {
     if (field.equals("fieldWithTonsOfTerms"))
       return tweakedPostings;
     else
       return super.getPostingsFormatForField(field);
   }
 });
 
Note that other implementations may have their own parameters, or no parameters at all.

Overrides:
setTermIndexInterval in class LiveIndexWriterConfig
See Also:
DEFAULT_TERM_INDEX_INTERVAL

setUseCompoundFile

public IndexWriterConfig setUseCompoundFile(boolean useCompoundFile)
Description copied from class: LiveIndexWriterConfig
Sets whether the IndexWriter should pack newly written segments in a compound file. Default is true.

Use false for batch indexing with very large RAM buffer settings.

Note: To control compound file usage during segment merges see MergePolicy.setNoCFSRatio(double) and MergePolicy.setMaxCFSSegmentSizeMB(double). This setting only applies to newly created segments.

Overrides:
setUseCompoundFile in class LiveIndexWriterConfig

toString

public String toString()
Overrides:
toString in class LiveIndexWriterConfig


Copyright © 2000-2014 Apache Software Foundation. All Rights Reserved.