IndexWriterConfig (Lucene 3.4.0 API)

org.apache.lucene.index
Class IndexWriterConfig

java.lang.Object
  org.apache.lucene.index.IndexWriterConfig

All Implemented Interfaces:: Cloneable

public final class IndexWriterConfig
extends Object
implements Cloneable

Holds all the configuration of IndexWriter. You should instantiate this class, call the setters to set your configuration, then pass it to IndexWriter. Note that IndexWriter makes a private clone; if you need to subsequently change settings use IndexWriter.getConfig().

All setter methods return IndexWriterConfig to allow chaining settings conveniently, for example:

 IndexWriterConfig conf = new IndexWriterConfig(matchVersion, analyzer);
 conf.setter1().setter2();
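The chaining and private-clone behavior described above can be modeled in plain Java. The following is a minimal sketch, not Lucene code: `SketchConfig` and `SketchWriter` are hypothetical stand-ins for IndexWriterConfig and IndexWriter, and the two settings shown were picked only for illustration.

```java
// Hypothetical stand-ins; these are NOT Lucene classes.
final class SketchConfig implements Cloneable {
    private double ramBufferSizeMB = 16.0;
    private int maxBufferedDocs = -1; // -1 models "disabled" in this sketch

    // Each setter returns this so that calls can be chained.
    SketchConfig setRAMBufferSizeMB(double mb) {
        this.ramBufferSizeMB = mb;
        return this;
    }

    SketchConfig setMaxBufferedDocs(int docs) {
        this.maxBufferedDocs = docs;
        return this;
    }

    double getRAMBufferSizeMB() { return ramBufferSizeMB; }

    int getMaxBufferedDocs() { return maxBufferedDocs; }

    @Override
    public SketchConfig clone() {
        try {
            return (SketchConfig) super.clone();
        } catch (CloneNotSupportedException e) {
            throw new AssertionError(e); // cannot happen: the class is Cloneable
        }
    }
}

final class SketchWriter {
    private final SketchConfig config;

    SketchWriter(SketchConfig conf) {
        this.config = conf.clone(); // the writer keeps its own private copy
    }

    // Later changes must go through the writer's copy, not the original object.
    SketchConfig getConfig() { return config; }
}
```

In this model, calling a setter on the original `SketchConfig` after the writer has been constructed leaves `writer.getConfig()` unchanged, which is why the class description points to IndexWriter.getConfig() for subsequent changes.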

Since:: 3.1

Nested Class Summary
`static class`	`IndexWriterConfig.OpenMode` Specifies the open mode for `IndexWriter`: `IndexWriterConfig.OpenMode.CREATE` - creates a new index or overwrites an existing one.

Field Summary
`static int`	`DEFAULT_MAX_BUFFERED_DELETE_TERMS` Disabled by default (because IndexWriter flushes by RAM usage by default).
`static int`	`DEFAULT_MAX_BUFFERED_DOCS` Disabled by default (because IndexWriter flushes by RAM usage by default).
`static int`	`DEFAULT_MAX_THREAD_STATES` The maximum number of simultaneous threads that may be indexing documents at once in IndexWriter; if more than this many threads arrive they will wait for others to finish.
`static double`	`DEFAULT_RAM_BUFFER_SIZE_MB` Default value is 16 MB (which means flush when buffered docs consume approximately 16 MB RAM).
`static boolean`	`DEFAULT_READER_POOLING` Default setting for `setReaderPooling(boolean)`.
`static int`	`DEFAULT_READER_TERMS_INDEX_DIVISOR` Default value is 1.
`static int`	`DEFAULT_TERM_INDEX_INTERVAL` Default value is 128.
`static int`	`DISABLE_AUTO_FLUSH` Denotes a flush trigger is disabled.
`static long`	`WRITE_LOCK_TIMEOUT` Default value for the write lock timeout (1,000 ms).

Constructor Summary
`IndexWriterConfig(Version matchVersion, Analyzer analyzer)` Creates a new config with defaults that match the specified `Version` as well as the default `Analyzer`.

Method Summary
`Object`	`clone()`
`Analyzer`	`getAnalyzer()` Returns the default analyzer to use for indexing documents.
`static long`	`getDefaultWriteLockTimeout()` Returns the default write lock timeout for newly instantiated IndexWriterConfigs.
`IndexCommit`	`getIndexCommit()` Returns the `IndexCommit` as specified in `setIndexCommit(IndexCommit)` or the default, `null` which specifies to open the latest index commit point.
`IndexDeletionPolicy`	`getIndexDeletionPolicy()` Returns the `IndexDeletionPolicy` specified in `setIndexDeletionPolicy(IndexDeletionPolicy)` or the default `KeepOnlyLastCommitDeletionPolicy`.
`int`	`getMaxBufferedDeleteTerms()` Returns the number of buffered deleted terms that will trigger a flush if enabled.
`int`	`getMaxBufferedDocs()` Returns the number of buffered added documents that will trigger a flush if enabled.
`int`	`getMaxThreadStates()` Returns the max number of simultaneous threads that may be indexing documents at once in IndexWriter.
`IndexWriter.IndexReaderWarmer`	`getMergedSegmentWarmer()` Returns the current merged segment warmer.
`MergePolicy`	`getMergePolicy()` Returns the current MergePolicy in use by this writer.
`MergeScheduler`	`getMergeScheduler()` Returns the `MergeScheduler` that was set by `setMergeScheduler(MergeScheduler)`.
`IndexWriterConfig.OpenMode`	`getOpenMode()` Returns the `IndexWriterConfig.OpenMode` set by `setOpenMode(OpenMode)`.
`double`	`getRAMBufferSizeMB()` Returns the value set by `setRAMBufferSizeMB(double)` if enabled.
`boolean`	`getReaderPooling()` Returns true if IndexWriter should pool readers even if `IndexWriter.getReader()` has not been called.
`int`	`getReaderTermsIndexDivisor()`
`Similarity`	`getSimilarity()` Expert: returns the `Similarity` implementation used by this IndexWriter.
`int`	`getTermIndexInterval()` Returns the interval between indexed terms.
`long`	`getWriteLockTimeout()` Returns allowed timeout when acquiring the write lock.
`static void`	`setDefaultWriteLockTimeout(long writeLockTimeout)` Sets the default (for any instance) maximum time to wait for a write lock (in milliseconds).
`IndexWriterConfig`	`setIndexCommit(IndexCommit commit)` Expert: allows opening a specific commit point.
`IndexWriterConfig`	`setIndexDeletionPolicy(IndexDeletionPolicy delPolicy)` Expert: allows an optional `IndexDeletionPolicy` implementation to be specified.
`IndexWriterConfig`	`setMaxBufferedDeleteTerms(int maxBufferedDeleteTerms)` Determines the minimal number of delete terms required before the buffered in-memory delete terms are applied and flushed.
`IndexWriterConfig`	`setMaxBufferedDocs(int maxBufferedDocs)` Determines the minimal number of documents required before the buffered in-memory documents are flushed as a new Segment.
`IndexWriterConfig`	`setMaxThreadStates(int maxThreadStates)` Sets the max number of simultaneous threads that may be indexing documents at once in IndexWriter.
`IndexWriterConfig`	`setMergedSegmentWarmer(IndexWriter.IndexReaderWarmer mergeSegmentWarmer)` Set the merged segment warmer.
`IndexWriterConfig`	`setMergePolicy(MergePolicy mergePolicy)` Expert: `MergePolicy` is invoked whenever there are changes to the segments in the index.
`IndexWriterConfig`	`setMergeScheduler(MergeScheduler mergeScheduler)` Expert: sets the merge scheduler used by this writer.
`IndexWriterConfig`	`setOpenMode(IndexWriterConfig.OpenMode openMode)` Specifies `IndexWriterConfig.OpenMode` of the index.
`IndexWriterConfig`	`setRAMBufferSizeMB(double ramBufferSizeMB)` Determines the amount of RAM that may be used for buffering added documents and deletions before they are flushed to the Directory.
`IndexWriterConfig`	`setReaderPooling(boolean readerPooling)` By default, IndexWriter does not pool the SegmentReaders it must open for deletions and merging, unless a near-real-time reader has been obtained by calling `IndexWriter.getReader()`.
`IndexWriterConfig`	`setReaderTermsIndexDivisor(int divisor)` Sets the termsIndexDivisor passed to any readers that IndexWriter opens, for example when applying deletes or creating a near-real-time reader in `IndexWriter.getReader()`.
`IndexWriterConfig`	`setSimilarity(Similarity similarity)` Expert: set the `Similarity` implementation used by this IndexWriter.
`IndexWriterConfig`	`setTermIndexInterval(int interval)` Expert: set the interval between indexed terms.
`IndexWriterConfig`	`setWriteLockTimeout(long writeLockTimeout)` Sets the maximum time to wait for a write lock (in milliseconds) for this instance.
`String`	`toString()`

Methods inherited from class java.lang.Object
`equals, finalize, getClass, hashCode, notify, notifyAll, wait, wait, wait`

Field Detail

DEFAULT_TERM_INDEX_INTERVAL

public static final int DEFAULT_TERM_INDEX_INTERVAL

Default value is 128. Change using setTermIndexInterval(int).

See Also:: Constant Field Values

DISABLE_AUTO_FLUSH

public static final int DISABLE_AUTO_FLUSH

Denotes a flush trigger is disabled.

See Also:: Constant Field Values

DEFAULT_MAX_BUFFERED_DELETE_TERMS

public static final int DEFAULT_MAX_BUFFERED_DELETE_TERMS

Disabled by default (because IndexWriter flushes by RAM usage by default).

See Also:: Constant Field Values

DEFAULT_MAX_BUFFERED_DOCS

public static final int DEFAULT_MAX_BUFFERED_DOCS

Disabled by default (because IndexWriter flushes by RAM usage by default).

See Also:: Constant Field Values

DEFAULT_RAM_BUFFER_SIZE_MB

public static final double DEFAULT_RAM_BUFFER_SIZE_MB

Default value is 16 MB (which means flush when buffered docs consume approximately 16 MB RAM).

See Also:: Constant Field Values

WRITE_LOCK_TIMEOUT

public static long WRITE_LOCK_TIMEOUT

Default value for the write lock timeout (1,000 ms).

See Also:: setDefaultWriteLockTimeout(long)

DEFAULT_MAX_THREAD_STATES

public static final int DEFAULT_MAX_THREAD_STATES

The maximum number of simultaneous threads that may be indexing documents at once in IndexWriter; if more than this many threads arrive they will wait for others to finish.

See Also:: Constant Field Values

DEFAULT_READER_POOLING

public static final boolean DEFAULT_READER_POOLING

Default setting for setReaderPooling(boolean).

See Also:: Constant Field Values

DEFAULT_READER_TERMS_INDEX_DIVISOR

public static final int DEFAULT_READER_TERMS_INDEX_DIVISOR

Default value is 1. Change using setReaderTermsIndexDivisor(int).

Constructor Detail

IndexWriterConfig

public IndexWriterConfig(Version matchVersion,
                         Analyzer analyzer)

Creates a new config with defaults that match the specified Version as well as the default Analyzer. If matchVersion is >= Version.LUCENE_32, TieredMergePolicy is used for merging; otherwise LogByteSizeMergePolicy is used. Note that TieredMergePolicy is free to select non-contiguous merges, which means docIDs may not remain monotonic over time. If this is a problem, you should switch to LogByteSizeMergePolicy or LogDocMergePolicy.

Method Detail

setDefaultWriteLockTimeout

public static void setDefaultWriteLockTimeout(long writeLockTimeout)

Sets the default (for any instance) maximum time to wait for a write lock (in milliseconds).

getDefaultWriteLockTimeout

public static long getDefaultWriteLockTimeout()

Returns the default write lock timeout for newly instantiated IndexWriterConfigs.

See Also:: setDefaultWriteLockTimeout(long)

clone

public Object clone()

Overrides:: clone in class Object

getAnalyzer

public Analyzer getAnalyzer()

Returns the default analyzer to use for indexing documents.

setOpenMode

public IndexWriterConfig setOpenMode(IndexWriterConfig.OpenMode openMode)

Specifies IndexWriterConfig.OpenMode of the index.

Only takes effect when IndexWriter is first created.

getOpenMode

public IndexWriterConfig.OpenMode getOpenMode()

Returns the IndexWriterConfig.OpenMode set by setOpenMode(OpenMode).

setIndexDeletionPolicy

public IndexWriterConfig setIndexDeletionPolicy(IndexDeletionPolicy delPolicy)

Expert: allows an optional IndexDeletionPolicy implementation to be specified. You can use this to control when prior commits are deleted from the index. The default policy is KeepOnlyLastCommitDeletionPolicy which removes all prior commits as soon as a new commit is done (this matches behavior before 2.2). Creating your own policy can allow you to explicitly keep previous "point in time" commits alive in the index for some time, to allow readers to refresh to the new commit without having the old commit deleted out from under them. This is necessary on filesystems like NFS that do not support "delete on last close" semantics, which Lucene's "point in time" search normally relies on.

NOTE: the deletion policy cannot be null. If null is passed, the deletion policy will be set to the default.

Only takes effect when IndexWriter is first created.
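To illustrate the idea of keeping earlier "point in time" commits alive, here is a minimal, self-contained sketch of a keep-last-N rule. It does not use Lucene's IndexDeletionPolicy interface: `KeepLastNSketch`, `commitsToDelete`, and the use of plain generation numbers to represent commits are all assumptions made for illustration. The default policy named above corresponds to N = 1 in this model.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical model: commits are represented only by generation numbers,
// passed in ordered from oldest to newest.
final class KeepLastNSketch {
    private final int keep;

    KeepLastNSketch(int keep) {
        if (keep < 1) {
            throw new IllegalArgumentException("keep must be >= 1");
        }
        this.keep = keep;
    }

    // Returns the generations that would be deleted, leaving the newest `keep` commits.
    List<Long> commitsToDelete(List<Long> commitsOldestFirst) {
        int deleteCount = Math.max(0, commitsOldestFirst.size() - keep);
        return new ArrayList<>(commitsOldestFirst.subList(0, deleteCount));
    }
}
```

A larger N gives readers on filesystems such as NFS more time to refresh before the commit they are using is removed, at the cost of extra disk space for the retained commits.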

getIndexDeletionPolicy

public IndexDeletionPolicy getIndexDeletionPolicy()

Returns the IndexDeletionPolicy specified in setIndexDeletionPolicy(IndexDeletionPolicy) or the default KeepOnlyLastCommitDeletionPolicy.

setIndexCommit

public IndexWriterConfig setIndexCommit(IndexCommit commit)

Expert: allows opening a specific commit point. The default is null, which opens the latest commit point.

Only takes effect when IndexWriter is first created.

getIndexCommit

public IndexCommit getIndexCommit()

Returns the IndexCommit as specified in setIndexCommit(IndexCommit) or the default, null which specifies to open the latest index commit point.

setSimilarity

public IndexWriterConfig setSimilarity(Similarity similarity)

Expert: set the Similarity implementation used by this IndexWriter.

NOTE: the similarity cannot be null. If null is passed, the similarity will be set to the default.

Only takes effect when IndexWriter is first created.

getSimilarity

public Similarity getSimilarity()

Expert: returns the Similarity implementation used by this IndexWriter. This defaults to the current value of Similarity.getDefault().

setTermIndexInterval

public IndexWriterConfig setTermIndexInterval(int interval)

Expert: set the interval between indexed terms. Large values cause less memory to be used by IndexReader, but slow random-access to terms. Small values cause more memory to be used by an IndexReader, and speed random-access to terms.

This parameter determines the amount of computation required per query term, regardless of the number of documents that contain that term. In particular, it is the maximum number of other terms that must be scanned before a term is located and its frequency and position information may be processed. In a large index with user-entered query terms, query processing time is likely to be dominated not by term lookup but rather by the processing of frequency and positional data. In a small index or when many uncommon query terms are generated (e.g., by wildcard queries) term lookup may become a dominant cost.

In particular, numUniqueTerms/interval terms are read into memory by an IndexReader, and, on average, interval/2 terms must be scanned for each random term access.

Takes effect immediately, but only applies to newly flushed/merged segments.
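The memory and scan-cost trade-off stated for setTermIndexInterval can be worked through numerically. This sketch only evaluates the two formulas given in that description (numUniqueTerms/interval terms held in memory, interval/2 terms scanned on average); the class name and the figure of 10,000,000 unique terms are made up for illustration.

```java
final class TermIndexMathSketch {
    // Approximate number of index terms an IndexReader holds in memory.
    static long termsInMemory(long numUniqueTerms, int interval) {
        return numUniqueTerms / interval;
    }

    // Average number of terms scanned per random term access.
    static double averageTermsScanned(int interval) {
        return interval / 2.0;
    }

    public static void main(String[] args) {
        long numUniqueTerms = 10_000_000L; // hypothetical index size
        for (int interval : new int[] {32, 128, 512}) {
            System.out.println("interval=" + interval
                + " termsInMemory=" + termsInMemory(numUniqueTerms, interval)
                + " avgScanned=" + averageTermsScanned(interval));
        }
    }
}
```

For this hypothetical index, the default interval of 128 gives 78,125 in-memory terms and 64 scanned terms on average; raising the interval to 512 cuts the in-memory terms to about 19,531 while raising the average scan to 256.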

getTermIndexInterval

public int getTermIndexInterval()

Returns the interval between indexed terms.

See Also:: setTermIndexInterval(int)

setMergeScheduler

public IndexWriterConfig setMergeScheduler(MergeScheduler mergeScheduler)

Expert: sets the merge scheduler used by this writer. The default is ConcurrentMergeScheduler.

NOTE: the merge scheduler cannot be null. If null is passed, the merge scheduler will be set to the default.

Only takes effect when IndexWriter is first created.

getMergeScheduler

public MergeScheduler getMergeScheduler()

Returns the MergeScheduler that was set by setMergeScheduler(MergeScheduler).

setWriteLockTimeout

public IndexWriterConfig setWriteLockTimeout(long writeLockTimeout)

Sets the maximum time to wait for a write lock (in milliseconds) for this instance. You can change the default value for all instances by calling setDefaultWriteLockTimeout(long).

Only takes effect when IndexWriter is first created.
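The relationship between the static default and the per-instance value can be modeled as follows. This is a sketch under the assumption, suggested by the wording "for newly instantiated IndexWriterConfigs", that each instance copies the default at construction time; `TimeoutConfigSketch` is a hypothetical class, not Lucene code.

```java
final class TimeoutConfigSketch {
    // Shared default, 1,000 ms as documented for WRITE_LOCK_TIMEOUT.
    private static long defaultWriteLockTimeout = 1000L;

    private long writeLockTimeout;

    TimeoutConfigSketch() {
        // Assumption: the instance copies the default when it is created.
        this.writeLockTimeout = defaultWriteLockTimeout;
    }

    static void setDefaultWriteLockTimeout(long millis) {
        defaultWriteLockTimeout = millis;
    }

    static long getDefaultWriteLockTimeout() {
        return defaultWriteLockTimeout;
    }

    // Per-instance override; does not touch the shared default.
    TimeoutConfigSketch setWriteLockTimeout(long millis) {
        this.writeLockTimeout = millis;
        return this;
    }

    long getWriteLockTimeout() {
        return writeLockTimeout;
    }
}
```

Under that assumption, changing the default affects only instances created afterwards, and a per-instance override leaves the default untouched.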

getWriteLockTimeout

public long getWriteLockTimeout()

Returns allowed timeout when acquiring the write lock.

See Also:: setWriteLockTimeout(long)

setMaxBufferedDeleteTerms

public IndexWriterConfig setMaxBufferedDeleteTerms(int maxBufferedDeleteTerms)

Determines the minimal number of delete terms required before the buffered in-memory delete terms are applied and flushed. If there are documents buffered in memory at the time, they are merged and a new segment is created.

Disabled by default (writer flushes by RAM usage).

Throws:: IllegalArgumentException - if maxBufferedDeleteTerms is enabled but smaller than 1
Takes effect immediately, but only the next time a document is added, updated or deleted.

getMaxBufferedDeleteTerms

public int getMaxBufferedDeleteTerms()

Returns the number of buffered deleted terms that will trigger a flush if enabled.

See Also:: setMaxBufferedDeleteTerms(int)

setRAMBufferSizeMB

public IndexWriterConfig setRAMBufferSizeMB(double ramBufferSizeMB)

Determines the amount of RAM that may be used for buffering added documents and deletions before they are flushed to the Directory. Generally for faster indexing performance it's best to flush by RAM usage instead of document count and use as large a RAM buffer as you can.

When this is set, the writer will flush whenever buffered documents and deletions use this much RAM. Pass in DISABLE_AUTO_FLUSH to prevent triggering a flush due to RAM usage. Note that if flushing by document count is also enabled, then the flush will be triggered by whichever comes first.

NOTE: the accounting of RAM usage for pending deletions is only approximate. Specifically, if you delete by Query, Lucene currently has no way to measure the RAM usage of individual Queries, so the accounting will under-estimate and you should compensate by either calling commit() periodically yourself, or by using setMaxBufferedDeleteTerms(int) to flush by count instead of RAM usage (each buffered delete Query counts as one).

NOTE: because IndexWriter uses ints when managing its internal storage, the absolute maximum value for this setting is somewhat less than 2048 MB. The precise limit depends on various factors, such as how large your documents are, how many fields have norms, etc., so it's best to set this value comfortably under 2048.
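The 2048 MB figure follows from the range of a Java int. The following is a minimal worked example, assuming the ceiling comes from being able to address at most Integer.MAX_VALUE bytes (the note above does not spell out the exact mechanism); `RamBufferLimitSketch` is a hypothetical name.

```java
final class RamBufferLimitSketch {
    // Largest byte count representable in a Java int, expressed in megabytes.
    static double maxIntAddressableMB() {
        return Integer.MAX_VALUE / (1024.0 * 1024.0);
    }

    public static void main(String[] args) {
        // Prints a value just under 2048.
        System.out.println(maxIntAddressableMB());
    }
}
```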

The default value is DEFAULT_RAM_BUFFER_SIZE_MB.

Takes effect immediately, but only the next time a document is added, updated or deleted.

Throws:: IllegalArgumentException - if ramBufferSize is enabled but non-positive, or it disables ramBufferSize when maxBufferedDocs is already disabled
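The "whichever comes first" rule for the RAM and document-count triggers can be sketched as a single predicate. This is an illustrative model, not IndexWriter's actual flush logic: `FlushTriggerSketch` and `shouldFlush` are hypothetical names, and the sentinel value -1 for DISABLE_AUTO_FLUSH is an assumption of the sketch, since this page does not list the constant's value.

```java
final class FlushTriggerSketch {
    static final int DISABLE_AUTO_FLUSH = -1; // value assumed for this sketch

    // Returns true if either enabled trigger has been reached.
    static boolean shouldFlush(double ramUsedMB, int bufferedDocs,
                               double ramBufferSizeMB, int maxBufferedDocs) {
        boolean byRam = ramBufferSizeMB != DISABLE_AUTO_FLUSH
            && ramUsedMB >= ramBufferSizeMB;
        boolean byDocs = maxBufferedDocs != DISABLE_AUTO_FLUSH
            && bufferedDocs >= maxBufferedDocs;
        return byRam || byDocs;
    }
}
```

With both triggers enabled (say 16 MB and 1,000 documents), the predicate fires as soon as either limit is reached; with one trigger disabled, only the other is consulted.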

getRAMBufferSizeMB

public double getRAMBufferSizeMB()

Returns the value set by setRAMBufferSizeMB(double) if enabled.

setMaxBufferedDocs

public IndexWriterConfig setMaxBufferedDocs(int maxBufferedDocs)

Determines the minimal number of documents required before the buffered in-memory documents are flushed as a new Segment. Large values generally give faster indexing.

When this is set, the writer will flush every maxBufferedDocs added documents. Pass in DISABLE_AUTO_FLUSH to prevent triggering a flush due to number of buffered documents. Note that if flushing by RAM usage is also enabled, then the flush will be triggered by whichever comes first.

Disabled by default (writer flushes by RAM usage).

Takes effect immediately, but only the next time a document is added, updated or deleted.

Throws:: IllegalArgumentException - if maxBufferedDocs is enabled but smaller than 2, or it disables maxBufferedDocs when ramBufferSize is already disabled
See Also:: setRAMBufferSizeMB(double)
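The IllegalArgumentException conditions documented for setRAMBufferSizeMB and setMaxBufferedDocs interact: at least one of the two flush triggers must stay enabled. The sketch below models just those documented checks; `FlushSettingsSketch` is a hypothetical class, the exception messages are invented, and -1 for DISABLE_AUTO_FLUSH is an assumption of the sketch.

```java
final class FlushSettingsSketch {
    static final int DISABLE_AUTO_FLUSH = -1; // value assumed for this sketch

    private double ramBufferSizeMB = 16.0;            // documented default
    private int maxBufferedDocs = DISABLE_AUTO_FLUSH; // disabled by default

    FlushSettingsSketch setRAMBufferSizeMB(double mb) {
        if (mb != DISABLE_AUTO_FLUSH && mb <= 0.0) {
            throw new IllegalArgumentException("ramBufferSize must be positive when enabled");
        }
        if (mb == DISABLE_AUTO_FLUSH && maxBufferedDocs == DISABLE_AUTO_FLUSH) {
            throw new IllegalArgumentException("cannot disable both flush triggers");
        }
        this.ramBufferSizeMB = mb;
        return this;
    }

    FlushSettingsSketch setMaxBufferedDocs(int docs) {
        if (docs != DISABLE_AUTO_FLUSH && docs < 2) {
            throw new IllegalArgumentException("maxBufferedDocs must be at least 2 when enabled");
        }
        if (docs == DISABLE_AUTO_FLUSH && ramBufferSizeMB == DISABLE_AUTO_FLUSH) {
            throw new IllegalArgumentException("cannot disable both flush triggers");
        }
        this.maxBufferedDocs = docs;
        return this;
    }

    double getRAMBufferSizeMB() { return ramBufferSizeMB; }

    int getMaxBufferedDocs() { return maxBufferedDocs; }
}
```

One consequence is that call order matters when switching from RAM-based to count-based flushing: in this model the document count has to be enabled before the RAM trigger is disabled, because the reverse order would momentarily disable both and be rejected.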

getMaxBufferedDocs

public int getMaxBufferedDocs()

Returns the number of buffered added documents that will trigger a flush if enabled.

See Also:: setMaxBufferedDocs(int)

setMergedSegmentWarmer

public IndexWriterConfig setMergedSegmentWarmer(IndexWriter.IndexReaderWarmer mergeSegmentWarmer)

Set the merged segment warmer. See IndexWriter.IndexReaderWarmer.

Takes effect on the next merge.

getMergedSegmentWarmer

public IndexWriter.IndexReaderWarmer getMergedSegmentWarmer()

Returns the current merged segment warmer. See IndexWriter.IndexReaderWarmer.

setMergePolicy

public IndexWriterConfig setMergePolicy(MergePolicy mergePolicy)

Expert: MergePolicy is invoked whenever there are changes to the segments in the index. Its role is to select which merges to do, if any, and return a MergePolicy.MergeSpecification describing the merges. It also selects merges to do for optimize(). The default depends on the matchVersion passed to the constructor: TieredMergePolicy if matchVersion is >= Version.LUCENE_32, otherwise LogByteSizeMergePolicy.

Only takes effect when IndexWriter is first created.

getMergePolicy

public MergePolicy getMergePolicy()

Returns the current MergePolicy in use by this writer.

See Also:: setMergePolicy(MergePolicy)

setMaxThreadStates

public IndexWriterConfig setMaxThreadStates(int maxThreadStates)

Sets the max number of simultaneous threads that may be indexing documents at once in IndexWriter. Values < 1 are invalid and if passed maxThreadStates will be set to DEFAULT_MAX_THREAD_STATES.

Only takes effect when IndexWriter is first created.
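The two behaviors described for this setting, falling back to the default for values below 1 and making excess threads wait, can be modeled with a counting semaphore. This is only an analogy: the page does not say how IndexWriter implements the limit, `ThreadStateGateSketch` is a hypothetical class, and the value 8 for DEFAULT_MAX_THREAD_STATES is assumed for the sketch because the constant's value is not listed here.

```java
import java.util.concurrent.Semaphore;

final class ThreadStateGateSketch {
    static final int DEFAULT_MAX_THREAD_STATES = 8; // value assumed for this sketch

    private final Semaphore slots;

    ThreadStateGateSketch(int maxThreadStates) {
        // Values below 1 are invalid and fall back to the default.
        int effective = maxThreadStates < 1 ? DEFAULT_MAX_THREAD_STATES : maxThreadStates;
        this.slots = new Semaphore(effective);
    }

    // Non-blocking probe: true if an indexing slot was free and is now held.
    boolean tryEnter() {
        return slots.tryAcquire();
    }

    // Blocks until a slot is free, modeling threads that wait for others to finish.
    void enter() throws InterruptedException {
        slots.acquire();
    }

    // Releases a slot so that a waiting thread can proceed.
    void exit() {
        slots.release();
    }

    int freeSlots() {
        return slots.availablePermits();
    }
}
```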

getMaxThreadStates

public int getMaxThreadStates()

Returns the max number of simultaneous threads that may be indexing documents at once in IndexWriter.

setReaderPooling

public IndexWriterConfig setReaderPooling(boolean readerPooling)

By default, IndexWriter does not pool the SegmentReaders it must open for deletions and merging, unless a near-real-time reader has been obtained by calling IndexWriter.getReader(). This method lets you enable pooling without getting a near-real-time reader. NOTE: if you set this to false, IndexWriter will still pool readers once IndexWriter.getReader() is called.

Only takes effect when IndexWriter is first created.

getReaderPooling

public boolean getReaderPooling()

Returns true if IndexWriter should pool readers even if IndexWriter.getReader() has not been called.

setReaderTermsIndexDivisor

public IndexWriterConfig setReaderTermsIndexDivisor(int divisor)

Sets the termsIndexDivisor passed to any readers that IndexWriter opens, for example when applying deletes or creating a near-real-time reader in IndexWriter.getReader(). If you pass -1, the terms index won't be loaded by the readers. This is only useful in advanced situations when you will only .next() through all terms; attempts to seek will hit an exception.

Takes effect immediately, but only applies to readers opened after this call.

getReaderTermsIndexDivisor

public int getReaderTermsIndexDivisor()

See Also:: setReaderTermsIndexDivisor(int)

toString

public String toString()

Overrides:: toString in class Object
