Apache Lucene Migration Guide

Lucene 3.x index format no longer supported

Lucene 5 no longer supports the Lucene 3.x index format. Opening such an index results in an IndexFormatTooOldException. Either reindex all your data, or upgrade the old indexes with the IndexUpgrader tool from the latest Lucene 4 release (4.10.x). The upgraded indexes can then be read with Lucene 5 (see the next section).

Support for previous Lucene 4.x index formats moved to new module

By default, Lucene 5 reads only indexes created with Lucene 5. To read and upgrade Lucene 4.x indexes, you must add lucene-backward-codecs.jar to the classpath. It is recommended to upgrade old indexes with the IndexUpgrader tool so that you can remove the backward-codecs module from the classpath. This will also improve performance.

All file handling APIs changed to Java 7 NIO.2 (LUCENE-5945)

All APIs around Directory and other file-based resources were changed to make use of the new Java 7 NIO.2 API. It is no longer possible to pass java.io.File instances to FSDirectory classes. FSDirectory classes now require java.nio.file.Path instances. This also makes it possible to place index directories on "virtual file systems" such as ZIP or TAR files. To migrate existing code, use java.io.File#toPath().
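
The conversion itself needs only the JDK. The following is a minimal sketch, assuming a hypothetical index location "data/index"; the class and method names are illustrative and not part of Lucene.

```java
import java.io.File;
import java.nio.file.Path;
import java.nio.file.Paths;

final class FileToPathExample {

  // Converts a legacy java.io.File location into the java.nio.file.Path
  // that Lucene 5 FSDirectory classes expect.
  static Path toIndexPath(File legacyIndexDir) {
    return legacyIndexDir.toPath();
  }

  public static void main(String[] args) {
    File legacy = new File("data/index");       // hypothetical index location
    Path migrated = toIndexPath(legacy);        // what existing File-based code can do
    Path direct = Paths.get("data", "index");   // what new code can do from the start

    // Both routes describe the same location.
    System.out.println(migrated.equals(direct));

    // With Lucene on the classpath, the Path would then be passed on,
    // for example to FSDirectory.open(migrated).
  }
}
```

Code that still receives File objects from callers can convert at the boundary in this way, while new code can build Path instances directly.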

In addition, make sure that custom directory implementations throw the new IOException types such as java.nio.file.NoSuchFileException, because Lucene no longer recognizes the legacy ones such as java.io.FileNotFoundException.
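
For a custom implementation that still opens files through java.io, one possible approach is to translate the legacy exception at the point where the file is opened. The following is a sketch under that assumption; the class, the helper names and the path are hypothetical, and a real Directory implementation would apply the same idea inside its own file-opening methods.

```java
import java.io.FileInputStream;
import java.io.FileNotFoundException;
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.NoSuchFileException;
import java.nio.file.Path;
import java.nio.file.Paths;

final class NioExceptionExample {

  // Opens a file with the legacy java.io API, but reports a missing file
  // with the NIO.2 exception type instead of FileNotFoundException.
  static InputStream openLegacy(Path file) throws IOException {
    try {
      return new FileInputStream(file.toFile());
    } catch (FileNotFoundException e) {
      if (Files.notExists(file)) {
        NoSuchFileException translated = new NoSuchFileException(file.toString());
        translated.initCause(e); // keep the original exception for diagnostics
        throw translated;
      }
      // Some other failure (for example, the path is a directory);
      // this sketch leaves it untranslated.
      throw e;
    }
  }

  // Returns the class name of the exception raised when opening the file,
  // or "none" if the file could be opened.
  static String missingFileExceptionName(Path file) {
    try (InputStream in = openLegacy(file)) {
      return "none";
    } catch (IOException e) {
      return e.getClass().getName();
    }
  }

  public static void main(String[] args) {
    Path missing = Paths.get("no-such-index-dir", "segments_1"); // hypothetical path
    System.out.println(missingFileExceptionName(missing));
  }
}
```

Code that opens files through java.nio.file.Files receives the NIO.2 exception types directly from the JDK, so no translation is needed there.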

Directory and LockFactory APIs restructured (LUCENE-5953)

Locking is now the responsibility of the Directory implementation. LockFactory is used only by subclasses of BaseDirectory, which delegate locking to an implementation class. A LockFactory is responsible for creating a Lock on behalf of a BaseDirectory subclass.

The following changes to existing code are required:

LockFactory implementations are now stateless singletons. They only need to implement one method: makeLock(Directory dir, String name). The factory can use the passed directory to determine where to create the lock, for example the correct file system path for the lock file. In addition, the factory may use instanceof to check whether it can be used with that type of directory at all.
Placing lock files outside of the index directory was never really supported, and this functionality was removed. If you still rely on it, you can use the following trick: use FileSwitchDirectory and delegate the file extension ".lock" to another Directory instance pointing to another path. FileSwitchDirectory also delegates lock files based on the extension.
If you wrap another directory using FilterDirectory, you cannot make use of LockFactories anymore, because only BaseDirectory knows about them. To wrap locking, you must hook into FilterDirectory.makeLock(String name) and wrap the Lock instance returned, as needed. See MockDirectoryWrapper in lucene-test-framework for an example.
It is no longer allowed to pass "null" as the LockFactory to FSDirectory implementations. You have to explicitly pass the platform default to the directory (currently always NativeFSLockFactory.INSTANCE, but subject to change!). To get the platform default, call FSLockFactory.getDefault().

Removed Reader from Tokenizer constructor (LUCENE-5388)

The constructor of Tokenizer no longer takes Reader, as this was a leftover from before it was reusable. See the org.apache.lucene.analysis package documentation for more details.

Refactored Collector API (LUCENE-5299)

The Collector API has been refactored to use a different Collector instance per segment. It is possible to migrate existing collectors painlessly by extending SimpleCollector instead of Collector: SimpleCollector is a specialization of Collector that returns itself as a per-segment Collector.

Refactored FieldComparator API (LUCENE-5702)

Like collectors (see above), field comparators have been refactored to produce a new comparator (called LeafFieldComparator) per segment. It is possible to migrate existing comparators painlessly by extending SimpleFieldComparator, which implements both FieldComparator and LeafFieldComparator and returns itself as a per-segment comparator.

Removed ChainedFilter (LUCENE-5984)

Users are advised to switch to BooleanFilter instead.

Removed OpenBitSet (LUCENE-6010)

OpenBitSet differed from LongBitSet only by its ability to grow automatically. If growth is required, it must now be managed externally.
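
To illustrate what managing growth externally can look like, here is a plain-Java sketch. FixedBits is a hypothetical stand-in for a fixed-size bit set and is not a Lucene class; the ensureCapacity helper shown here is likewise an assumption made for this example. The point is only the pattern: the caller asks for a larger set before writing past the current size and then uses the returned instance.

```java
import java.util.Arrays;

final class FixedBitsGrowthExample {

  // Hypothetical fixed-size bit set: it never grows on its own.
  static final class FixedBits {
    final long[] words;
    final long numBits;

    FixedBits(long numBits) {
      this(new long[(int) ((numBits + 63) >>> 6)], numBits);
    }

    FixedBits(long[] words, long numBits) {
      this.words = words;
      this.numBits = numBits;
    }

    void set(long index) {
      checkIndex(index);
      words[(int) (index >>> 6)] |= 1L << (index & 63);
    }

    boolean get(long index) {
      checkIndex(index);
      return (words[(int) (index >>> 6)] & (1L << (index & 63))) != 0;
    }

    private void checkIndex(long index) {
      if (index < 0 || index >= numBits) {
        throw new IndexOutOfBoundsException("index=" + index + ", numBits=" + numBits);
      }
    }
  }

  // Caller-managed growth: returns the same instance if it is already large
  // enough, otherwise a larger copy that preserves the bits set so far.
  static FixedBits ensureCapacity(FixedBits bits, long numBits) {
    if (numBits <= bits.numBits) {
      return bits;
    }
    int numWords = (int) ((numBits + 63) >>> 6);
    return new FixedBits(Arrays.copyOf(bits.words, numWords), numBits);
  }

  public static void main(String[] args) {
    FixedBits bits = new FixedBits(64);
    bits.set(3);
    bits = ensureCapacity(bits, 200); // grow before touching a higher index
    bits.set(150);
    System.out.println(bits.get(3) && bits.get(150));
  }
}
```

Because the grown set may be a new object, callers have to keep the returned reference rather than the old one.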

FunctionValues.exists() behavior changes due to ValueSource bug fixes (LUCENE-5961)

Bugs fixed in several ValueSource functions may result in different behavior in situations where some documents do not have values for fields wrapped in other ValueSources. Users who want to preserve the previous behavior may need to wrap their ValueSources in a "DefFunction" along with a ConstValueSource of "0.0".

PayloadAttributeImpl.clone() (LUCENE-6055)

PayloadAttributeImpl.clone() did a shallow clone which was incorrect, and was fixed to do a deep clone. If you require shallow cloning of the underlying bytes, you should override PayloadAttributeImpl.clone() to do a shallow clone instead.

Removed out-of-order scoring (LUCENE-6179)

Bulk scorers must now always collect documents in order. The acceptsDocsOutOfOrder method has been removed; custom collectors should drop it and can safely assume that documents will be collected in order.

Renamed "Atomic" to "Leaf" for segment readers (LUCENE-5569)

AtomicReader and AtomicReaderContext are now called LeafReader and LeafReaderContext, respectively.

Removed custom Analyzer per-document indexing APIs from IndexWriter (LUCENE-6212)

These methods were removed because they are dangerous: they let you analyze each document arbitrarily differently, making it difficult to properly analyze text at query time and easy to accidentally "lose" search hits. Instead, you should break out text into separate fields and use a different analyzer for each field with PerFieldAnalyzerWrapper.