Apache Lucene Migration Guide

Lucene 3.x index format no longer supported

Lucene 5 no longer supports the Lucene 3.x index format. Opening such an index throws IndexFormatTooOldException. Either reindex all your data or upgrade the old indexes with the IndexUpgrader tool from the latest Lucene 4 release (4.10.x). The upgraded indexes can then be read with Lucene 5 (see the next section).

Support for previous Lucene 4.x index formats moved to new module

By default, Lucene 5 reads only indexes created with Lucene 5. To read and upgrade Lucene 4.x indexes, you must add lucene-backward-codecs.jar to the classpath. It is recommended to upgrade the old indexes with the IndexUpgrader tool so that you can remove the backward-codecs module from the classpath. Upgrading also improves performance.

All file handling APIs changed to Java 7 NIO.2 (LUCENE-5945)

All APIs around Directory and other file-based resources were changed to use the Java 7 NIO.2 API. It is no longer possible to pass java.io.File instances to FSDirectory classes; they now require java.nio.file.Path instances. This also makes it possible to place index directories on "virtual file systems" such as ZIP or TAR files. To migrate existing code, use java.io.File#toPath().
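A minimal sketch of that conversion, using only the JDK. The class name IndexPathMigration, the helper toIndexPath, and the data/index location are hypothetical and chosen for illustration; with Lucene 5 on the classpath, the resulting Path would be handed to an FSDirectory, for example through FSDirectory.open(Path).

```java
import java.io.File;
import java.nio.file.Path;

public class IndexPathMigration {

    /** Converts a legacy java.io.File location into a java.nio.file.Path. */
    public static Path toIndexPath(File legacyLocation) {
        return legacyLocation.toPath();
    }

    public static void main(String[] args) {
        // Hypothetical index location that older code kept as a File.
        File legacyLocation = new File("data", "index");
        Path indexPath = toIndexPath(legacyLocation);
        // The separator in the printed path depends on the platform.
        System.out.println("Index path: " + indexPath);
    }
}
```

Code that already builds its locations from strings can skip java.io.File entirely and construct the Path with java.nio.file.Paths.get(...).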

In addition, make sure that custom directory implementations throw the new IOException types such as java.nio.file.NoSuchFileException, because Lucene no longer recognizes the legacy ones such as java.io.FileNotFoundException.
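The following sketch shows the kind of change this implies for a custom storage layer. InMemoryFileStore is a hypothetical stand-in written against the JDK only, not a Lucene Directory; the point is merely that a missing file is reported with the NIO.2 exception type.

```java
import java.io.IOException;
import java.nio.file.NoSuchFileException;
import java.util.HashMap;
import java.util.Map;

public class InMemoryFileStore {
    private final Map<String, byte[]> files = new HashMap<>();

    /** Stores a copy of the given content under the given file name. */
    public void write(String name, byte[] content) {
        files.put(name, content.clone());
    }

    /** Returns a copy of the stored content, or throws if the file is unknown. */
    public byte[] read(String name) throws IOException {
        byte[] content = files.get(name);
        if (content == null) {
            // NIO.2 type instead of the legacy java.io.FileNotFoundException.
            throw new NoSuchFileException(name);
        }
        return content.clone();
    }

    public static void main(String[] args) throws IOException {
        InMemoryFileStore store = new InMemoryFileStore();
        store.write("segments_1", new byte[] {1, 2, 3});
        try {
            store.read("missing");
        } catch (NoSuchFileException e) {
            System.out.println("Missing file reported as " + e.getClass().getName());
        }
    }
}
```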

Directory and LockFactory APIs restructured (LUCENE-5953)

Locking is now the responsibility of the Directory implementation. LockFactory is only used by subclasses of BaseDirectory to delegate locking to an implementation class. A LockFactory is responsible for creating a Lock on behalf of a BaseDirectory subclass.

The following changes in existing code need to be done:

Removed Reader from Tokenizer constructor (LUCENE-5388)

The constructor of Tokenizer no longer takes a Reader, as this was a leftover from before Tokenizer was reusable. See the org.apache.lucene.analysis package documentation for more details.

Refactored Collector API (LUCENE-5299)

The Collector API has been refactored to use a different Collector instance per segment. It is possible to migrate existing collectors painlessly by extending SimpleCollector instead of Collector: SimpleCollector is a specialization of Collector that returns itself as a per-segment Collector.
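The "returns itself" idea can be pictured without Lucene on the classpath. The sketch below uses simplified, hypothetical stand-in types (TopLevel, PerSegment, GlobalIdCollector) that are not Lucene classes; the real Collector and SimpleCollector have different method names and signatures. It assumes, as an illustration, that each segment is identified by a document-ID offset.

```java
import java.util.ArrayList;
import java.util.List;

public class PerSegmentCollectorSketch {

    /** Stand-in for a per-segment collector: receives segment-local doc IDs. */
    interface PerSegment {
        void collect(int doc);
    }

    /** Stand-in for the top-level collector: asked once per segment. */
    interface TopLevel {
        PerSegment forSegment(int docBase);
    }

    /** Mirrors the SimpleCollector idea: one object plays both roles. */
    static class GlobalIdCollector implements TopLevel, PerSegment {
        final List<Integer> globalDocs = new ArrayList<>();
        private int docBase;

        @Override
        public PerSegment forSegment(int docBase) {
            this.docBase = docBase; // remember the offset of the current segment
            return this;            // return itself as the per-segment collector
        }

        @Override
        public void collect(int doc) {
            globalDocs.add(docBase + doc); // rebase the segment-local ID
        }
    }

    /** Simulates a search over two segments with offsets 0 and 10. */
    public static List<Integer> run() {
        GlobalIdCollector collector = new GlobalIdCollector();
        PerSegment first = collector.forSegment(0);
        first.collect(0);
        first.collect(3);
        PerSegment second = collector.forSegment(10);
        second.collect(2);
        return collector.globalDocs;
    }

    public static void main(String[] args) {
        System.out.println(run()); // prints [0, 3, 12]
    }
}
```

Because a single object holds state across segments, per-segment state such as the offset has to be reset whenever a new segment starts, which is what forSegment does here.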

Refactored FieldComparator API (LUCENE-5702)

Like collectors (see above), field comparators have been refactored to produce a new comparator (called LeafFieldComparator) per segment. It is possible to migrate existing comparators painlessly by extending SimpleFieldComparator, which implements both FieldComparator and LeafFieldComparator and returns itself as a per-segment comparator.

Removed ChainedFilter (LUCENE-5984)

Users are advised to switch to BooleanFilter instead.

Removed OpenBitSet (LUCENE-6010)

OpenBitSet differed from LongBitSet only in its ability to grow automatically. Code that relied on this growth must now manage resizing itself.

FunctionValues.exists() Behavior Changes due to ValueSource bug fixes (LUCENE-5961)

Bugs fixed in several ValueSource functions may result in different behavior in situations where some documents do not have values for fields wrapped in other ValueSources. Users who want to preserve the previous behavior may need to wrap their ValueSources in a "DefFunction" along with a ConstValueSource of "0.0".

PayloadAttributeImpl.clone() (LUCENE-6055)

PayloadAttributeImpl.clone() previously performed a shallow clone, which was incorrect; it now performs a deep clone. If you require shallow cloning of the underlying bytes, override PayloadAttributeImpl.clone() to do a shallow clone instead.

Removed out-of-order scoring (LUCENE-6179)

Bulk scorers must now always collect documents in order. The acceptsDocsOutOfOrder method has been removed, so custom collectors must drop it and can safely assume that documents are collected in order.

Renamed "Atomic" to "Leaf" for segment readers (LUCENE-5569)

AtomicReader and AtomicReaderContext are now called LeafReader and LeafReaderContext, respectively.

Removed custom Analyzer per-document indexing APIs from IndexWriter (LUCENE-6212)

These methods were removed because they are dangerous: they let you analyze each document arbitrarily differently, which makes it difficult to analyze text properly at query time and easy to accidentally "lose" search hits. Instead, break the text out into separate fields and use a different analyzer for each field with PerFieldAnalyzerWrapper.