Lucene 5 no longer supports the Lucene 3.x index format. Opening such indexes will result in an
IndexFormatTooOldException. It is recommended to either reindex all your data, or upgrade the old indexes with the
IndexUpgrader tool of the latest Lucene 4 version (4.10.x). Those indexes can then be read (see next section) with Lucene 5.
By default, Lucene 5 will only read indexes created with Lucene 5. To read and upgrade Lucene 4.x indexes, you must add
lucene-backward-codecs.jar to the classpath. It is recommended to upgrade the old indexes with the
IndexUpgrader tool, so you can remove the backward-codecs module from the classpath. This will also improve performance.
All APIs around Directory and other file-based resources were changed to make use of the new Java 7 NIO.2 API. It is no longer possible to pass java.io.File instances to FSDirectory classes. FSDirectory classes now require java.nio.file.Path instances. This also allows index directories to be placed on "virtual file systems" like ZIP or TAR files. To migrate existing code use java.io.File#toPath().
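The File-to-Path migration described above can be sketched with plain JDK classes. The class and method names (IndexPathMigration, toIndexPath) and the directory name are hypothetical and chosen only for illustration. The Lucene call is shown in a comment so the sketch compiles without Lucene on the classpath.

```java
import java.io.File;
import java.nio.file.Path;

// Hypothetical helper: converts a legacy java.io.File index location
// into the java.nio.file.Path that FSDirectory classes now expect.
final class IndexPathMigration {

  static Path toIndexPath(File legacyLocation) {
    // File#toPath() is the JDK bridge from the old API to NIO.2.
    return legacyLocation.toPath();
  }

  public static void main(String[] args) {
    File legacyLocation = new File("index-dir"); // hypothetical directory name
    Path indexPath = toIndexPath(legacyLocation);
    System.out.println("Index path: " + indexPath);
    // With Lucene 5 on the classpath, the path would then be used like this:
    //   Directory dir = FSDirectory.open(indexPath);
  }
}
```

Code that already builds locations from strings can skip the File object entirely and use java.nio.file.Paths.get(...) instead.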
In addition, make sure that custom directory implementations throw the new IOException types (like java.nio.file.NoSuchFileException) instead of the old legacy ones (like java.io.FileNotFoundException), because Lucene no longer recognizes the legacy exceptions.
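One way a custom directory implementation might adapt is to translate legacy exceptions at the point where it still calls older file APIs. The following is a minimal sketch using only JDK classes. The class and method names (LegacyExceptions, translate) are hypothetical and not part of Lucene.

```java
import java.io.FileInputStream;
import java.io.FileNotFoundException;
import java.io.IOException;
import java.nio.file.NoSuchFileException;

// Hypothetical helper for custom Directory code that still uses java.io APIs.
final class LegacyExceptions {

  // Maps a legacy FileNotFoundException to the NIO.2 NoSuchFileException,
  // keeping the original exception as the cause. Other exceptions pass through.
  static IOException translate(String fileName, IOException e) {
    if (e instanceof FileNotFoundException) {
      NoSuchFileException translated =
          new NoSuchFileException(fileName, null, e.getMessage());
      translated.initCause(e);
      return translated;
    }
    return e;
  }

  public static void main(String[] args) {
    String name = "does-not-exist.bin"; // hypothetical file name
    try (FileInputStream in = new FileInputStream(name)) {
      System.out.println("Opened " + name);
    } catch (IOException e) {
      // A custom directory would rethrow the translated exception.
      System.out.println("Would rethrow: " + translate(name, e).getClass().getName());
    }
  }
}
```

Where possible, switching the implementation to java.nio.file.Files avoids the translation step, since those methods already throw the new exception types.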
Locking is now the responsibility of the Directory implementation. LockFactory is only used by subclasses of BaseDirectory to delegate locking to an implementation class. LockFactories are responsible for creating a Lock on behalf of a BaseDirectory subclass.
The following changes in existing code need to be done:
The constructor of Tokenizer no longer takes Reader, as this was a leftover from before it was reusable. See the org.apache.lucene.analysis package documentation for more details.
The Collector API has been refactored to use a different Collector instance per segment. It is possible to migrate existing collectors painlessly by extending SimpleCollector instead of Collector: SimpleCollector is a specialization of Collector that returns itself as a per-segment Collector.
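A migrated collector might look like the following sketch, which assumes a Lucene 5.x core jar on the classpath. The class name HitCountingCollector and its accessors are hypothetical. The needsScores() method is declared without @Override on the assumption that Lucene 5.0 does not define it while later 5.x releases do; adjust this for the exact version in use.

```java
import java.io.IOException;
import org.apache.lucene.index.LeafReaderContext;
import org.apache.lucene.search.SimpleCollector;

// Hypothetical collector: counts hits and tracks the highest global doc ID seen.
class HitCountingCollector extends SimpleCollector {
  private int docBase;          // doc ID offset of the current segment
  private int count;            // number of collected hits
  private int maxGlobalDoc = -1;

  @Override
  protected void doSetNextReader(LeafReaderContext context) throws IOException {
    // Called once per segment; replaces the old setNextReader(AtomicReaderContext).
    docBase = context.docBase;
  }

  @Override
  public void collect(int doc) {
    // 'doc' is relative to the current segment, so add docBase for a global ID.
    count++;
    maxGlobalDoc = Math.max(maxGlobalDoc, docBase + doc);
  }

  // Assumption: required by later 5.x releases, absent in 5.0 (hence no @Override).
  public boolean needsScores() {
    return false;
  }

  int getCount() { return count; }

  int getMaxGlobalDoc() { return maxGlobalDoc; }
}
```

Such a collector would be passed to IndexSearcher.search(query, collector) as before. Because SimpleCollector returns itself for every segment, per-segment state such as docBase can stay in plain fields.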
Like collectors (see above), field comparators have been refactored to produce a new comparator (called LeafFieldComparator) per segment. It is possible to migrate existing comparators painlessly by extending SimpleFieldComparator, which implements both FieldComparator and LeafFieldComparator and returns itself as a per-segment comparator.
Users are advised to switch to BooleanFilter instead.
OpenBitSet only differs from LongBitSet by its ability to grow automatically. In case growth is required, it would need to be managed externally.
Bugs fixed in several ValueSource functions may result in different behavior in situations where some documents do not have values for fields wrapped in other ValueSources. Users who want to preserve the previous behavior may need to wrap their ValueSources in a "DefFunction" along with a ConstValueSource of "0.0".
PayloadAttributeImpl.clone() did a shallow clone which was incorrect, and was fixed to do a deep clone. If you require shallow cloning of the underlying bytes, you should override PayloadAttributeImpl.clone() to do a shallow clone instead.
Bulk scorers must now always collect documents in order. If you have custom collectors, note that the acceptsDocsOutOfOrder method has been removed; collectors can safely assume that documents will be collected in order.
AtomicReader and AtomicReaderContext are now called LeafReader and LeafReaderContext, respectively.
These methods were removed because they are dangerous since they let you analyze each document arbitrarily differently, making it difficult to properly analyze text at query time and easy to accidentally "lose" search hits. Instead, you should break out text into separate fields and use a different analyzer for each field with PerFieldAnalyzerWrapper.