LuceneTM Core News

24 August 2015, Apache Lucene™ 5.3.0 available

The Lucene PMC is pleased to announce the release of Apache Lucene 5.3.0

Apache Lucene is a high-performance, full-featured text search engine library written entirely in Java. It is a technology suitable for nearly any application that requires full-text search, especially cross-platform.

The release is available for immediate download at: http://www.apache.org/dyn/closer.lua/lucene/java/5.3.0

This release contains numerous bug fixes, optimizations, and improvements, some of which are highlighted below. The release is available for immediate download at: http://lucene.apache.org/core/mirrors-core-latest-redir.html

Lucene 5.3.0 Release Highlights:

API Changes

  • PhraseQuery and BooleanQuery are now immutable

New features

  • Added a new org.apache.lucene.search.join.CheckJoinIndex class that can be used to validate that an index has an appropriate structure to run join queries
  • Added a new BlendedTermQuery to blend statistics across several terms
  • New common suggest API that mirrors Lucene's Query/IndexSearcher APIs for Document based suggester.
  • IndexWriter can now be initialized from an already open near-real-time or non-NRT reader
  • Add experimental range tree doc values format and queries, based on a 1D version of the spatial BKD tree, for a faster and smaller alternative to postings-based numeric and binary term filtering. Range trees can also handle values larger than 64 bits.
  • Added GeoPointField, GeoPointInBBoxQuery, GeoPointInPolygonQuery for simple "indexed lat/lon point in bbox/shape" searching
  • Added experimental BKD geospatial tree doc values format and queries, for fast "bbox/polygon contains lat/lon points"
  • Use doc values to post-filter GeoPointField hits that fall in boundary cells, resulting in smaller index, faster searches and less heap used for each query

Optimizations

  • Reduce RAM usage of FieldInfos, and speed up lookup by number, by using an array instead of TreeMap except in very sparse cases
  • Faster intersection of the terms dictionary with very finite automata, which can be generated eg. by simple regexp queries
  • Various bugfixes and optimizations since the 5.2.0 release.

See the CHANGES.txt file included with the release for a full list of changes and further details.

15 June 2015, Apache Lucene™ 5.2.1 available

The Lucene PMC is pleased to announce the release of Apache Lucene 5.2.1

Apache Lucene is a high-performance, full-featured text search engine library written entirely in Java. It is a technology suitable for nearly any application that requires full-text search, especially cross-platform.

This release contains various bug fixes and optimizations since the 5.2.0 release.

The release is available for immediate download at: http://www.apache.org/dyn/closer.lua/lucene/java/5.2.1

Lucene 5.2.1 includes 3 bug fixes:

  • Fix class loading deadlock relating to Codec initialization, default codec and SPI discovery.
  • NRT readers now reflect a new commit even if there is no change to the commit user data
  • Queries now get a dummy Similarity when scores are not needed in order to not load unnecessary information like norms

See the CHANGES.txt file included with the release for a full list of changes and further details.

7 June 2015 - Lucene Core 5.2.0 Available

The Lucene PMC is pleased to announce the release of Apache Lucene 5.2.0

Apache Lucene is a high-performance, full-featured text search engine library written entirely in Java. It is a technology suitable for nearly any application that requires full-text search, especially cross-platform.

The release is available for immediate download at: http://www.apache.org/dyn/closer.lua/lucene/java/5.2.0

Lucene 5.2.0 release highlights:

  • Span queries now share document conjunction/intersection code with boolean queries, and use two-phased iterators for faster intersection by avoiding loading positions in certain cases.

  • Added two-phase support to SpanNotQuery, and SpanPositionCheckQuery and its subclasses: SpanPositionRangeQuery, SpanPayloadCheckQuery, SpanNearPayloadCheckQuery, SpanFirstQuery.

  • Added a new query time join to the join module that uses global ordinals, which is faster for subsequent joins between reopens.

  • New CompositeSpatialStrategy combines speed of RPT with accuracy of SDV. Includes optimized Intersect predicate to avoid many geometry checks. Uses TwoPhaseIterator.

  • New LimitTokenOffsetFilter that limits tokens to those before a configured maximum start offset.

  • New spatial PackedQuadPrefixTree, a generally more efficient choice than QuadPrefixTree, especially for high precision shapes. When used, you should typically disable RPT's pruneLeafyBranches option.

  • Expressions now support bindings keys that look like zero arg functions

  • Add SpanWithinQuery and SpanContainingQuery that return spans inside of / containing another spans.

  • New Spatial "Geo3d" API with partial Spatial4j integration. It is a set of shapes implemented using 3D planar geometry for calculating spatial relations on the surface of a sphere. Shapes include Point, BBox, Circle, Path (buffered line string), and Polygon.

  • Various bugfixes and optimizations since the 5.1.0 release.

See the CHANGES.txt file included with the release for a full list of changes and further details.

14 April 2015 - Lucene Core 5.1.0 Available

The Lucene PMC is pleased to announce the release of Apache Lucene 5.1.0

Apache Lucene is a high-performance, full-featured text search engine library written entirely in Java. It is a technology suitable for nearly any application that requires full-text search, especially cross-platform.

The release is available for immediate download at: http://www.apache.org/dyn/closer.lua/lucene/java/5.1.0

Lucene 5.1.0 includes 9 new features, 10 bug fixes, and 24 optimizations / other changes from 18 unique contributors.

See the CHANGES.txt file included with the release for a full list of changes and further details.

5 March 2015 - Lucene Core 4.10.4 Available

The Lucene PMC is pleased to announce the release of Apache Lucene 4.10.4

Apache Lucene is a high-performance, full-featured text search engine library written entirely in Java. It is a technology suitable for nearly any application that requires full-text search, especially cross-platform.

The release is available for immediate download at: http://www.apache.org/dyn/closer.lua/lucene/java/4.10.4

Lucene 4.10.4 includes 13 bug fixes.

See the CHANGES.txt file included with the release for a full list of changes and further details.

20 February 2015 - Lucene™ 5.0.0 core available

The Lucene PMC is pleased to announce the release of Apache Lucene 5.0.

Apache Lucene is a high-performance, full-featured text search engine library written entirely in Java. It is a technology suitable for nearly any application that requires full-text search, especially cross-platform.

This release contains numerous bug fixes, optimizations, and improvements, some of which are highlighted below. The release is available for immediate download at: http://lucene.apache.org/core/mirrors-core-latest-redir.html

Lucene 5.0 Release Highlights:

Stronger index safety

  • All file access now uses Java’s NIO.2 APIs which give Lucene stronger index safety in terms of better error handling and safer commits.

  • Every Lucene segment now stores a unique id per-segment and per-commit to aid in accurate replication of index files.

  • During merging, IndexWriter now always checks the incoming segments for corruption before merging. This can mean, on upgrading to 5.0.0, that merging may uncover long-standing latent corruption in an older 4.x index.

Reduced heap usage

  • Lucene now supports random-writable and advance-able sparse bitsets (RoaringDocIdSet and SparseFixedBitSet), so the heap required is in proportion to how many bits are set, not how many total documents exist in the index.

  • Heap usage during IndexWriter merging is also much lower with the new Lucene50Codec, since doc values and norms for the segments being merged are no longer fully loaded into heap for all fields; now they are loaded for the one field currently being merged, and then dropped.

  • The default norms format now uses sparse encoding when appropriate, so indices that enable norms for many sparse fields will see a large reduction in required heap at search time.

  • 5.0 has a new API to print a tree structure showing a recursive breakdown of which parts are using how much heap.

Other features

  • FieldCache is gone (moved to a dedicated UninvertingReader in the misc module). This means when you intend to sort on a field, you should index that field using doc values, which is much faster and less heap consuming than FieldCache.

  • Tokenizers and Analyzers no longer require Reader on init.

  • NormsFormat now gets its own dedicated NormsConsumer/Producer

  • SortedSetSortField, used to sort on a multi-valued field, is promoted from sandbox to Lucene's core.

  • PostingsFormat now uses a "pull" API when writing postings, just like doc values. This is powerful because you can do things in your postings format that require making more than one pass through the postings such as iterating over all postings for each term to decide which compression format it should use.

  • New DateRangeField type enables Indexing and searching of date ranges, particularly multi-valued ones.

  • A new ExitableDirectoryReader extends FilterDirectoryReader and enables exiting requests that take too long to enumerate over terms.

  • Suggesters from multi-valued field can now be built as DocumentDictionary now enumerates each value separately in a multi-valued field.

  • ConcurrentMergeScheduler detects whether the index is on SSD or not and does a better job defaulting its settings. This only works on Linux for now; other OS's will continue to use the previous defaults (tuned for spinning disks).

  • Auto-IO-throttling has been added to ConcurrentMergeScheduler, to rate limit IO writes for each merge depending on incoming merge rate.

  • CustomAnalyzer has been added that allows to configure analyzers like you do in Solr's index schema. This class has a builder API to configure Tokenizers, TokenFilters, and CharFilters based on their SPI names and parameters as documented by the corresponding factories.

  • Memory index now supports payloads.

  • Added a filter cache with a usage tracking policy that caches filters based on frequency of use.

  • The default codec has an option to control BEST_SPEED or BEST_COMPRESSION for stored fields.

  • Stored fields are merged more efficiently, especially when upgrading from previous versions or using SortingMergePolicy

NOTE: Lucene 5 no longer supports the Lucene 3.x index format. Opening indexes will result in IndexFormatTooOldException. It is recommended to either reindex all your data, or upgrade the old indexes with the IndexUpgrader tool of latest Lucene 4 version (4.10.x). Those indexes can then be read (see next section) with Lucene 5.

To read more about the changes, also see: http://blog.mikemccandless.com/2014/11/apache-lucene-500-is-coming.html

Please read CHANGES.txt and MIGRATE.txt for a full list of new features and notes on upgrading.

29 December 2014 - Lucene Core 4.10.3 Available

The Lucene PMC is pleased to announce the release of Apache Lucene 4.10.3

Apache Lucene is a high-performance, full-featured text search engine library written entirely in Java. It is a technology suitable for nearly any application that requires full-text search, especially cross-platform.

The release is available for immediate download at: http://lucene.apache.org/core/mirrors-core-latest-redir.html

Lucene 4.10.3 includes 12 bug fixes.

See the CHANGES.txt file included with the release for a full list of changes and further details, and Happy Holidays!

31 October 2014 - Lucene Core 4.10.2 Available

The Lucene PMC is pleased to announce the release of Apache Lucene 4.10.2

Apache Lucene is a high-performance, full-featured text search engine library written entirely in Java. It is a technology suitable for nearly any application that requires full-text search, especially cross-platform.

The release is available for immediate download at: http://lucene.apache.org/core/mirrors-core-latest-redir.html

Lucene 4.10.2 includes 2 bug fixes.

See the CHANGES.txt file included with the release for a full list of changes and further details, and Happy Halloween!

29 September 2014 - Lucene Core 4.10.1 Available

The Lucene PMC is pleased to announce the release of Apache Lucene 4.10.1

Apache Lucene is a high-performance, full-featured text search engine library written entirely in Java. It is a technology suitable for nearly any application that requires full-text search, especially cross-platform.

The release is available for immediate download at: http://lucene.apache.org/core/mirrors-core-latest-redir.html

Lucene 4.10.1 includes 7 bug fixes.

See the CHANGES.txt file included with the release for a full list of changes and further details.

22 September 2014 - Lucene Core 4.9.1 Available

The Lucene PMC is pleased to announce the release of Apache Lucene 4.9.1

Apache Lucene is a high-performance, full-featured text search engine library written entirely in Java. It is a technology suitable for nearly any application that requires full-text search, especially cross-platform.

The release is available for immediate download at: http://lucene.apache.org/core/mirrors-core-latest-redir.html

Lucene 4.9.1 includes 7 bug fixes.

See the CHANGES.txt file included with the release for a full list of changes and further details.

03 September 2014 - Lucene Core 4.10.0 Available

The Lucene PMC is pleased to announce the release of Apache Lucene 4.10.0

Apache Lucene is a high-performance, full-featured text search engine library written entirely in Java. It is a technology suitable for nearly any application that requires full-text search, especially cross-platform.

This release contains numerous bug fixes, optimizations, and improvements, some of which are highlighted below. The release is available for immediate download at: http://lucene.apache.org/core/mirrors-core-latest-redir.html

See the CHANGES.txt file included with the release for a full list of details.

Lucene 4.10.0 Release Highlights:

  • New TermAutomatonQuery using an automaton for proximity queries. http://blog.mikemccandless.com/2014/08/a-new-proximity-query-for-lucene-using.html

  • New OrdsBlockTree terms dictionary supporting ord lookup.

  • Simplified matchVersion handling for Analyzers with new setVersion method, as well as Analyzer constructors not requiring Version.

  • Fixed possible corruption when opening a 3.x index with NRT reader.

  • Fixed edge case in StandardTokenizer that caused extremely slow parsing times with long text which partially matched grammar rules.

25 June 2014 - Lucene Core 4.9.0 Available

The Lucene PMC is pleased to announce the release of Apache Lucene 4.9.0

Apache Lucene is a high-performance, full-featured text search engine library written entirely in Java. It is a technology suitable for nearly any application that requires full-text search, especially cross-platform.

This release contains numerous bug fixes, optimizations, and improvements, some of which are highlighted below. The release is available for immediate download at: http://lucene.apache.org/core/mirrors-core-latest-redir.html

See the CHANGES.txt file included with the release for a full list of details.

Lucene 4.9.0 Release Highlights:

  • New Terms.getMin/Max methods to retrieve the lowest and highest terms per field.

  • New IDVersionPostingsFormat, optimized for ID lookups that associate a monotonically increasing version per ID.

  • Atomic update of a set of doc values fields.

  • Numerous optimizations for doc values search-time performance.

  • New (default) Lucene49NormsFormat to better compress certain cases such as very short fields.

  • New SORTED_NUMERIC docvalues type for efficient processing of multi-valued numeric fields.

  • Indexer passes previous token stream for easier reuse.

  • MoreLikeThis accepts multiple values per field.

  • All classes that estimate their RAM usage now implement a new Accountable interface.

  • Lucene files are now written by (File)OutputStream on all platforms, completely disallowing seeking with simplified IO APIs.

  • Improve the confusing error message when MMapDirectory cannot create a new map.

20 May 2014 - Lucene Core 4.8.1 Available

The Lucene PMC is pleased to announce the release of Apache Lucene 4.8.1

Apache Lucene is a high-performance, full-featured text search engine library written entirely in Java. It is a technology suitable for nearly any application that requires full-text search, especially cross-platform.

The release is available for immediate download at: http://lucene.apache.org/core/mirrors-core-latest-redir.html

Lucene 4.8.1 includes 15 bug fixes.

See the CHANGES.txt file included with the release for a full list of changes and further details.

28 April 2014 - Apache Lucene 4.8.0 Available

The Lucene PMC is pleased to announce the release of Apache Lucene 4.8.0

Apache Lucene is a high-performance, full-featured text search engine library written entirely in Java. It is a technology suitable for nearly any application that requires full-text search, especially cross-platform.

This release contains numerous bug fixes, optimizations, and improvements, some of which are highlighted below. The release is available for immediate download at: http://lucene.apache.org/core/mirrors-core-latest-redir.html

See the CHANGES.txt file included with the release for a full list of details.

Lucene 4.8.0 Release Highlights:

  • Apache Lucene now requires Java 7 or greater (recommended is Oracle Java 7 or OpenJDK 7, minimum update 55; earlier versions have known JVM bugs affecting Lucene).

  • Apache Lucene is fully compatible with Java 8.

  • All index files now store end-to-end checksums, which are now validated during merging and reading. This ensures that corruptions caused by any bit-flipping hardware problems or bugs in the JVM can be detected earlier. For full detection be sure to enable all checksums during merging (it's disabled by default).

  • Lucene has a new Rescorer/QueryRescorer API to perform second-pass rescoring or reranking of search results using more expensive scoring functions after first-pass hit collection.

  • AnalyzingInfixSuggester now supports near-real-time autosuggest.

  • Simplified impact-sorted postings (using SortingMergePolicy and EarlyTerminatingCollector) to use Lucene's Sort class to express the sort order.

  • Bulk scoring and normal iterator-based scoring were separated, so some queries can do bulk scoring more effectively.

  • Switched to MurmurHash3 to hash terms during indexing.

  • IndexWriter now supports updating of binary doc value fields.

  • HunspellStemFilter now uses 10 to 100x less RAM. It also loads all known OpenOffice dictionaries without error.

  • Lucene now also fsyncs the directory metadata on commits, if the operating system and file system allow it (Linux, MacOSX are known to work).

  • Lucene now uses Java 7 file system functions under the hood, so index files can be deleted on Windows, even when readers are still open.

  • A serious bug in NativeFSLockFactory was fixed, which could allow multiple IndexWriters to acquire the same lock. The lock file is no longer deleted from the index directory even when the lock is not held.

  • Various bugfixes and optimizations since the 4.7.2 release.

15 April 2014 - Lucene Core 4.7.2 Available

The Lucene PMC is pleased to announce the release of Apache Lucene 4.7.2

Apache Lucene is a high-performance, full-featured text search engine library written entirely in Java. It is a technology suitable for nearly any application that requires full-text search, especially cross-platform.

The release is available for immediate download at: http://lucene.apache.org/core/mirrors-core-latest-redir.html

Lucene 4.7.2 includes 2 bug fixes, including a possible index corruption with near-realtime search.

See the CHANGES.txt file included with the release for a full list of changes and further details.

02 April 2014 - Lucene Core 4.7.1 Available

The Lucene PMC is pleased to announce the release of Apache Lucene 4.7.1

Apache Lucene is a high-performance, full-featured text search engine library written entirely in Java. It is a technology suitable for nearly any application that requires full-text search, especially cross-platform.

The release is available for immediate download at: http://lucene.apache.org/core/mirrors-core-latest-redir.html

Lucene 4.7.1 includes 14 bug fixes; one build improvement; and one change in runtime behavior: AutomatonQuery.equals is no longer implemented as "accepts same language".

See the CHANGES.txt file included with the release for a full list of changes and further details.

12 March 2014 - Apache Lucene 4.8 will require Java 7

The Apache Lucene committers decided with a large majority on the vote to require Java 7 for the next minor release of Apache Lucene (version 4.8)!

The next release will also contain some improvements for Java 7:

  • Better file handling (especially on Windows) in the directory implementations. Files can now be deleted on windows, although the index is still open - like it was always possible on Unix environments (delete on last close semantics).

  • Speed improvements in sorting comparators: Sorting now uses Java 7's own comparators for integer and long sorts, which are highly optimized by the Hotspot VM.

If you want to stay up-to-date with Lucene and Solr, you should upgrade your infrastructure to Java 7. Please be aware that you must use at least use Java 7u1. The recommended version at the moment is Java 7u25. Later versions like 7u40, 7u45,... have a bug causing index corrumption. Ideally use the Java 7u60 prerelease, which has fixed this bug. Once 7u60 is out, this will be the recommended version. In addition, there is no more Oracle/BEA JRockit available for Java 7, use the official Oracle Java 7. JRockit was never working correctly with Lucene/Solr (causing index corrumption), so this should not be an issue. Please also review our list of JVM bugs: http://wiki.apache.org/lucene-java/JavaBugs

EDIT (as of 15 April 2014): The recently released Java 7u55 fixes the above bug causing index corrumption. This version is now the recommended version for running Apache Lucene.

26 February 2014 - Lucene Core 4.7 Available

The Lucene PMC is pleased to announce the release of Apache Lucene 4.7

Apache Lucene is a high-performance, full-featured text search engine library written entirely in Java. It is a technology suitable for nearly any application that requires full-text search, especially cross-platform.

This release contains numerous bug fixes, optimizations, and improvements, some of which are highlighted below. The release is available for immediate download at: http://lucene.apache.org/core/mirrors-core-latest-redir.html

See the CHANGES.txt file included with the release for a full list of details.

Lucene 4.7 Release Highlights:

  • When sorting by String (SortField.STRING), you can now specify whether missing values should be sorted first (the default), or last.

  • Add two memory resident dictionaries (FST terms dictionary and FSTOrd terms dictionary) to improve primary key lookups. The PostingsBaseFormat API is also changed so that term dictionaries get the ability to block encode term metadata, and all dictionary implementations can now plug in any PostingsBaseFormat.

  • NRT support for file systems that do not have delete on last close or cannot delete while referenced semantics.

  • Add LongBitSet for managing more than 2.1B bits (otherwise use FixedBitSet).

  • Speed up Lucene range faceting from O(N) per hit to O(log(N)) per hit using segment trees.

  • Add SearcherTaxonomyManager over search and taxonomy index directories (i.e. not only NRT).

  • Drilling down or sideways on a Lucene facet range (using Range.getFilter()) is now faster for costly filters (uses random access, not iteration); range facet counts now accept a fast-match filter to avoid computing the value for documents that are out of bounds, e.g. using a bounding box filter with distance range faceting.

  • Add Analyzer for Kurdish.

  • Add Payload support to FileDictionary (Suggest) and make it more configurable.

  • Add a new BlendedInfixSuggester, which is like AnalyzingInfixSuggester but boosts suggestions that matched tokens with lower positions.

  • Add SimpleQueryParser: parser for human-entered queries.

  • Add multitermquery (wildcards,prefix,etc) to PostingsHighlighter.

  • Upgrade to Spatial4j 0.4.1: Parses WKT (including ENVELOPE) with extension BUFFER; buffering a point results in a Circle. JTS isn't needed for WKT any more but remains required for Polygons. New Shapes: ShapeCollection and BufferedLineString.

  • Add spatial SerializedDVStrategy that serializes a binary representation of a shape into BinaryDocValues. It supports exact geometry relationship calculations.

  • Various bugfixes and optimizations since the 4.6.1 release.