Lucene™ Core News

15 April 2014 - Lucene Core 4.7.2 Available

The Lucene PMC is pleased to announce the release of Apache Lucene 4.7.2

Apache Lucene is a high-performance, full-featured text search engine library written entirely in Java. It is a technology suitable for nearly any application that requires full-text search, especially cross-platform.

The release is available for immediate download at: http://lucene.apache.org/core/mirrors-core-latest-redir.html

Lucene 4.7.2 includes 2 bug fixes, among them a fix for possible index corruption with near-realtime search.

See the CHANGES.txt file included with the release for a full list of changes and further details.

02 April 2014 - Lucene Core 4.7.1 Available

The Lucene PMC is pleased to announce the release of Apache Lucene 4.7.1

The release is available for immediate download at: http://lucene.apache.org/core/mirrors-core-latest-redir.html

Lucene 4.7.1 includes 14 bug fixes, one build improvement, and one change in runtime behavior: AutomatonQuery.equals is no longer implemented as "accepts same language".

See the CHANGES.txt file included with the release for a full list of changes and further details.

12 March 2014 - Apache Lucene 4.8 will require Java 7

The Apache Lucene committers voted by a large majority to require Java 7 for the next minor release of Apache Lucene (version 4.8)!

The next release will also contain some improvements that take advantage of Java 7:

  • Better file handling (especially on Windows) in the directory implementations. Files can now be deleted on Windows while the index is still open, as has always been possible on Unix ("delete on last close" semantics).

  • Speed improvements in sorting comparators: sorting now uses Java 7's own comparators for integer and long sorts, which are highly optimized by the HotSpot VM.
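To illustrate why built-in comparison methods matter for sorting, here is a small sketch. It assumes the Java 7 comparators meant above are Integer.compare(int, int) and Long.compare(long, long), both added in Java 7; the class and method names are hypothetical. The subtraction idiom shown first is the classic pre-Java 7 shortcut, which overflows when the operands are far apart.

```java
import java.util.Arrays;
import java.util.Comparator;

// Hypothetical helper illustrating Java 7 comparison methods (not Lucene code).
final class Java7Compare {

    // Pre-Java 7 shortcut: wrong whenever a - b overflows (e.g. MIN_VALUE vs. 1).
    static int compareBySubtraction(int a, int b) {
        return a - b;
    }

    // Java 7 added Integer.compare(int, int): correct for all inputs, no boxing.
    static int compareInts(int a, int b) {
        return Integer.compare(a, b);
    }

    // Java 7 also added Long.compare(long, long).
    static int compareLongs(long a, long b) {
        return Long.compare(a, b);
    }

    // Sorts a copy with a comparator built on Integer.compare (Java 7 syntax, no lambdas).
    static Integer[] sortedCopy(Integer... values) {
        Integer[] copy = values.clone();
        Arrays.sort(copy, new Comparator<Integer>() {
            @Override
            public int compare(Integer x, Integer y) {
                return Integer.compare(x, y);
            }
        });
        return copy;
    }
}
```

For example, compareBySubtraction(Integer.MIN_VALUE, 1) returns a positive number, wrongly ordering the minimum after 1, while compareInts returns a negative number as expected.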

If you want to stay up to date with Lucene and Solr, you should upgrade your infrastructure to Java 7. Please be aware that you must use at least Java 7u1. The recommended version at the moment is Java 7u25. Later versions (7u40, 7u45, and so on) have a bug that causes index corruption. Ideally, use the Java 7u60 prerelease, which fixes this bug; once 7u60 is released, it will become the recommended version. In addition, Oracle/BEA JRockit is no longer available for Java 7; use the official Oracle Java 7 instead. JRockit never worked correctly with Lucene/Solr (it caused index corruption), so this should not be an issue. Please also review our list of JVM bugs: http://wiki.apache.org/lucene-java/JavaBugs
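A deployment might guard against the JVM versions discussed above with a startup check. The sketch below is hypothetical (the class and enum names are ours). It relies on Java 7 reporting java.version as "1.7.0_NN", and it encodes our reading of the advice above: 7u1 is the minimum, 7u25 is recommended, and updates from 7u40 up to, but not including, 7u60 are treated as affected by the index-corruption bug. The JavaBugs wiki page remains the authoritative list.

```java
// Hypothetical startup check (not part of Lucene) classifying Java 7 versions per the advice above.
final class Java7VersionCheck {

    enum Verdict { NOT_JAVA_7, TOO_OLD, RECOMMENDED, KNOWN_BUG, OK }

    // Extracts NN from a Java 7 version string "1.7.0_NN" (a suffix such as "-ea" is ignored).
    // Returns 0 for "1.7.0" without an update suffix, and -1 if the string is not a Java 7 version.
    static int update(String version) {
        if (!version.startsWith("1.7.0")) {
            return -1;
        }
        int underscore = version.indexOf('_');
        if (underscore < 0) {
            return 0;
        }
        int end = underscore + 1;
        while (end < version.length() && Character.isDigit(version.charAt(end))) {
            end++;
        }
        return end == underscore + 1 ? -1 : Integer.parseInt(version.substring(underscore + 1, end));
    }

    static Verdict classify(String version) {
        int u = update(version);
        if (u < 0) return Verdict.NOT_JAVA_7;
        if (u < 1) return Verdict.TOO_OLD;               // at least 7u1 is required
        if (u == 25) return Verdict.RECOMMENDED;         // recommended at the time of writing
        if (u >= 40 && u < 60) return Verdict.KNOWN_BUG; // assumed range of the corruption bug
        return Verdict.OK;
    }
}
```

An application could call classify(System.getProperty("java.version")) at startup and refuse to open an index on KNOWN_BUG.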

26 February 2014 - Lucene Core 4.7 Available

The Lucene PMC is pleased to announce the release of Apache Lucene 4.7

This release contains numerous bug fixes, optimizations, and improvements, some of which are highlighted below. The release is available for immediate download at: http://lucene.apache.org/core/mirrors-core-latest-redir.html

See the CHANGES.txt file included with the release for a full list of details.

Lucene 4.7 Release Highlights:

  • When sorting by String (SortField.STRING), you can now specify whether missing values should be sorted first (the default) or last.

  • Add two memory-resident terms dictionaries (FST terms dictionary and FSTOrd terms dictionary) to improve primary key lookups. The PostingsBaseFormat API has also changed so that terms dictionaries can block-encode term metadata, and all dictionary implementations can now plug in any PostingsBaseFormat.

  • Near-realtime (NRT) support for file systems that lack "delete on last close" semantics or cannot delete files while they are still referenced.

  • Add LongBitSet for managing more than 2.1B bits (otherwise use FixedBitSet).

  • Speed up Lucene range faceting from O(N) per hit to O(log(N)) per hit using segment trees.

  • Add SearcherTaxonomyManager over search and taxonomy index directories (i.e. not only NRT).

  • Drilling down or sideways on a Lucene facet range (using Range.getFilter()) is now faster for costly filters (uses random access, not iteration); range facet counts now accept a fast-match filter to avoid computing the value for documents that are out of bounds, e.g. using a bounding box filter with distance range faceting.

  • Add Analyzer for Kurdish.

  • Add Payload support to FileDictionary (Suggest) and make it more configurable.

  • Add a new BlendedInfixSuggester, which is like AnalyzingInfixSuggester but boosts suggestions that matched tokens with lower positions.

  • Add SimpleQueryParser: parser for human-entered queries.

  • Add MultiTermQuery support (wildcard, prefix, etc.) to PostingsHighlighter.

  • Upgrade to Spatial4j 0.4.1: Parses WKT (including ENVELOPE) with extension BUFFER; buffering a point results in a Circle. JTS isn't needed for WKT any more but remains required for Polygons. New Shapes: ShapeCollection and BufferedLineString.

  • Add spatial SerializedDVStrategy that serializes a binary representation of a shape into BinaryDocValues. It supports exact geometry relationship calculations.

  • Various bugfixes and optimizations since the 4.6.1 release.
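The range-faceting speedup above can be illustrated with a segment tree. The sketch below is our own illustration of the idea, not Lucene's implementation: range boundaries split the number line into disjoint elementary intervals, each range is registered on the O(log N) tree nodes that exactly cover it, and each hit increments only the nodes on one root-to-leaf path. The class name, the inclusive long[]{min, max} range format, and the requirement max < Long.MAX_VALUE are assumptions.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.TreeSet;

// Hypothetical segment-tree range counter (illustration only, not Lucene's code).
final class SegmentTreeRangeCounter {
    private final long[] points;   // sorted, distinct start points of elementary intervals
    private final int leaves;      // elementary interval i is [points[i], points[i + 1] - 1]
    private final int[] nodeHits;  // hits whose leaf lies under each tree node
    private final List<List<Integer>> rangesAtNode = new ArrayList<>();
    private final int numRanges;

    // ranges[i] = {min, max}, inclusive; assumes at least one range and min <= max < Long.MAX_VALUE.
    SegmentTreeRangeCounter(long[][] ranges) {
        TreeSet<Long> bounds = new TreeSet<>();
        for (long[] r : ranges) {
            bounds.add(r[0]);
            bounds.add(r[1] + 1);
        }
        points = new long[bounds.size()];
        int i = 0;
        for (long p : bounds) {
            points[i++] = p;
        }
        leaves = points.length - 1;
        numRanges = ranges.length;
        nodeHits = new int[4 * leaves];
        for (int n = 0; n < nodeHits.length; n++) {
            rangesAtNode.add(new ArrayList<Integer>());
        }
        for (int id = 0; id < ranges.length; id++) {
            int firstLeaf = Arrays.binarySearch(points, ranges[id][0]);
            int lastLeaf = Arrays.binarySearch(points, ranges[id][1] + 1) - 1;
            register(0, 0, leaves - 1, firstLeaf, lastLeaf, id);
        }
    }

    // Stores rangeId on the minimal set of nodes whose leaves exactly cover [qlo, qhi].
    private void register(int node, int lo, int hi, int qlo, int qhi, int rangeId) {
        if (qhi < lo || hi < qlo) {
            return;
        }
        if (qlo <= lo && hi <= qhi) {
            rangesAtNode.get(node).add(rangeId);
            return;
        }
        int mid = (lo + hi) >>> 1;
        register(2 * node + 1, lo, mid, qlo, qhi, rangeId);
        register(2 * node + 2, mid + 1, hi, qlo, qhi, rangeId);
    }

    // Per hit: one binary search plus one root-to-leaf walk, both O(log N).
    void addHit(long value) {
        int pos = Arrays.binarySearch(points, value);
        int leaf = pos >= 0 ? pos : -pos - 2;  // index of the last start point <= value
        if (leaf < 0 || leaf >= leaves) {
            return;                             // outside every range
        }
        int node = 0, lo = 0, hi = leaves - 1;
        while (true) {
            nodeHits[node]++;
            if (lo == hi) {
                break;
            }
            int mid = (lo + hi) >>> 1;
            if (leaf <= mid) {
                node = 2 * node + 1;
                hi = mid;
            } else {
                node = 2 * node + 2;
                lo = mid + 1;
            }
        }
    }

    // A range's count is the sum of hits on its covering nodes, which are disjoint subtrees.
    int[] counts() {
        int[] counts = new int[numRanges];
        for (int node = 0; node < nodeHits.length; node++) {
            for (int rangeId : rangesAtNode.get(node)) {
                counts[rangeId] += nodeHits[node];
            }
        }
        return counts;
    }
}
```

Overlapping ranges are handled naturally: a hit is counted once for every range whose covering nodes lie on its path.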

28 January 2014 - Lucene Core 4.6.1 Available

The Lucene PMC is pleased to announce the release of Apache Lucene 4.6.1

This release contains a handful of bug fixes. The release is available for immediate download at: http://lucene.apache.org/core/mirrors-core-latest-redir.html

See the CHANGES.txt file included with the release for a full list of details.

24 November 2013 - Lucene Core 4.6 Available

The Lucene PMC is pleased to announce the release of Apache Lucene 4.6

This release contains numerous bug fixes, optimizations, and improvements, some of which are highlighted below. The release is available for immediate download at: http://lucene.apache.org/core/mirrors-core-latest-redir.html

See the CHANGES.txt file included with the release for a full list of details.

Lucene 4.6 Release Highlights:

  • Added support for NumericDocValues field updates (without re-indexing the document) through IndexWriter.updateNumericDocValue(Term, String, Long).

  • New FreeTextSuggester can predict the next word using a simple n-gram language model, which is useful for "long tail" suggestions.

  • A new expression module allows for customized ranking with script-like syntax.

  • A new DirectDocValuesFormat can hold all doc values on the heap as uncompressed native Java arrays.

  • Terms.hasFreqs reports whether a given field indexed per-document term frequencies.

  • Various bugfixes and optimizations since the 4.5.1 release.
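The idea behind next-word prediction can be shown with a toy bigram model. This is a hypothetical sketch, not FreeTextSuggester's actual model or scoring: it only counts which word follows which in the training text and ranks candidates by raw count.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

// Hypothetical toy bigram model (not FreeTextSuggester's algorithm).
final class BigramPredictor {
    // previous word -> (next word -> number of times it followed)
    private final Map<String, Map<String, Integer>> counts = new HashMap<>();

    void train(String text) {
        String[] words = text.trim().toLowerCase().split("\\s+");
        for (int i = 0; i + 1 < words.length; i++) {
            counts.computeIfAbsent(words[i], k -> new HashMap<>())
                  .merge(words[i + 1], 1, Integer::sum);
        }
    }

    // Up to k likely next words: highest count first, ties broken alphabetically.
    List<String> predict(String previous, int k) {
        Map<String, Integer> next = counts.get(previous.toLowerCase());
        if (next == null) {
            return Collections.emptyList();
        }
        return next.entrySet().stream()
                .sorted(Map.Entry.<String, Integer>comparingByValue().reversed()
                        .thenComparing(Map.Entry.<String, Integer>comparingByKey()))
                .limit(k)
                .map(Map.Entry::getKey)
                .collect(Collectors.toList());
    }
}
```

After training on "the quick fox the quick dog the lazy dog", predict("the", 2) ranks "quick" (seen twice) ahead of "lazy" (seen once).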

24 October 2013 - Lucene Core 4.5.1 Available

The Lucene PMC is pleased to announce the release of Apache Lucene 4.5.1

This release contains a handful of bug fixes. The release is available for immediate download at: http://lucene.apache.org/core/mirrors-core-latest-redir.html

See the CHANGES.txt file included with the release for a full list of details.

Lucene 4.5.1 Release Highlights:

  • Lucene 4.5.1 includes 8 bug fixes.

5 October 2013 - Lucene Core 4.5 Available

The Lucene PMC is pleased to announce the release of Apache Lucene 4.5

This release contains numerous bug fixes, optimizations, and improvements, some of which are highlighted below. The release is available for immediate download at: http://lucene.apache.org/core/mirrors-core-latest-redir.html

See the CHANGES.txt file included with the release for a full list of details.

Lucene 4.5 Release Highlights:

  • Added support for missing values to DocValues fields through AtomicReader.getDocsWithField.

  • Lucene 4.5 has a new Lucene45Codec with Lucene45DocValues, supporting missing values and with most data structures residing off-heap.

  • New in-memory DocIdSet implementations that are better than FixedBitSet, especially on small sets: WAH8DocIdSet, PFORDeltaDocIdSet and EliasFanoDocIdSet.

  • CachingWrapperFilter now caches filters with WAH8DocIdSet by default, which has the same memory usage as FixedBitSet in the worst case but is smaller and faster on small sets.

  • TokenStreams now set the position increment in end(), so trailing holes (for example, a stop word removed from the end of the text) can be handled.

  • IndexWriter no longer clones the given IndexWriterConfig.

  • Various bugfixes and optimizations since the 4.4 release.
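Elias-Fano coding, the technique behind EliasFanoDocIdSet, stores the low bits of each doc ID verbatim and its high bits in unary. The following is a hypothetical textbook sketch, not Lucene's encoder: it uses java.util.BitSet for storage, assumes strictly increasing doc IDs in [0, universe), and also computes a size bound, which for sparse sets is far below the one bit per document that a FixedBitSet needs.

```java
import java.util.BitSet;

// Hypothetical textbook Elias-Fano sketch (not Lucene's EliasFanoDocIdSet).
final class EliasFanoSketch {
    private final int n;
    private final int lowBits;                  // L = floor(log2(universe / n))
    private final BitSet lows = new BitSet();   // low L bits of value i live at [i*L, i*L + L)
    private final BitSet highs = new BitSet();  // value i sets bit (value >>> L) + i

    // Assumes sortedDocs is strictly increasing and every doc is in [0, universe).
    EliasFanoSketch(int[] sortedDocs, int universe) {
        n = sortedDocs.length;
        lowBits = lowBitsFor(n, universe);
        for (int i = 0; i < n; i++) {
            int doc = sortedDocs[i];
            for (int b = 0; b < lowBits; b++) {
                if (((doc >>> b) & 1) != 0) {
                    lows.set(i * lowBits + b);
                }
            }
            highs.set((doc >>> lowBits) + i);
        }
    }

    static int lowBitsFor(int n, int universe) {
        return n == 0 ? 0 : 31 - Integer.numberOfLeadingZeros(universe / n);
    }

    // Upper bound on the encoded size: n*L low bits plus n + (universe >>> L) + 1 high bits.
    static long estimateBits(int n, int universe) {
        int l = lowBitsFor(n, universe);
        return (long) n * l + n + (universe >>> l) + 1;
    }

    int[] decode() {
        int[] out = new int[n];
        int k = 0;
        for (int p = highs.nextSetBit(0); p >= 0; p = highs.nextSetBit(p + 1)) {
            int value = (p - k) << lowBits;  // the k-th set bit at position p encodes high part p - k
            for (int b = 0; b < lowBits; b++) {
                if (lows.get(k * lowBits + b)) {
                    value |= 1 << b;
                }
            }
            out[k++] = value;
        }
        return out;
    }
}
```

For 1,000 documents out of 10,000,000, estimateBits gives 15,221 bits, compared with 10,000,000 bits for a plain bitset over the same range.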

23 July 2013 - Lucene Core 4.4 Available

The Lucene PMC is pleased to announce the release of Apache Lucene 4.4

This release contains numerous bug fixes, optimizations, and improvements, some of which are highlighted below. The release is available for immediate download at: http://lucene.apache.org/core/mirrors-core-latest-redir.html

See the CHANGES.txt file included with the release for a full list of details.

Lucene 4.4 Release Highlights:

  • New Replicator module: replicate index revisions between server and client. See http://shaierera.blogspot.com/2013/05/the-replicator.html

  • New AnalyzingInfixSuggester: finds suggestions based on matches to any tokens in the suggestion, not just based on pure prefix matching. See http://blog.mikemccandless.com/2013/06/a-new-lucene-suggester-based-on-infix.html

  • New PatternCaptureGroupTokenFilter: emit multiple tokens, one for each capture group in one or more Java regexes.

  • New Lucene Facet module features:

    • Added dynamic (no taxonomy index used) numeric range faceting (see http://blog.mikemccandless.com/2013/05/dynamic-faceting-with-lucene.html).
    • Arbitrary queries are now allowed for per-dimension drill-down on DrillDownQuery and DrillSideways, to support future dynamic faceting.
    • New FacetResult.mergeHierarchies: merge multiple FacetResults of the same dimension into a single one with the reconstructed hierarchy.

  • FST's Builder can now handle more than 2.1 billion "tail nodes" while building a minimal FST.

  • FieldCache Ints and Longs now use bit-packing to save memory. String fields have more efficient compression if there are many unique terms.

  • Improved compression for NumericDocValues for dates and fields with very small numbers of unique values.

  • New IndexWriter.hasUncommittedChanges(): returns true if there are changes that have not been committed.

  • multiValuedSeparator in PostingsHighlighter is now configurable, for cases where you want a different logical separator between field values.

  • NorwegianLightStemFilter and NorwegianMinimalStemFilter have been extended to handle Nynorsk.

  • New ScandinavianFoldingFilter and ScandinavianNormalizationFilter.

  • Easier compressed norms: Lucene42NormsFormat now takes an overhead parameter, allowing for values other than PackedInts.FASTEST.

  • Analyzer now has an additional tokenStream(String fieldName, String text) method, so wrapping the text in a StringReader is no longer needed for the common case.

  • New SimpleMergedSegmentWarmer: just ensures that data structures (terms, norms, docvalues, etc.) are initialized.

  • IndexWriter flushes segments to the compound file format by default.

  • Various bugfixes and optimizations since the 4.3.1 release.
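Bit-packing, as mentioned for FieldCache Ints and Longs above, stores each value in only as many bits as the largest value needs. The sketch below is hypothetical and not Lucene's PackedInts API: it subtracts the minimum first, then packs the offsets into a long[], letting a value straddle two adjacent longs.

```java
// Hypothetical bit-packed array of longs (illustration only, not Lucene's PackedInts).
final class BitPackedLongs {
    final long min;           // values are stored as (value - min)
    final int bitsPerValue;   // bits needed for the largest (value - min)
    private final long[] blocks;

    // Assumes max - min does not overflow a long.
    BitPackedLongs(long[] values) {
        long lo = Long.MAX_VALUE, hi = Long.MIN_VALUE;
        for (long v : values) {
            lo = Math.min(lo, v);
            hi = Math.max(hi, v);
        }
        min = values.length == 0 ? 0 : lo;
        long range = values.length == 0 ? 0 : hi - lo;
        bitsPerValue = Math.max(1, 64 - Long.numberOfLeadingZeros(range));
        blocks = new long[(int) (((long) values.length * bitsPerValue + 63) / 64)];
        for (int i = 0; i < values.length; i++) {
            set(i, values[i] - min);
        }
    }

    private void set(int index, long delta) {
        long bitPos = (long) index * bitsPerValue;
        int block = (int) (bitPos >>> 6);
        int shift = (int) (bitPos & 63);
        blocks[block] |= delta << shift;
        if (shift + bitsPerValue > 64) {               // high bits spill into the next long
            blocks[block + 1] |= delta >>> (64 - shift);
        }
    }

    long get(int index) {
        long bitPos = (long) index * bitsPerValue;
        int block = (int) (bitPos >>> 6);
        int shift = (int) (bitPos & 63);
        long value = blocks[block] >>> shift;
        if (shift + bitsPerValue > 64) {
            value |= blocks[block + 1] << (64 - shift);  // reattach the spilled high bits
        }
        long mask = bitsPerValue == 64 ? -1L : (1L << bitsPerValue) - 1;
        return (value & mask) + min;
    }

    long storageBits() {
        return blocks.length * 64L;
    }
}
```

Four values between 1000 and 1007 need only 3 bits each, so they fit in a single 64-bit long instead of the 256 bits of a plain long[4].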

18 June 2013 - Lucene Core 4.3.1 Available

The Lucene PMC is pleased to announce the release of Apache Lucene 4.3.1

This release contains a handful of bug fixes and optimizations, some of which are highlighted below. The release is available for immediate download at: http://lucene.apache.org/core/mirrors-core-latest-redir.html

See the CHANGES.txt file included with the release for a full list of details.

Lucene 4.3.1 Release Highlights:

  • Lucene 4.3.1 includes 12 bug fixes and 1 optimization, among them a fix for a serious bug that could cause deadlock.

6 May 2013 - Lucene Core 4.3.0 Available

The Lucene PMC is pleased to announce the release of Apache Lucene 4.3

This release contains a handful of bug fixes and optimizations, some of which are highlighted below. The release is available for immediate download at: http://lucene.apache.org/core/mirrors-core-latest-redir.html

See the CHANGES.txt file included with the release for a full list of details.

Lucene 4.3.0 Release Highlights:

  • Significant performance improvements for BooleanQuery with minShouldMatch, thanks to skipping, resulting in queries up to 4000% faster.

  • A new SortingAtomicReader, which allows sorting an index by a sort criterion (e.g. a numeric DocValues field), as well as a SortingMergePolicy, which sorts documents before segments are merged.

  • DocIdSetIterator and Scorer now have a cost API that provides an upper bound on the number of documents the iterator might match. This API enables optimizations in query execution and in how filters are applied.

  • AnalyzingSuggester and FuzzySuggester can now record an arbitrary byte[] as a payload. The suggesters also use an ending offset to determine whether the last token was finished, so that, for example, the query "i " no longer suggests "Isla de Muerta".

  • Lucene Spatial Module can now search for indexed shapes by Within, Contains, and Disjoint relationships, in addition to typical Intersects.

  • PostingsHighlighter now allows custom passage scores and per-field BreakIterators, and has been detached from TopDocs. Additionally, subclasses can override where the string values to highlight are loaded from, as an alternative to stored fields.

  • New SearcherTaxonomyManager manages near-real-time reopens of both IndexSearcher and TaxonomyReader (for faceting).

  • Added a new faceting method to the facet module that computes facet counts using SortedSetDocValuesField, without a separate taxonomy index.

  • DrillSideways class, for computing sideways facet counts, is now more flexible: it allows more than one FacetRequest per dimension and now allows drilling down on dimensions that do not have a facet request.

  • Various bugfixes and optimizations since the 4.2.1 release.
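One way a cost estimate can be used is to let the sparsest clause lead a conjunction, so the other clauses only advance to its candidates. The sketch below is hypothetical: ArrayIterator only loosely mirrors the nextDoc/advance/cost shape of DocIdSetIterator, and its advance uses a linear scan where a real postings list would skip ahead.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;

// Hypothetical cost-ordered conjunction (illustration only, not Lucene's ConjunctionScorer).
final class CostOrderedConjunction {
    static final int NO_MORE_DOCS = Integer.MAX_VALUE;

    // Minimal iterator over a sorted int[] of doc IDs.
    static final class ArrayIterator {
        private final int[] docs;
        private int pos = -1;

        ArrayIterator(int... docs) { this.docs = docs; }

        int docID() { return pos < 0 ? -1 : pos < docs.length ? docs[pos] : NO_MORE_DOCS; }

        int nextDoc() { pos++; return docID(); }

        // Moves to the first doc >= target; expects target > docID(). Linear scan for simplicity.
        int advance(int target) {
            int doc;
            do {
                doc = nextDoc();
            } while (doc < target);
            return doc;
        }

        long cost() { return docs.length; }  // upper bound on the number of matching docs
    }

    // Assumes at least one iterator.
    static List<Integer> intersect(ArrayIterator... iterators) {
        ArrayIterator[] its = iterators.clone();
        // Lead with the cheapest iterator: it proposes the fewest candidates.
        Arrays.sort(its, Comparator.comparingLong(ArrayIterator::cost));
        List<Integer> hits = new ArrayList<>();
        ArrayIterator lead = its[0];
        int doc = lead.nextDoc();
        outer:
        while (doc != NO_MORE_DOCS) {
            for (int i = 1; i < its.length; i++) {
                int other = its[i].docID() < doc ? its[i].advance(doc) : its[i].docID();
                if (other > doc) {           // candidate rejected: the lead jumps ahead
                    doc = lead.advance(other);
                    continue outer;
                }
            }
            hits.add(doc);                   // every clause is positioned on doc
            doc = lead.nextDoc();
        }
        return hits;
    }
}
```

Whatever order the clauses are passed in, the iterator with the lowest cost drives the loop, so the more expensive clauses are only asked to confirm its candidates.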

3 April 2013 - Lucene Core 4.2.1 Available

The Lucene PMC is pleased to announce the release of Apache Lucene 4.2.1

This release contains a handful of bug fixes and optimizations, some of which are highlighted below. The release is available for immediate download at: http://lucene.apache.org/core/mirrors-core-latest-redir.html

See the CHANGES.txt file included with the release for a full list of details.

Lucene 4.2.1 Release Highlights:

  • Lucene 4.2.1 includes 9 bug fixes and 3 optimizations, including a fix for a serious bug that could result in the loss of an index.

11 March 2013 - Lucene Core 4.2.0 Available

The Lucene PMC is pleased to announce the release of Apache Lucene 4.2

This release contains numerous bug fixes, optimizations, and improvements, some of which are highlighted below. The release is available for immediate download at: http://lucene.apache.org/core/mirrors-core-latest-redir.html

See the CHANGES.txt file included with the release for a full list of details.

Lucene 4.2 Release Highlights:

  • Lucene 4.2 has a new default codec (Lucene42Codec) with a more efficient docvalues format (sorted bytes in FST, less addressing overhead, improved numeric compression) and smaller term vectors (LZ4-compressed terms dictionaries and payloads, delta-encoded positions and offsets using blocks of packed integers).

  • The external doc values API, the codec API, and their implementations have been simplified: the codec is no longer responsible for buffering doc values; the numerous types have been consolidated down to only three (NUMERIC, BINARY, SORTED); PerFieldDocValuesFormat lets you set a different format for each field; and the doc values and FieldCache APIs were unified.

  • Significant refactoring and performance enhancements to the facet module, resulting in an overall ~3.8x speedup in one case (single Date field faceting).

  • DrillDownQuery in the facet module now supports multi-select.

  • A new DrillSideways class enables computing facet labels and counts for both hits and near-misses in a single query. See http://blog.mikemccandless.com/2013/02/drill-sideways-faceting-with-lucene.html

  • A new doc values type (SORTED_SET) supports multiple values per document.

  • FSTs are a bit smaller, and the FST package supports FSTs over 2GB in size.

  • A new LiveFieldValues class lets you get live or real-time values for any indexed doc / field. See http://blog.mikemccandless.com/2013/01/getting-real-time-field-values-in-lucene.html

  • Added a new classification module.

  • Various bugfixes and optimizations since the 4.1 release.
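The distinction between hits and near-misses in drill-sideways faceting can be sketched by brute force. In this hypothetical model (not Lucene's DrillSideways implementation), each document that already matched the base query is a map from dimension to a single label. A document failing exactly one drill-down constraint is a near-miss and counts only toward that dimension, so each drilled dimension's counts behave as if its own constraint were removed.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical brute-force drill-sideways counting (illustration only).
final class DrillSidewaysSketch {

    // docs: base-query matches, each mapping dimension -> label (single-valued).
    // drillDowns: dimension -> selected label.
    // Returns dimension -> (label -> count).
    static Map<String, Map<String, Integer>> sidewaysCounts(
            List<Map<String, String>> docs, Map<String, String> drillDowns) {
        Map<String, Map<String, Integer>> counts = new HashMap<>();
        for (Map<String, String> doc : docs) {
            String missedDim = null;
            int misses = 0;
            for (Map.Entry<String, String> dd : drillDowns.entrySet()) {
                if (!dd.getValue().equals(doc.get(dd.getKey()))) {
                    misses++;
                    missedDim = dd.getKey();
                }
            }
            if (misses > 1) {
                continue;  // fails two or more constraints: counts nowhere
            }
            for (Map.Entry<String, String> field : doc.entrySet()) {
                String dim = field.getKey();
                // A hit counts toward every dimension; a near-miss only toward the one it missed.
                if (misses == 1 && !dim.equals(missedDim)) {
                    continue;
                }
                counts.computeIfAbsent(dim, k -> new HashMap<>())
                      .merge(field.getValue(), 1, Integer::sum);
            }
        }
        return counts;
    }
}
```

Drilling down on color=red still reports counts for blue and green (from near-misses), while the size counts include only the actual red hits.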

22 January 2013 - Lucene Core 4.1.0 Available

The Lucene PMC is pleased to announce the release of Apache Lucene 4.1

This release contains numerous bug fixes, optimizations, and improvements, some of which are highlighted below. The release is available for immediate download at: http://lucene.apache.org/core/mirrors-core-latest-redir.html

See the CHANGES.txt file included with the release for a full list of details.

Lucene 4.1 Release Highlights:

  • Lucene 4.1 has a new default codec (Lucene41Codec) based on the previously experimental "Block" indexing format, which improves performance; it also incorporates the functionality of "Appending" and "Pulsing".

  • The default codec incorporates the optimization of Pulsing: terms that appear in only one document (such as primary key/id fields) just store the document id in the term dictionary instead of a pointer to this document id in a separate file.

  • The default codec incorporates an efficient compressed stored fields implementation that compresses chunks of documents together with LZ4. (see http://blog.jpountz.net/post/33247161884/efficient-compressed-stored-fields-with-lucene)

  • Lucene no longer seeks when writing files (all files are written in an append-only way). This means it works by default with append-only streams, HDFS, etc.

  • New suggest implementations: AnalyzingSuggester, where the underlying form (computed from a Lucene Analyzer) used for suggestions is separate from the returned text (see http://blog.mikemccandless.com/2012/09/lucenes-new-analyzing-suggester.html), and FuzzySuggester, which additionally allows for inexact matching on the input.

  • Near-realtime support was added to the facet module. (see http://shaierera.blogspot.com/2012/11/lucene-facets-part-1.html)

  • New highlighter (PostingsHighlighter) added to the highlighter module. (see http://blog.mikemccandless.com/2012/12/a-new-lucene-highlighter-is-born.html)

  • Added FilterStrategy to FilteredQuery for more flexibility in filtered query execution.

  • Added CommonTermsQuery to speed up queries containing very high-frequency terms. Term frequencies are detected efficiently at query time; no index-time preparation is required.

  • Several bugfixes and optimizations since the 4.0 release.
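As we understand it, the core idea of CommonTermsQuery is to separate query terms by document frequency: rare terms drive matching, while very common terms mainly refine the scores of documents the rare terms already matched. The split itself can be sketched as follows; the class name and the threshold expressed as a fraction of maxDoc are hypothetical, and this is not CommonTermsQuery's actual code.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

// Hypothetical sketch of splitting query terms by document frequency (not Lucene's code).
final class CommonTermsSplit {
    final List<String> lowFreq = new ArrayList<>();   // rare terms: used to find matches
    final List<String> highFreq = new ArrayList<>();  // common terms: used to adjust scores

    // A term is "high frequency" if it occurs in more than maxFraction * maxDoc documents.
    static CommonTermsSplit split(Map<String, Integer> docFreqs, int maxDoc, double maxFraction) {
        double limit = maxFraction * maxDoc;
        CommonTermsSplit s = new CommonTermsSplit();
        for (Map.Entry<String, Integer> e : docFreqs.entrySet()) {
            if (e.getValue() > limit) {
                s.highFreq.add(e.getKey());
            } else {
                s.lowFreq.add(e.getKey());
            }
        }
        return s;
    }
}
```

With a 1% threshold over 10,000 documents, "the" (9,500 docs) and "of" (9,000 docs) land in highFreq, while "lucene" (80 docs) and "codec" (40 docs) land in lowFreq.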

25 December 2012 - Lucene Core 3.6.2 Available

The Lucene PMC is pleased to announce the release of Apache Lucene 3.6.2.

This release is a bug fix release for version 3.6.1. It contains numerous bug fixes, optimizations, and improvements, some of which are highlighted below. The release is available for immediate download at: http://lucene.apache.org/core/mirrors-core-3x-redir.html.

See the CHANGES.txt file included with the release for a full list of details.

Lucene 3.6.2 Release Highlights:

  • Fixed ArrayIndexOutOfBoundsException when the in-memory terms index requires more than 2.1GB of RAM (billions of terms).

  • Fixed a bug in contrib/queryparser's parsing of boolean queries.

  • Fixed BooleanScorer2 to return the correct freq() when using the scorer visitor API.

  • Fixed IndexWriter RAM accounting bug that would cause it to flush too early when using many different field names.

  • Several other minor bugfixes: scoring bugs when using a custom coord(), a rare IndexWriter thread-safety issue, and fixes to the faceting and highlighting modules.
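We do not know the exact code path behind the 2.1GB terms-index fix above, but the general class of bug is familiar: a byte offset computed in 32-bit int arithmetic wraps around past Integer.MAX_VALUE (about 2.1 billion) and becomes a negative array index. The hypothetical sketch below shows the overflow, the long-arithmetic fix, and splitting a long offset into a page number and an offset within the page, since a single Java array cannot hold more than about 2^31 elements. The pageBits parameter is an assumption for illustration.

```java
// Hypothetical illustration of int-offset overflow and paged addressing (not Lucene's code).
final class LargeOffsets {

    // Buggy pattern: the product is computed in int arithmetic and wraps past 2^31 - 1.
    static int intOffset(int ordinal, int bytesPerEntry) {
        return ordinal * bytesPerEntry;
    }

    // Fixed pattern: widen to long before multiplying.
    static long longOffset(int ordinal, int bytesPerEntry) {
        return (long) ordinal * bytesPerEntry;
    }

    // Pages of 2^pageBits bytes keep every array index within int range.
    static int page(long offset, int pageBits) {
        return (int) (offset >>> pageBits);
    }

    static int offsetInPage(long offset, int pageBits) {
        return (int) (offset & ((1L << pageBits) - 1));
    }
}
```

With 100,000,000 entries of 32 bytes each, intOffset returns -1,094,967,296, an invalid array index, whereas longOffset returns 3,200,000,000, which maps to page 3051 at offset 794,624 when pages hold 2^20 bytes.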

12 October 2012 - Lucene Core 4.0 Available

The Lucene PMC is pleased to announce the release of Apache Lucene 4.0

This release contains numerous bug fixes, optimizations, and improvements, some of which are highlighted below. The release is available for immediate download at: http://lucene.apache.org/core/mirrors-core-latest-redir.html

See the CHANGES.txt file included with the release for a full list of details.

Noteworthy changes since 4.0-BETA:

  • A new "Block" PostingsFormat offering improved search performance and index compression. This will likely become the default format in a future release.
  • All non-default codec implementations were moved to a separate codecs module. Just add lucene-codecs-4.0.0.jar to your classpath to test them out.
  • Payloads can optionally be stored in term vectors.
  • Many bugfixes and optimizations.
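Block-based postings formats commonly store each doc ID as a gap from the previous one and pack each fixed-size block of gaps with just enough bits for its largest gap. The following hypothetical sketch is not the "Block" PostingsFormat's actual code: the block size is a parameter here, and the first gap is taken relative to 0. It computes the gaps, the per-block bit widths, and the resulting size.

```java
// Hypothetical sketch of delta + per-block bit-width encoding for doc IDs (not Lucene's code).
final class BlockDeltaSketch {

    // Stores each doc ID as the gap from the previous one (the first gap is relative to 0).
    static int[] gaps(int[] sortedDocs) {
        int[] out = new int[sortedDocs.length];
        int prev = 0;
        for (int i = 0; i < sortedDocs.length; i++) {
            out[i] = sortedDocs[i] - prev;
            prev = sortedDocs[i];
        }
        return out;
    }

    // Rebuilds doc IDs by summing the gaps.
    static int[] docs(int[] gaps) {
        int[] out = new int[gaps.length];
        int sum = 0;
        for (int i = 0; i < gaps.length; i++) {
            sum += gaps[i];
            out[i] = sum;
        }
        return out;
    }

    // Bits per value in each block: enough for that block's largest gap.
    static int[] blockBitWidths(int[] gaps, int blockSize) {
        int[] widths = new int[(gaps.length + blockSize - 1) / blockSize];
        for (int i = 0; i < gaps.length; i++) {
            int bits = 32 - Integer.numberOfLeadingZeros(gaps[i]);
            widths[i / blockSize] = Math.max(widths[i / blockSize], bits);
        }
        return widths;
    }

    // Total payload bits when every gap in a block uses that block's width.
    static long packedBits(int[] gaps, int blockSize) {
        int[] widths = blockBitWidths(gaps, blockSize);
        long total = 0;
        for (int b = 0; b < widths.length; b++) {
            int count = Math.min(blockSize, gaps.length - b * blockSize);
            total += (long) widths[b] * count;
        }
        return total;
    }
}
```

For the doc IDs 5, 7, 8, 12, 100, 101, 103, 110 and blocks of four, the gaps need 3 and 7 bits per value respectively: 40 bits in total instead of 256 for raw 32-bit ints.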