Lucene contrib change Log
  - Bug Fixes   (6)
    
      - LUCENE-2277: QueryNodeImpl threw ConcurrentModificationException on
add(List<QueryNode>).
(Frank Wesemann via Robert Muir) 
      - LUCENE-2284: MatchAllDocsQueryNode toString() created an invalid XML tag.
(Frank Wesemann via Robert Muir) 
      - LUCENE-2278: FastVectorHighlighter: Highlighted term is out of alignment
in multi-valued NOT_ANALYZED field.
(Koji Sekiguchi) 
      - LUCENE-2524: FastVectorHighlighter: use mod for getting colored tag.
(Koji Sekiguchi) 
      - LUCENE-2616: FastVectorHighlighter: out of alignment when the first value is
empty in multiValued field
(Koji Sekiguchi) 
      - LUCENE-2731, LUCENE-2732: Fix (charset) problems in XML loading in
HyphenationCompoundWordTokenFilter (partial bugfix-only in 2.9 and 3.0,
full fix will be in later 3.1).
(Uwe Schinder) 
    
   
  - Documentation   (2)
    
      - LUCENE-2055: Add documentation noting that the Dutch and French stemmers
in contrib/analyzers do not implement the Snowball algorithm correctly,
and recommend to use the equivalents in contrib/snowball if possible.
(Robert Muir, Uwe Schindler, Simon Willnauer) 
      - LUCENE-2653: Add documentation noting that ThaiWordFilter will not work
as expected on all JRE's. For example, on an IBM JRE, it does nothing.
(Robert Muir) 
    
   
    
  - New features   (1)
    
      - LUCENE-2108: Spellchecker now safely supports concurrent modifications to
the spell-index. Threads can safely obtain term suggestions while the spell-
index is rebuild, cleared or reset. Internal IndexSearcher instances remain
open until the last thread accessing them releases the reference.
(Simon Willnauer) 
    
   
  - Bug Fixes   (6)
    
      - LUCENE-2144: Fix InstantiatedIndex to handle termDocs(null)
correctly (enumerate all non-deleted docs).
(Karl Wettin via Mike
McCandless) 
      - LUCENE-2199: ShingleFilter skipped over tri-gram shingles if outputUnigram
was set to false.
(Simon Willnauer) 
      - LUCENE-1939: IndexOutOfBoundsException at ShingleMatrixFilter's
Iterator#hasNext method on exhausted streams.
(Patrick Jungermann via Karl Wettin) 
      - LUCENE-2211: Fix missing clearAttributes() calls:
ShingleMatrix, PrefixAware, compounds, NGramTokenFilter,
EdgeNGramTokenFilter, Highlighter, and MemoryIndex.
(Uwe Schindler, Robert Muir) 
      - LUCENE-2207, LUCENE-2219: Fix incorrect offset calculations in end() for
CJKTokenizer, ChineseTokenizer, SmartChinese SentenceTokenizer,
and WikipediaTokenizer.
(Koji Sekiguchi, Robert Muir) 
      - LUCENE-2266: Fixed offset calculations in NGramTokenFilter and
EdgeNGramTokenFilter.
(Joe Calderon, Robert Muir via Uwe Schindler) 
    
   
  - API Changes   (2)
    
      - LUCENE-2108: Add SpellChecker.close, to close the underlying
reader.
(Eirik Bjørsnøs via Mike McCandless) 
      - LUCENE-2165: Add a constructor to SnowballAnalyzer that takes a Set of
stopwords, and deprecate the String[] one.
(Nick Burch via Robert Muir) 
    
   
  - Changes in backwards compatibility policy   (1)
    
      - LUCENE-2002: Add required Version matchVersion argument when
constructing ComplexPhraseQueryParser and default (as of 2.9)
enablePositionIncrements to true to match StandardAnalyzer's
default.  Also added required matchVersion to most of the analyzers
(Uwe Schindler, Mike McCandless) 
    
   
  - Changes in runtime behavior   (1)
    
      - LUCENE-1963: ArabicAnalyzer now lowercases before checking the stopword
list. This has no effect on Arabic text, but if you are using a custom
stopword list that contains some non-Arabic words, you'll need to fully
reindex.
(DM Smith via Robert Muir) 
    
   
  - Bug fixes   (7)
    
      - LUCENE-1953: FastVectorHighlighter: small fragCharSize can cause
StringIndexOutOfBoundsException.
(Koji Sekiguchi) 
      - LUCENE-1929: Highlighter throws exception on NumericRangeQuery and does not
support deprecated RangeQuery.
(Mark Miller) 
      - LUCENE-2001: Wordnet Syns2Index incorrectly parses synonyms that
contain a single quote.
(Parag H. Dave via Robert Muir) 
      - LUCENE-2003: Highlighter doesn't respect position increments other than 1 with
PhraseQuerys.
(Uwe Schindler, Mark Miller) 
      - LUCENE-1954: InstantiatedIndexWriter: Fixed ClassCastException with
NumericField because of incorrect unchecked cast: Document.getFields()
returns List<Fieldable>.
(Bernd Fondermann via Uwe Schindler) 
      - LUCENE-2014: SmartChineseAnalyzer did not properly clear attributes
in WordTokenFilter. If enablePositionIncrements is set for StopFilter,
then this could create invalid position increments, causing IndexWriter
to crash.
(Robert Muir, Uwe Schindler) 
      - LUCENE-2013: SpanRegexQuery does not work with QueryScorer.
(Benjamin Keil via Mark Miller) 
    
   
  - Changes in runtime behavior   (2)
    
      - LUCENE-1505: Local lucene now uses org.apache.lucene.util.NumericUtils for all
 number conversion.  You'll need to fully re-index any previously created indexes.
 This isn't a break in back-compatibility because local Lucene has not yet
 been released.
(Mike McCandless) 
      - LUCENE-1758: ArabicAnalyzer now uses the light10 algorithm, has a refined
 default stopword list, and lowercases non-Arabic text.
 You'll need to fully re-index any previously created indexes. This isn't a
 break in back-compatibility because ArabicAnalyzer has not yet been
 released.
(Robert Muir) 
    
   
  - API Changes   (5)
    
      - LUCENE-1695: Update the Highlighter to use the new TokenStream API. This issue breaks backwards
 compatibility with some public classes. If you have implemented custom Fragmenters or Scorers,
 you will need to adjust them to work with the new TokenStream API. Rather than getting passed a
 Token at a time, you will be given a TokenStream to init your impl with - store the Attributes
 you are interested in locally and access them on each call to the method that used to pass a new
 Token. Look at the included updated impls for examples.
(Mark Miller) 
      - LUCENE-1460: Change contrib TokenStreams/Filters to use the new
 TokenStream API.
(Robert Muir, Michael Busch) 
      - LUCENE-1775, LUCENE-1903: Change remaining TokenFilters (shingle, prefix-suffix)
 to use the new TokenStream API. ShingleFilter is much more efficient now,
 it clones much less often and computes the tokens mostly on the fly now.
 Also added more tests.
(Robert Muir, Michael Busch, Uwe Schindler, Chris Harris) 
      - LUCENE-1685: The position aware SpanScorer has become the default scorer
 for Highlighting. The SpanScorer implementation has replaced QueryScorer
 and the old term highlighting QueryScorer has been renamed to
 QueryTermScorer. Multi-term queries are also now expanded by default. If
 you were previously rewriting the query for multi-term query highlighting,
 you should no longer do that (unless you switch to using QueryTermScorer).
 The SpanScorer API (now QueryScorer) has also been improved to more closely
 match the API of the previous QueryScorer implementation.
(Mark Miller) 
      - LUCENE-1793: Deprecate the custom encoding support in the Greek and Russian
 Analyzers. If you need to index text in these encodings, please use Java's
 character set conversion facilities (InputStreamReader, etc) during I/O,
 so that Lucene can analyze this text as Unicode instead.
(Robert Muir) 
    
   
  - Bug fixes   (12)
    
      - LUCENE-1423: InstantiatedTermEnum#skipTo(Term) throws ArrayIndexOutOfBounds on empty index.
(Karl Wettin) 
      - LUCENE-1462: InstantiatedIndexWriter did not reset pre analyzed TokenStreams the
 same way IndexWriter does. Parts of InstantiatedIndex was not Serializable.
(Karl Wettin) 
      - LUCENE-1510: InstantiatedIndexReader#norms methods throws NullPointerException on empty index.
(Karl Wettin, Robert Newson) 
      - LUCENE-1514: ShingleMatrixFilter#next(Token) easily throws a StackOverflowException
 due to recursive invocation.
(Karl Wettin) 
      - LUCENE-1548: Fix distance normalization in LevenshteinDistance to
 not produce negative distances
(Thomas Morton via Mike McCandless) 
      - LUCENE-1490: Fix latin1 conversion of HALFWIDTH_AND_FULLWIDTH_FORMS
 characters to only apply to the correct subset
(Daniel Cheng via
 Mike McCandless) 
      - LUCENE-1576: Fix BrazilianAnalyzer to downcase tokens after
 StandardTokenizer so that stop words with mixed case are filtered
 out.
(Rafael Cunha de Almeida, Douglas Campos via Mike McCandless) 
      - LUCENE-1491: EdgeNGramTokenFilter no longer stops on tokens shorter than minimum n-gram size.
(Todd Teak via Otis Gospodnetic) 
      - LUCENE-1683: Fixed JavaUtilRegexCapabilities (an impl used by
 RegexQuery) to use Matcher.matches() not Matcher.lookingAt() so
 that the regexp must match the entire string, not just a prefix.
(Trejkaz via Mike McCandless) 
      - LUCENE-1792: Fix new query parser to set rewrite method for
 multi-term queries.
(Luis Alves, Mike McCandless via Michael Busch) 
      - LUCENE-1828: Fix memory index to call TokenStream.reset() and
 TokenStream.end().
(Tim Smith via Michael Busch) 
      - LUCENE-1912: Fix fast-vector-highlighter issue when two or more
terms are concatenated
(Koji Sekiguchi via Mike McCandless) 
    
   
  - New features   (17)
    
      - LUCENE-1531: Added support for BoostingTermQuery to XML query parser.
(Karl Wettin) 
      - LUCENE-1435: Added contrib/collation, a CollationKeyFilter
 allowing you to convert tokens into CollationKeys encoded using
 IndexableBinaryStringTools.  This allows for faster RangeQuery when
 a field needs to use a custom Collator.
(Steven Rowe via Mike
 McCandless) 
      - LUCENE-1591: EnWikiDocMaker, LineDocMaker, WriteLineDoc can now
 read/write bz2 using Apache commons compress library.  This means
 you can download the .bz2 export from http://wikipedia.org and
 immediately index it.
(Shai Erera via Mike McCandless) 
      - LUCENE-1629: Add SmartChineseAnalyzer to contrib/analyzers.  It
 improves on CJKAnalyzer and ChineseAnalyzer by handling Chinese
 sentences properly.  SmartChineseAnalyzer uses a Hidden Markov
 Model to tokenize Chinese words in a more intelligent way.
(Xiaoping Gao via Mike McCandless) 
      - LUCENE-1676: Added DelimitedPayloadTokenFilter class for automatically adding payloads "in-stream"
(Grant Ingersoll) 
      - LUCENE-1578: Support for loading unoptimized readers to the
 constructor of InstantiatedIndex.
(Karl Wettin) 
      - LUCENE-1704: Allow specifying the Tidy configuration file when
 parsing HTML docs with contrib/ant.
(Keith Sprochi via Mike
 McCandless) 
      - LUCENE-1522: Added contrib/fast-vector-highlighter, a new alternative
 highlighter.
(Koji Sekiguchi via Mike McCandless) 
      - LUCENE-1740: Added "analyzer" command to Lucli, enabling changing
 the analyzer from the default StandardAnalyzer.
(Bernd Fondermann
 via Mike McCandless) 
      - LUCENE-1272: Add get/setBoost to MoreLikeThis.
(Jonathan
 Leibiusky via Mike McCandless) 
      - LUCENE-1745: Added constructors to JakartaRegexpCapabilities and
 JavaUtilRegexCapabilities as well as static flags to support
 configuring a RegexCapabilities implementation with the
 implementation-specific modifier flags. Allows for callers to
 customize the RegexQuery using the implementation-specific options
 and fine tune how regular expressions are compiled and
 matched.
(Marc Zampetti zampettim@aim.com via Mike McCandless) 
      - LUCENE-1567: Added a new QueryParser framework, that allows
 implementing a new query syntax in a flexible and efficient way.
 This new QueryParser will be moved to Lucene's core in release
 3.0 and will then replace the current core QueryParser, which
 has been deprecated with this patch.
(Luis Alves and Adriano Campos via Michael Busch) 
      - LUCENE-1486: Added ComplexPhraseQueryParser, an extension of QueryParser
 that allows a subset of the Lucene query language to be embedded in
 PhraseQuerys. Wildcard, Range, and Fuzzy queries, as well as limited
 boolean logic, can be used within quote operators with this parser, ie:
 "(jo* -john) smyth~".
(Mark Harwood via Mark Miller) 
      - Added web-based demo of functionality in contrib's XML Query Parser
 packaged as War file
(Mark Harwood) 
      - LUCENE-1406: Added Arabic analyzer.
(Robert Muir via Grant Ingersoll) 
      - LUCENE-1628: Added Persian analyzer.
(Robert Muir) 
      - LUCENE-1813: Add option to ReverseStringFilter to mark reversed tokens.
(Andrzej Bialecki via Robert Muir) 
    
   
  - Optimizations   (2)
    
      - LUCENE-1643: Re-use the collation key (RawCollationKey) for
  better performance, in ICUCollationKeyFilter.
(Robert Muir via
  Mike McCandless) 
      - LUCENE-1794: Implement TokenStream reuse for contrib Analyzers,
  and implement reset() for TokenStreams to support reuse.
(Robert Muir) 
    
   
  - Documentation   (1)
    
      - LUCENE-1876: added missing package level documentation for numerous
  contrib packages.
(Steven Rowe & Robert Muir) 
    
   
  - Build   (2)
    
      - LUCENE-1728: Split contrib/analyzers into common and smartcn modules.
Contrib/analyzers now builds an additional lucene-smartcn Jar file. All
smartcn classes are not included in the lucene-analyzers JAR file.
(Robert Muir via Simon Willnauer) 
      - LUCENE-1829: Fix contrib query parser to properly create javacc files.
(Jan-Pascal and Luis Alves via Michael Busch) 
    
   
  - Test Cases   (none)
    
    
   
  - Changes in runtime behavior   (1)
    
      
(None) 
    
   
  - API Changes   (2)
    
      - 1.
 
      
(None) 
    
   
  - Bug fixes   (2)
    
      - LUCENE-1312: Added full support for InstantiatedIndexReader#getFieldNames()
and tests that assert that deleted documents behaves as they should (they did).
(Jason Rutherglen, Karl Wettin) 
      - LUCENE-1318: InstantiatedIndexReader.norms(String, b[], int) didn't treat
the array offset right.
(Jason Rutherglen via Karl Wettin) 
    
   
  - New features   (3)
    
      - LUCENE-1320: ShingleMatrixFilter, multidimensional shingle token filter.
(Karl Wettin) 
      - LUCENE-1142: Updated Snowball package, org.tartarus distribution revision 500.
Introducing Hungarian, Turkish and Romanian support, updated older stemmers
and optimized (reflectionless) SnowballFilter.
IMPORTANT NOTICE ON BACKWARDS COMPATIBILITY: an index created using the 2.3.2 (or older)
might not be compatible with these updated classes as some algorithms have changed.
(Karl Wettin) 
      - LUCENE-1016: TermVectorAccessor, transparent vector space access via stored vectors
or by resolving the inverted index.
(Karl Wettin) 
    
   
  - Documentation   (1)
    
      
(None) 
    
   
  - Build   (1)
    
      
(None) 
    
   
  - Test Cases   (1)
    
      
(None)