LUCENE-2413: Consolidated Lucene/Solr analysis components into analysis/common.
New features from Solr now available to Lucene users include:
  - o.a.l.analysis.commongrams: Constructs n-grams for frequently occurring terms
    and phrases.
  - o.a.l.analysis.charfilter.HTMLStripCharFilter: CharFilter that strips HTML
    constructs.
  - o.a.l.analysis.miscellaneous.WordDelimiterFilter: TokenFilter that splits words
    into subwords and performs optional transformations on subword groups.
  - o.a.l.analysis.miscellaneous.RemoveDuplicatesTokenFilter: TokenFilter that
    removes tokens with the same position and term text as the previous token.
  - o.a.l.analysis.miscellaneous.TrimFilter: Trims leading and trailing whitespace
    from Tokens in the stream.
  - o.a.l.analysis.miscellaneous.KeepWordFilter: A TokenFilter that keeps only
    tokens whose text is in a given set of words (the inverse of StopFilter).
  - o.a.l.analysis.miscellaneous.HyphenatedWordsFilter: A TokenFilter that rejoins
    words that were hyphenated across a line break.
  - o.a.l.analysis.miscellaneous.CapitalizationFilter: A TokenFilter that applies
    capitalization rules to tokens.
  - o.a.l.analysis.pattern: Package for pattern-based analysis, containing a
    CharFilter, Tokenizer, and TokenFilter for transforming text with regexes.
  - o.a.l.analysis.synonym.SynonymFilter: A synonym filter that supports multi-word
    synonyms.
  - o.a.l.analysis.phonetic: Package for phonetic search, containing various
    phonetic encoders such as Double Metaphone.
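To make the RemoveDuplicatesTokenFilter description concrete, here is a small, Lucene-free Java model of its behavior. It does not use Lucene's TokenStream API: the DedupSketch class, its Token type and the removeDuplicates method are hypothetical, and tokens are modeled as plain (text, position increment) pairs, where an increment of 0 means the token shares the previous token's position (for example, a synonym stacked on the original word). One assumption goes beyond the one-line description: the sketch remembers every term already kept at the current position, not only the immediately preceding token.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical, Lucene-free model of RemoveDuplicatesTokenFilter (not Lucene's API).
public class DedupSketch {

    // Hypothetical token: term text plus position increment (0 = same position as previous token).
    public static final class Token {
        public final String text;
        public final int posInc;

        public Token(String text, int posInc) {
            this.text = text;
            this.posInc = posInc;
        }

        @Override
        public String toString() {
            return text + "(+" + posInc + ")";
        }
    }

    // Keeps a token unless its text was already kept at the same position.
    public static List<Token> removeDuplicates(List<Token> input) {
        List<Token> out = new ArrayList<>();
        Set<String> seenAtPosition = new HashSet<>();
        for (Token t : input) {
            if (t.posInc > 0) {
                // The stream advanced to a new position: forget the previous position's terms.
                seenAtPosition.clear();
            }
            // Set.add returns false when the text is already present, i.e. a duplicate.
            if (seenAtPosition.add(t.text)) {
                out.add(t);
            }
        }
        return out;
    }

    public static void main(String[] args) {
        List<Token> in = List.of(
                new Token("fast", 1), new Token("quick", 0), new Token("fast", 0),
                new Token("car", 1), new Token("car", 1));
        // Prints [fast(+1), quick(+0), car(+1), car(+1)]
        System.out.println(removeDuplicates(in));
    }
}
```

With the sample input, the second "fast" is dropped because it is stacked at the same position as the first, while the second "car" is kept because it occupies a new position.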
Some existing analysis components changed packages:
  - o.a.l.analysis.KeywordAnalyzer -> o.a.l.analysis.core.KeywordAnalyzer
  - o.a.l.analysis.KeywordTokenizer -> o.a.l.analysis.core.KeywordTokenizer
  - o.a.l.analysis.LetterTokenizer -> o.a.l.analysis.core.LetterTokenizer
  - o.a.l.analysis.LowerCaseFilter -> o.a.l.analysis.core.LowerCaseFilter
  - o.a.l.analysis.LowerCaseTokenizer -> o.a.l.analysis.core.LowerCaseTokenizer
  - o.a.l.analysis.SimpleAnalyzer -> o.a.l.analysis.core.SimpleAnalyzer
  - o.a.l.analysis.StopAnalyzer -> o.a.l.analysis.core.StopAnalyzer
  - o.a.l.analysis.StopFilter -> o.a.l.analysis.core.StopFilter
  - o.a.l.analysis.WhitespaceAnalyzer -> o.a.l.analysis.core.WhitespaceAnalyzer
  - o.a.l.analysis.WhitespaceTokenizer -> o.a.l.analysis.core.WhitespaceTokenizer
  - o.a.l.analysis.PorterStemFilter -> o.a.l.analysis.en.PorterStemFilter
  - o.a.l.analysis.ASCIIFoldingFilter -> o.a.l.analysis.miscellaneous.ASCIIFoldingFilter
  - o.a.l.analysis.ISOLatin1AccentFilter -> o.a.l.analysis.miscellaneous.ISOLatin1AccentFilter
  - o.a.l.analysis.KeywordMarkerFilter -> o.a.l.analysis.miscellaneous.KeywordMarkerFilter
  - o.a.l.analysis.LengthFilter -> o.a.l.analysis.miscellaneous.LengthFilter
  - o.a.l.analysis.PerFieldAnalyzerWrapper -> o.a.l.analysis.miscellaneous.PerFieldAnalyzerWrapper
  - o.a.l.analysis.TeeSinkTokenFilter -> o.a.l.analysis.sinks.TeeSinkTokenFilter
  - o.a.l.analysis.CharFilter -> o.a.l.analysis.charfilter.CharFilter
  - o.a.l.analysis.BaseCharFilter -> o.a.l.analysis.charfilter.BaseCharFilter
  - o.a.l.analysis.MappingCharFilter -> o.a.l.analysis.charfilter.MappingCharFilter
  - o.a.l.analysis.NormalizeCharMap -> o.a.l.analysis.charfilter.NormalizeCharMap
  - o.a.l.analysis.CharArraySet -> o.a.l.analysis.util.CharArraySet
  - o.a.l.analysis.CharArrayMap -> o.a.l.analysis.util.CharArrayMap
  - o.a.l.analysis.ReusableAnalyzerBase -> o.a.l.analysis.util.ReusableAnalyzerBase
  - o.a.l.analysis.StopwordAnalyzerBase -> o.a.l.analysis.util.StopwordAnalyzerBase
  - o.a.l.analysis.WordListLoader -> o.a.l.analysis.util.WordListLoader
  - o.a.l.analysis.CharTokenizer -> o.a.l.analysis.util.CharTokenizer
  - o.a.l.util.CharacterUtils -> o.a.l.analysis.util.CharacterUtils
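For code that still refers to the old names, the package moves listed above can be captured in a lookup table. The sketch below is a hypothetical migration helper, not part of Lucene: the PackageMigration class and its newName method are made up for illustration, and the table expands the o.a.l abbreviation to org.apache.lucene. Any name not in the list above is returned unchanged.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical helper (not part of Lucene): old fully qualified name -> new one, per LUCENE-2413.
public class PackageMigration {
    private static final String OLD = "org.apache.lucene.analysis";
    private static final Map<String, String> MOVES = new HashMap<>();

    // Records that each listed class moved from oldPkg to newPkg, keeping its simple name.
    private static void moved(String oldPkg, String newPkg, String... classNames) {
        for (String name : classNames) {
            MOVES.put(oldPkg + "." + name, newPkg + "." + name);
        }
    }

    static {
        moved(OLD, OLD + ".core", "KeywordAnalyzer", "KeywordTokenizer", "LetterTokenizer",
                "LowerCaseFilter", "LowerCaseTokenizer", "SimpleAnalyzer", "StopAnalyzer",
                "StopFilter", "WhitespaceAnalyzer", "WhitespaceTokenizer");
        moved(OLD, OLD + ".en", "PorterStemFilter");
        moved(OLD, OLD + ".miscellaneous", "ASCIIFoldingFilter", "ISOLatin1AccentFilter",
                "KeywordMarkerFilter", "LengthFilter", "PerFieldAnalyzerWrapper");
        moved(OLD, OLD + ".sinks", "TeeSinkTokenFilter");
        moved(OLD, OLD + ".charfilter", "CharFilter", "BaseCharFilter", "MappingCharFilter",
                "NormalizeCharMap");
        moved(OLD, OLD + ".util", "CharArraySet", "CharArrayMap", "ReusableAnalyzerBase",
                "StopwordAnalyzerBase", "WordListLoader", "CharTokenizer");
        // CharacterUtils is the one entry that came from o.a.l.util rather than o.a.l.analysis.
        moved("org.apache.lucene.util", OLD + ".util", "CharacterUtils");
    }

    // Returns the post-move class name, or the input unchanged if the class is not listed.
    public static String newName(String oldName) {
        return MOVES.getOrDefault(oldName, oldName);
    }

    public static void main(String[] args) {
        // Prints org.apache.lucene.analysis.core.StopFilter
        System.out.println(newName("org.apache.lucene.analysis.StopFilter"));
    }
}
```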
All analyzers in contrib/analyzers and contrib/icu were moved to the
analysis/ module. The 'smartcn' and 'stempel' components now depend on 'common'.
(Chris Male, Robert Muir)