@Deprecated public class Lucene43DictionaryCompoundWordTokenFilter extends Lucene43CompoundWordTokenFilterBase
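A TokenFilter that decomposes compound words found in many Germanic languages, using pre-4.4 behavior.

Nested classes inherited from class Lucene43CompoundWordTokenFilterBase: Lucene43CompoundWordTokenFilterBase.CompoundToken
Nested classes inherited from class AttributeSource: AttributeSource.State
Fields inherited from class Lucene43CompoundWordTokenFilterBase: DEFAULT_MAX_SUBWORD_SIZE, DEFAULT_MIN_SUBWORD_SIZE, DEFAULT_MIN_WORD_SIZE, dictionary, maxSubwordSize, minSubwordSize, minWordSize, offsetAtt, onlyLongestMatch, termAtt, tokens
Fields inherited from class TokenFilter: input
Fields inherited from class TokenStream: DEFAULT_TOKEN_ATTRIBUTE_FACTORY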
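The filter's overall effect on a token stream can be illustrated with a small plain-Java model. This is a hypothetical sketch, not Lucene code: tokens are plain strings, the dictionary is a `java.util.Set<String>` standing in for `CharArraySet`, and the names `CompoundStreamSketch` and `filter` are invented for illustration. It assumes that each input token is emitted first and unchanged, and that a token is scanned for dictionary subwords only when its length is at least `minWordSize`. The inclusive bound is an assumption; the parameter description below says "longer than".

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;

// Hypothetical model of the filter's output (not Lucene code): every input token is
// emitted unchanged, followed by the dictionary subwords found inside it.
class CompoundStreamSketch {

    static List<String> filter(List<String> input, Set<String> dictionary,
                               int minWordSize, int minSubwordSize, int maxSubwordSize) {
        List<String> out = new ArrayList<>();
        for (String term : input) {
            out.add(term); // the original token always passes through
            if (term.length() < minWordSize) {
                continue; // too short: not decomposed (inclusive bound assumed)
            }
            // Emit every substring that is a dictionary entry and whose length
            // lies within [minSubwordSize, maxSubwordSize] (inclusive, assumed)
            for (int start = 0; start + minSubwordSize <= term.length(); start++) {
                for (int size = minSubwordSize;
                     size <= maxSubwordSize && start + size <= term.length(); size++) {
                    String sub = term.substring(start, start + size);
                    if (dictionary.contains(sub)) {
                        out.add(sub);
                    }
                }
            }
        }
        return out;
    }

    public static void main(String[] args) {
        Set<String> dictionary = Set.of("im", "donau", "dampf", "schiff");
        List<String> tokens = List.of("im", "donaudampfschiff");
        System.out.println(filter(tokens, dictionary, 5, 2, 15));
    }
}
```

In this model, `im` is in the dictionary but is shorter than `minWordSize`, so it passes through without being scanned. The compound `donaudampfschiff` is emitted as-is, followed by `donau`, `dampf` and `schiff`.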
Constructor and Description |
---|
Lucene43DictionaryCompoundWordTokenFilter(TokenStream input, CharArraySet dictionary) Deprecated. Creates a new Lucene43DictionaryCompoundWordTokenFilter. |
Lucene43DictionaryCompoundWordTokenFilter(TokenStream input, CharArraySet dictionary, int minWordSize, int minSubwordSize, int maxSubwordSize, boolean onlyLongestMatch) Deprecated. Creates a new Lucene43DictionaryCompoundWordTokenFilter. |
Modifier and Type | Method and Description |
---|---|
protected void | decompose() Deprecated. Decomposes the current Lucene43CompoundWordTokenFilterBase.termAtt and places Lucene43CompoundWordTokenFilterBase.CompoundToken instances in the Lucene43CompoundWordTokenFilterBase.tokens list. |

Methods inherited from class Lucene43CompoundWordTokenFilterBase: incrementToken, reset
Methods inherited from class TokenFilter: close, end
Methods inherited from class AttributeSource: addAttribute, addAttributeImpl, captureState, clearAttributes, cloneAttributes, copyTo, equals, getAttribute, getAttributeClassesIterator, getAttributeFactory, getAttributeImplsIterator, hasAttribute, hasAttributes, hashCode, reflectAsString, reflectWith, restoreState, toString

public Lucene43DictionaryCompoundWordTokenFilter(TokenStream input, CharArraySet dictionary)
Creates a new Lucene43DictionaryCompoundWordTokenFilter.
Parameters:
input - the TokenStream to process
dictionary - the word dictionary to match against

public Lucene43DictionaryCompoundWordTokenFilter(TokenStream input, CharArraySet dictionary, int minWordSize, int minSubwordSize, int maxSubwordSize, boolean onlyLongestMatch)
Creates a new Lucene43DictionaryCompoundWordTokenFilter.
Parameters:
input - the TokenStream to process
dictionary - the word dictionary to match against
minWordSize - only words longer than this get processed
minSubwordSize - only subwords longer than this get to the output stream
maxSubwordSize - only subwords shorter than this get to the output stream
onlyLongestMatch - add only the longest matching subword to the stream

protected void decompose()
Description copied from class: Lucene43CompoundWordTokenFilterBase
Decomposes the current Lucene43CompoundWordTokenFilterBase.termAtt and places Lucene43CompoundWordTokenFilterBase.CompoundToken instances in the Lucene43CompoundWordTokenFilterBase.tokens list. The original token must not be added to the list, because it is automatically passed through this filter.
Specified by:
decompose in class Lucene43CompoundWordTokenFilterBase
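The dictionary-matching step of decompose() can be sketched in plain Java without Lucene. This is a minimal sketch, not the Lucene source. It makes several assumptions: a `Set<String>` replaces `CharArraySet`, subword strings are returned instead of `CompoundToken` instances (which also carry offsets), and both subword size bounds are treated as inclusive. The class and method names are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;

// Hypothetical plain-Java model of dictionary-based decomposition (not Lucene code).
class DictionaryDecomposeSketch {

    // Returns the dictionary subwords found in `term`, ordered by start offset and
    // then by length. Subword lengths run from minSubwordSize to maxSubwordSize
    // inclusive (an assumption of this sketch).
    static List<String> decompose(String term, Set<String> dictionary,
                                  int minSubwordSize, int maxSubwordSize,
                                  boolean onlyLongestMatch) {
        List<String> tokens = new ArrayList<>();
        int len = term.length();
        for (int start = 0; start <= len - minSubwordSize; start++) {
            String longest = null;
            for (int size = minSubwordSize; size <= maxSubwordSize && start + size <= len; size++) {
                String candidate = term.substring(start, start + size);
                if (!dictionary.contains(candidate)) {
                    continue;
                }
                if (onlyLongestMatch) {
                    // size only grows, so a later match at this offset is always longer
                    longest = candidate;
                } else {
                    tokens.add(candidate);
                }
            }
            // In longest-match mode, keep at most one subword per start offset
            if (longest != null) {
                tokens.add(longest);
            }
        }
        return tokens;
    }

    public static void main(String[] args) {
        Set<String> dictionary = Set.of("donau", "dampf", "dampfschiff", "schiff");
        System.out.println(decompose("donaudampfschiff", dictionary, 2, 15, false));
        System.out.println(decompose("donaudampfschiff", dictionary, 2, 15, true));
    }
}
```

In this sketch, onlyLongestMatch keeps the longest match per start offset, not per word. For that reason `schiff` is still emitted next to `dampfschiff`: it starts at a different offset.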
Copyright © 2000-2015 Apache Software Foundation. All Rights Reserved.