@Deprecated public class Lucene43HyphenationCompoundWordTokenFilter extends Lucene43CompoundWordTokenFilterBase
A TokenFilter that decomposes compound words found in many Germanic languages, using pre-4.4 behavior.
Nested classes inherited from class Lucene43CompoundWordTokenFilterBase: Lucene43CompoundWordTokenFilterBase.CompoundToken
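Nested classes inherited from class AttributeSource: AttributeSource.State
Fields inherited from class Lucene43CompoundWordTokenFilterBase: DEFAULT_MAX_SUBWORD_SIZE, DEFAULT_MIN_SUBWORD_SIZE, DEFAULT_MIN_WORD_SIZE, dictionary, maxSubwordSize, minSubwordSize, minWordSize, offsetAtt, onlyLongestMatch, termAtt, tokens
Fields inherited from class TokenFilter: input
Fields inherited from class TokenStream: DEFAULT_TOKEN_ATTRIBUTE_FACTORY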
Constructor and Description |
---|
Lucene43HyphenationCompoundWordTokenFilter(TokenStream input, HyphenationTree hyphenator) Deprecated. Create a HyphenationCompoundWordTokenFilter with no dictionary. |
Lucene43HyphenationCompoundWordTokenFilter(TokenStream input, HyphenationTree hyphenator, CharArraySet dictionary) Deprecated. Creates a new Lucene43HyphenationCompoundWordTokenFilter instance. |
Lucene43HyphenationCompoundWordTokenFilter(TokenStream input, HyphenationTree hyphenator, CharArraySet dictionary, int minWordSize, int minSubwordSize, int maxSubwordSize, boolean onlyLongestMatch) Deprecated. Creates a new Lucene43HyphenationCompoundWordTokenFilter instance. |
Lucene43HyphenationCompoundWordTokenFilter(TokenStream input, HyphenationTree hyphenator, int minWordSize, int minSubwordSize, int maxSubwordSize) Deprecated. Create a HyphenationCompoundWordTokenFilter with no dictionary. |
Modifier and Type | Method and Description |
---|---|
protected void | decompose() Deprecated. Decomposes the current Lucene43CompoundWordTokenFilterBase.termAtt and places Lucene43CompoundWordTokenFilterBase.CompoundToken instances in the Lucene43CompoundWordTokenFilterBase.tokens list. |
static HyphenationTree | getHyphenationTree(InputSource hyphenationSource) Deprecated. Create a hyphenator tree |
static HyphenationTree | getHyphenationTree(String hyphenationFilename) Deprecated. Create a hyphenator tree |
Methods inherited from class Lucene43CompoundWordTokenFilterBase: incrementToken, reset
Methods inherited from class TokenFilter: close, end
Methods inherited from class AttributeSource: addAttribute, addAttributeImpl, captureState, clearAttributes, cloneAttributes, copyTo, equals, getAttribute, getAttributeClassesIterator, getAttributeFactory, getAttributeImplsIterator, hasAttribute, hasAttributes, hashCode, reflectAsString, reflectWith, restoreState, toString
public Lucene43HyphenationCompoundWordTokenFilter(TokenStream input, HyphenationTree hyphenator, CharArraySet dictionary)
Creates a new Lucene43HyphenationCompoundWordTokenFilter instance.
Parameters:
input - the TokenStream to process
hyphenator - the hyphenation pattern tree to use for hyphenation
dictionary - the word dictionary to match against.

public Lucene43HyphenationCompoundWordTokenFilter(TokenStream input, HyphenationTree hyphenator, CharArraySet dictionary, int minWordSize, int minSubwordSize, int maxSubwordSize, boolean onlyLongestMatch)
Creates a new Lucene43HyphenationCompoundWordTokenFilter instance.
Parameters:
input - the TokenStream to process
hyphenator - the hyphenation pattern tree to use for hyphenation
dictionary - the word dictionary to match against.
minWordSize - only words longer than this get processed
minSubwordSize - only subwords longer than this get to the output stream
maxSubwordSize - only subwords shorter than this get to the output stream
onlyLongestMatch - Add only the longest matching subword to the stream

public Lucene43HyphenationCompoundWordTokenFilter(TokenStream input, HyphenationTree hyphenator, int minWordSize, int minSubwordSize, int maxSubwordSize)
Create a HyphenationCompoundWordTokenFilter with no dictionary.

public Lucene43HyphenationCompoundWordTokenFilter(TokenStream input, HyphenationTree hyphenator)
Create a HyphenationCompoundWordTokenFilter with no dictionary.
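
The effect of minWordSize, minSubwordSize, maxSubwordSize, and onlyLongestMatch is easiest to see on a concrete word. The following is a minimal, self-contained sketch and not the Lucene implementation: the class SubwordFilterSketch, its decompose helper, the hand-written hyphenation points, and the inclusive treatment of the size bounds are all assumptions made for illustration. It enumerates the spans between pairs of hyphenation points and keeps those that pass the length checks and, when a dictionary is supplied, appear in it.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Illustrative model only; names and bounds handling are assumptions, not Lucene code.
class SubwordFilterSketch {

    /**
     * Returns candidate subwords of {@code word} lying between pairs of
     * hyphenation points (ascending character indexes into the word).
     * A null dictionary means "accept every candidate".
     */
    static List<String> decompose(String word, int[] points, Set<String> dictionary,
                                  int minWordSize, int minSubwordSize, int maxSubwordSize,
                                  boolean onlyLongestMatch) {
        List<String> out = new ArrayList<>();
        if (word.length() < minWordSize) {
            return out; // word too short to be decomposed at all
        }
        for (int i = 0; i < points.length; i++) {
            String longest = null; // best match starting at points[i]
            for (int j = i + 1; j < points.length; j++) {
                int length = points[j] - points[i];
                if (length > maxSubwordSize) {
                    break; // later end points only make the span longer
                }
                if (length < minSubwordSize) {
                    continue; // too short, try the next end point
                }
                String part = word.substring(points[i], points[j]);
                if (dictionary != null && !dictionary.contains(part)) {
                    continue; // not a known word
                }
                if (!onlyLongestMatch) {
                    out.add(part);
                } else if (longest == null || part.length() > longest.length()) {
                    longest = part;
                }
            }
            if (longest != null) {
                out.add(longest);
            }
        }
        return out;
    }

    public static void main(String[] args) {
        int[] points = {0, 4, 8, 13}; // fuss | ball | pumpe
        Set<String> dict = new HashSet<>(Arrays.asList("fuss", "ball", "pumpe", "fussball"));
        // No dictionary, subwords capped at 5 characters: prints [fuss, ball, pumpe]
        System.out.println(decompose("fussballpumpe", points, null, 5, 2, 5, false));
        // Dictionary plus onlyLongestMatch: prints [fussball, ball, pumpe]
        System.out.println(decompose("fussballpumpe", points, dict, 5, 2, 15, true));
    }
}
```

In this model, onlyLongestMatch keeps at most one subword per starting hyphenation point, which is why "fuss" disappears from the second result once "fussball" matches.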
public static HyphenationTree getHyphenationTree(String hyphenationFilename) throws IOException
Create a hyphenator tree
Parameters:
hyphenationFilename - the filename of the XML grammar to load
Throws:
IOException - If there is a low-level I/O error.

public static HyphenationTree getHyphenationTree(InputSource hyphenationSource) throws IOException
Create a hyphenator tree
Parameters:
hyphenationSource - the InputSource pointing to the XML grammar
Throws:
IOException - If there is a low-level I/O error.

protected void decompose()
Description copied from class: Lucene43CompoundWordTokenFilterBase
Decomposes the current Lucene43CompoundWordTokenFilterBase.termAtt and places Lucene43CompoundWordTokenFilterBase.CompoundToken instances in the Lucene43CompoundWordTokenFilterBase.tokens list. The original token may not be placed in the list, as it is automatically passed through this filter.
Specified by:
decompose in class Lucene43CompoundWordTokenFilterBase
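
The contract of decompose() (add subwords to the tokens list, never the original term) can be pictured with a small stand-in. This is a sketch under stated assumptions rather than the Lucene code: PassThroughSketch, its term field, its next() method, and the hard-coded split of "fussball" are hypothetical, and it assumes the original token is emitted before its buffered subwords.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Deque;
import java.util.Iterator;
import java.util.List;

// Simplified stand-in for the filter base class; illustrative only, not the Lucene API.
abstract class PassThroughSketch {
    protected final Deque<String> tokens = new ArrayDeque<>(); // pending subwords
    protected String term;                                     // term being decomposed
    private final Iterator<String> input;

    PassThroughSketch(List<String> input) {
        this.input = input.iterator();
    }

    /** Adds subwords of {@code term} to {@code tokens}; must not add {@code term} itself. */
    protected abstract void decompose();

    /** Returns the next token, or null once the input is exhausted. */
    String next() {
        if (!tokens.isEmpty()) {
            return tokens.removeFirst(); // drain buffered subwords first
        }
        if (!input.hasNext()) {
            return null;
        }
        term = input.next();
        decompose();
        return term; // the original token is passed through by the base class
    }

    /** Runs a toy filter that knows a single split: fussball -> fuss + ball. */
    static List<String> run(List<String> words) {
        PassThroughSketch filter = new PassThroughSketch(words) {
            @Override
            protected void decompose() {
                if (term.equals("fussball")) {
                    tokens.add("fuss");
                    tokens.add("ball");
                }
            }
        };
        List<String> out = new ArrayList<>();
        for (String t = filter.next(); t != null; t = filter.next()) {
            out.add(t);
        }
        return out;
    }

    public static void main(String[] args) {
        // prints [fussball, fuss, ball, tor]
        System.out.println(run(Arrays.asList("fussball", "tor")));
    }
}
```

If decompose() also added the original term to the list, this stand-in would emit "fussball" twice, which is the duplication the note about the original token guards against.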
Copyright © 2000-2015 Apache Software Foundation. All Rights Reserved.