org.apache.lucene.analysis.compound
Class DictionaryCompoundWordTokenFilter
java.lang.Object
org.apache.lucene.util.AttributeSource
org.apache.lucene.analysis.TokenStream
org.apache.lucene.analysis.TokenFilter
org.apache.lucene.analysis.compound.CompoundWordTokenFilterBase
org.apache.lucene.analysis.compound.DictionaryCompoundWordTokenFilter
public class DictionaryCompoundWordTokenFilter extends CompoundWordTokenFilterBase
A TokenFilter that decomposes compound words found in many Germanic languages. For example, "Donaudampfschiff" is decomposed into Donau, dampf, and schiff, so that "Donaudampfschiff" is found even when the query only contains "schiff". The decomposition uses a brute-force algorithm.
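The brute-force approach can be sketched in plain Java (a simplified illustration of the idea, not the Lucene source; the class and method names here are invented for this example): at every start position in the compound, every substring within the subword size bounds is tested against the dictionary.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Simplified sketch of brute-force compound decomposition
// (illustrative only, not the Lucene implementation).
public class CompoundSketch {
    static List<String> decompose(String word, Set<String> dictionary,
                                  int minSubwordSize, int maxSubwordSize) {
        List<String> subwords = new ArrayList<>();
        String lower = word.toLowerCase();
        // Try every start position and every candidate length.
        for (int i = 0; i < lower.length(); i++) {
            for (int len = minSubwordSize;
                 len <= maxSubwordSize && i + len <= lower.length(); len++) {
                String candidate = lower.substring(i, i + len);
                if (dictionary.contains(candidate)) {
                    subwords.add(candidate);
                }
            }
        }
        return subwords;
    }

    public static void main(String[] args) {
        Set<String> dict = new HashSet<>(Arrays.asList("donau", "dampf", "schiff"));
        System.out.println(decompose("Donaudampfschiff", dict, 2, 15));
        // prints [donau, dampf, schiff]
    }
}
```

The real filter emits these subwords as additional tokens at the same position as the original compound, which is what makes a query for "schiff" match "Donaudampfschiff".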
Constructor Summary
DictionaryCompoundWordTokenFilter(TokenStream input, Set dictionary)
DictionaryCompoundWordTokenFilter(TokenStream input, Set dictionary, int minWordSize, int minSubwordSize, int maxSubwordSize, boolean onlyLongestMatch)
DictionaryCompoundWordTokenFilter(TokenStream input, String[] dictionary)
DictionaryCompoundWordTokenFilter(TokenStream input, String[] dictionary, int minWordSize, int minSubwordSize, int maxSubwordSize, boolean onlyLongestMatch)
Methods inherited from class org.apache.lucene.util.AttributeSource
addAttribute, addAttributeImpl, captureState, clearAttributes, cloneAttributes, equals, getAttribute, getAttributeClassesIterator, getAttributeFactory, getAttributeImplsIterator, hasAttribute, hasAttributes, hashCode, restoreState, toString
DictionaryCompoundWordTokenFilter
public DictionaryCompoundWordTokenFilter(TokenStream input,
String[] dictionary,
int minWordSize,
int minSubwordSize,
int maxSubwordSize,
boolean onlyLongestMatch)
Parameters:
input - the TokenStream to process
dictionary - the word dictionary to match against
minWordSize - only words longer than this get processed
minSubwordSize - only subwords longer than this get to the output stream
maxSubwordSize - only subwords shorter than this get to the output stream
onlyLongestMatch - add only the longest matching subword to the stream
DictionaryCompoundWordTokenFilter
public DictionaryCompoundWordTokenFilter(TokenStream input,
String[] dictionary)
Parameters:
input - the TokenStream to process
dictionary - the word dictionary to match against
DictionaryCompoundWordTokenFilter
public DictionaryCompoundWordTokenFilter(TokenStream input,
Set dictionary)
Parameters:
input - the TokenStream to process
dictionary - the word dictionary to match against. If this is a CharArraySet, it must have ignoreCase=false and contain only lower-case strings.
DictionaryCompoundWordTokenFilter
public DictionaryCompoundWordTokenFilter(TokenStream input,
Set dictionary,
int minWordSize,
int minSubwordSize,
int maxSubwordSize,
boolean onlyLongestMatch)
Parameters:
input - the TokenStream to process
dictionary - the word dictionary to match against. If this is a CharArraySet, it must have ignoreCase=false and contain only lower-case strings.
minWordSize - only words longer than this get processed
minSubwordSize - only subwords longer than this get to the output stream
maxSubwordSize - only subwords shorter than this get to the output stream
onlyLongestMatch - add only the longest matching subword to the stream
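How the size thresholds and the onlyLongestMatch flag interact can be illustrated with a standalone sketch (the semantics below are reconstructed from the parameter descriptions above, not taken from the Lucene source; class and method names are invented for this example):

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hedged sketch of the threshold parameters' assumed semantics:
// words at or below minWordSize are not decomposed, and with
// onlyLongestMatch only the longest dictionary hit per start
// position is kept.
public class ThresholdSketch {
    static List<String> decompose(String word, Set<String> dict,
                                  int minWordSize, int minSubwordSize,
                                  int maxSubwordSize, boolean onlyLongestMatch) {
        List<String> out = new ArrayList<>();
        if (word.length() <= minWordSize) {
            return out; // too short: no subwords are emitted
        }
        String lower = word.toLowerCase();
        for (int i = 0; i < lower.length(); i++) {
            String longest = null;
            for (int len = minSubwordSize;
                 len <= maxSubwordSize && i + len <= lower.length(); len++) {
                String cand = lower.substring(i, i + len);
                if (dict.contains(cand)) {
                    if (onlyLongestMatch) {
                        longest = cand; // longer matches overwrite shorter ones
                    } else {
                        out.add(cand);
                    }
                }
            }
            if (onlyLongestMatch && longest != null) {
                out.add(longest);
            }
        }
        return out;
    }

    public static void main(String[] args) {
        Set<String> dict = new HashSet<>(Arrays.asList("schiff", "schiffahrt"));
        // "schiff" and "schiffahrt" both match at the same start position;
        // with onlyLongestMatch=true only "schiffahrt" is kept.
        System.out.println(decompose("Dampfschiffahrt", dict, 5, 2, 15, true));
    }
}
```

Without onlyLongestMatch, overlapping dictionary entries produce one token each, which increases recall at the cost of noisier token streams.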
decomposeInternal
protected void decomposeInternal(Token token)
Specified by: decomposeInternal in class CompoundWordTokenFilterBase
Copyright © 2000-2010 Apache Software Foundation. All Rights Reserved.