java.lang.Object
  org.apache.lucene.util.AttributeSource
    org.apache.lucene.analysis.TokenStream
      org.apache.lucene.analysis.TokenFilter
        org.apache.lucene.analysis.compound.CompoundWordTokenFilterBase
          org.apache.lucene.analysis.compound.DictionaryCompoundWordTokenFilter
public class DictionaryCompoundWordTokenFilter
extends CompoundWordTokenFilterBase
A TokenFilter that decomposes compound words found in many Germanic languages.
"Donaudampfschiff" becomes Donau, dampf, schiff, so that a search for "schiff" alone still finds "Donaudampfschiff". The filter uses a brute-force algorithm to achieve this, checking substrings of each token against the dictionary.
Nested Class Summary |
---|
Nested classes/interfaces inherited from class org.apache.lucene.util.AttributeSource |
---|
AttributeSource.AttributeFactory, AttributeSource.State |
Field Summary |
---|
Fields inherited from class org.apache.lucene.analysis.compound.CompoundWordTokenFilterBase |
---|
DEFAULT_MAX_SUBWORD_SIZE, DEFAULT_MIN_SUBWORD_SIZE, DEFAULT_MIN_WORD_SIZE, dictionary, maxSubwordSize, minSubwordSize, minWordSize, onlyLongestMatch, tokens |
Fields inherited from class org.apache.lucene.analysis.TokenFilter |
---|
input |
Method Summary | |
---|---|
protected void | decomposeInternal(Token token) |
Methods inherited from class org.apache.lucene.analysis.compound.CompoundWordTokenFilterBase |
---|
addAllLowerCase, createToken, decompose, incrementToken, makeDictionary, makeDictionary, makeLowerCaseCopy, reset |
Methods inherited from class org.apache.lucene.analysis.TokenFilter |
---|
close, end |
Methods inherited from class org.apache.lucene.util.AttributeSource |
---|
addAttribute, addAttributeImpl, captureState, clearAttributes, cloneAttributes, copyTo, equals, getAttribute, getAttributeClassesIterator, getAttributeFactory, getAttributeImplsIterator, hasAttribute, hasAttributes, hashCode, reflectAsString, reflectWith, restoreState, toString |
Methods inherited from class java.lang.Object |
---|
clone, finalize, getClass, notify, notifyAll, wait, wait, wait |
Constructor Detail |
---|
@Deprecated public DictionaryCompoundWordTokenFilter(TokenStream input, String[] dictionary, int minWordSize, int minSubwordSize, int maxSubwordSize, boolean onlyLongestMatch)
Deprecated. Use DictionaryCompoundWordTokenFilter(Version, TokenStream, String[], int, int, int, boolean) instead.
Creates a new DictionaryCompoundWordTokenFilter.
Parameters:
input - the TokenStream to process
dictionary - the word dictionary to match against
minWordSize - only words longer than this get processed
minSubwordSize - only subwords longer than this get to the output stream
maxSubwordSize - only subwords shorter than this get to the output stream
onlyLongestMatch - add only the longest matching subword to the stream

@Deprecated public DictionaryCompoundWordTokenFilter(TokenStream input, String[] dictionary)
Deprecated. Use DictionaryCompoundWordTokenFilter(Version, TokenStream, String[]) instead.
Creates a new DictionaryCompoundWordTokenFilter.
Parameters:
input - the TokenStream to process
dictionary - the word dictionary to match against

@Deprecated public DictionaryCompoundWordTokenFilter(TokenStream input, Set dictionary)
Deprecated. Use DictionaryCompoundWordTokenFilter(Version, TokenStream, Set) instead.
Creates a new DictionaryCompoundWordTokenFilter.
Parameters:
input - the TokenStream to process
dictionary - the word dictionary to match against. If this is a CharArraySet, it must have ignoreCase=false set and contain only lower-case strings.

@Deprecated public DictionaryCompoundWordTokenFilter(TokenStream input, Set dictionary, int minWordSize, int minSubwordSize, int maxSubwordSize, boolean onlyLongestMatch)
Deprecated. Use DictionaryCompoundWordTokenFilter(Version, TokenStream, Set, int, int, int, boolean) instead.
Creates a new DictionaryCompoundWordTokenFilter.
Parameters:
input - the TokenStream to process
dictionary - the word dictionary to match against. If this is a CharArraySet, it must have ignoreCase=false set and contain only lower-case strings.
minWordSize - only words longer than this get processed
minSubwordSize - only subwords longer than this get to the output stream
maxSubwordSize - only subwords shorter than this get to the output stream
onlyLongestMatch - add only the longest matching subword to the stream

public DictionaryCompoundWordTokenFilter(Version matchVersion, TokenStream input, String[] dictionary, int minWordSize, int minSubwordSize, int maxSubwordSize, boolean onlyLongestMatch)
Creates a new DictionaryCompoundWordTokenFilter.
Parameters:
matchVersion - Lucene version to enable correct Unicode 4.0 behavior in the dictionaries if Version > 3.0. See CompoundWordTokenFilterBase for details.
input - the TokenStream to process
dictionary - the word dictionary to match against
minWordSize - only words longer than this get processed
minSubwordSize - only subwords longer than this get to the output stream
maxSubwordSize - only subwords shorter than this get to the output stream
onlyLongestMatch - add only the longest matching subword to the stream

public DictionaryCompoundWordTokenFilter(Version matchVersion, TokenStream input, String[] dictionary)
Creates a new DictionaryCompoundWordTokenFilter.
Parameters:
matchVersion - Lucene version to enable correct Unicode 4.0 behavior in the dictionaries if Version > 3.0. See CompoundWordTokenFilterBase for details.
input - the TokenStream to process
dictionary - the word dictionary to match against

public DictionaryCompoundWordTokenFilter(Version matchVersion, TokenStream input, Set dictionary)
Creates a new DictionaryCompoundWordTokenFilter.
Parameters:
matchVersion - Lucene version to enable correct Unicode 4.0 behavior in the dictionaries if Version > 3.0. See CompoundWordTokenFilterBase for details.
input - the TokenStream to process
dictionary - the word dictionary to match against. If this is a CharArraySet, it must have ignoreCase=false set and contain only lower-case strings.

public DictionaryCompoundWordTokenFilter(Version matchVersion, TokenStream input, Set dictionary, int minWordSize, int minSubwordSize, int maxSubwordSize, boolean onlyLongestMatch)
Creates a new DictionaryCompoundWordTokenFilter.
Parameters:
matchVersion - Lucene version to enable correct Unicode 4.0 behavior in the dictionaries if Version > 3.0. See CompoundWordTokenFilterBase for details.
input - the TokenStream to process
dictionary - the word dictionary to match against. If this is a CharArraySet, it must have ignoreCase=false set and contain only lower-case strings.
minWordSize - only words longer than this get processed
minSubwordSize - only subwords longer than this get to the output stream
maxSubwordSize - only subwords shorter than this get to the output stream
onlyLongestMatch - add only the longest matching subword to the stream

Method Detail |
---|
protected void decomposeInternal(Token token)
Specified by: decomposeInternal in class CompoundWordTokenFilterBase
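To show how the constructor parameters minWordSize, minSubwordSize, maxSubwordSize and onlyLongestMatch could shape a decomposition, here is a standalone sketch that does not depend on Lucene. The class `SubwordBoundsSketch` and its `decompose` method are hypothetical names. Two points are assumptions of the sketch rather than statements about the library: the size limits are treated as inclusive, and onlyLongestMatch is applied per start position. The sample values stay away from the boundary cases.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical illustration class; it is not part of Lucene.
final class SubwordBoundsSketch {

    static List<String> decompose(String word, Set<String> lowerCaseDictionary,
                                  int minWordSize, int minSubwordSize,
                                  int maxSubwordSize, boolean onlyLongestMatch) {
        List<String> parts = new ArrayList<>();
        if (word.length() < minWordSize) {
            return parts; // assumption: words shorter than minWordSize are skipped
        }
        char[] lower = word.toCharArray();
        for (int i = 0; i < lower.length; i++) {
            lower[i] = Character.toLowerCase(lower[i]);
        }
        String lowerWord = new String(lower);
        int minLen = Math.max(1, minSubwordSize); // guard against empty or negative lengths

        for (int start = 0; start + minLen <= word.length(); start++) {
            String longest = null;
            for (int len = minLen; len <= maxSubwordSize && start + len <= word.length(); len++) {
                if (lowerCaseDictionary.contains(lowerWord.substring(start, start + len))) {
                    String match = word.substring(start, start + len);
                    if (onlyLongestMatch) {
                        longest = match; // len only grows, so the last hit is the longest
                    } else {
                        parts.add(match);
                    }
                }
            }
            if (longest != null) {
                parts.add(longest);
            }
        }
        return parts;
    }

    public static void main(String[] args) {
        Set<String> dict = new HashSet<>(
                Arrays.asList("donau", "dampf", "dampfschiff", "schiff"));
        // prints [Donau, dampf, dampfschiff, schiff]
        System.out.println(decompose("Donaudampfschiff", dict, 5, 2, 15, false));
        // prints [Donau, dampfschiff, schiff]
        System.out.println(decompose("Donaudampfschiff", dict, 5, 2, 15, true));
        // prints [Donau, dampf, schiff] because "dampfschiff" exceeds maxSubwordSize
        System.out.println(decompose("Donaudampfschiff", dict, 5, 2, 8, false));
    }
}
```

In this sketch, lowering maxSubwordSize shortens the inner loop and therefore the work per token, while onlyLongestMatch trades recall (the shorter "dampf" is dropped) for fewer emitted subwords.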