java.lang.Object
org.apache.lucene.util.AttributeSource
org.apache.lucene.analysis.TokenStream
org.apache.lucene.analysis.TokenFilter
org.apache.lucene.analysis.compound.CompoundWordTokenFilterBase
org.apache.lucene.analysis.compound.DictionaryCompoundWordTokenFilter
public class DictionaryCompoundWordTokenFilter extends CompoundWordTokenFilterBase
A TokenFilter that decomposes compound words found in many Germanic languages.
For example, "Donaudampfschiff" is decomposed into Donau, dampf, and schiff, so a search for "schiff" alone still finds "Donaudampfschiff". The filter uses a brute-force algorithm: it checks candidate substrings of each token against the dictionary.
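To make the brute-force idea concrete, here is a minimal stand-alone sketch in plain Java. It is not the Lucene implementation: the class name `BruteForceDecompounder` is hypothetical, a `java.util.Set<String>` stands in for the dictionary, and treating the size limits as inclusive bounds is an assumption of this sketch. It tries every start offset and every allowed length, and keeps the substrings whose lower-cased form is in the dictionary.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import java.util.Set;

/** Hypothetical stand-alone sketch; not the Lucene class itself. */
public class BruteForceDecompounder {

    /**
     * Returns the dictionary subwords found in {@code word}.
     * The dictionary is assumed to contain lower-case entries only.
     */
    public static List<String> decompose(String word, Set<String> dictionary,
                                         int minWordSize, int minSubwordSize,
                                         int maxSubwordSize, boolean onlyLongestMatch) {
        List<String> subwords = new ArrayList<>();
        // Assumption of this sketch: words shorter than minWordSize are skipped.
        if (word.length() < minWordSize) {
            return subwords;
        }
        String lower = word.toLowerCase(Locale.ROOT);
        // Try every start offset that still leaves room for a minimal subword.
        for (int start = 0; start <= word.length() - minSubwordSize; start++) {
            String longest = null;
            // Try every allowed subword length at this offset.
            for (int len = minSubwordSize; len <= maxSubwordSize; len++) {
                int end = start + len;
                if (end > word.length()) {
                    break;
                }
                if (dictionary.contains(lower.substring(start, end))) {
                    // Emit the subword with its original casing.
                    String match = word.substring(start, end);
                    if (onlyLongestMatch) {
                        longest = match; // lengths grow, so the last match is the longest
                    } else {
                        subwords.add(match);
                    }
                }
            }
            if (longest != null) {
                subwords.add(longest);
            }
        }
        return subwords;
    }

    public static void main(String[] args) {
        Set<String> dict = Set.of("donau", "dampf", "schiff");
        // prints [Donau, dampf, schiff]
        System.out.println(decompose("Donaudampfschiff", dict, 5, 2, 15, false));
    }
}
```

The nested loops look at roughly (word length) × (maxSubwordSize − minSubwordSize) substrings per token, which is why tight subword size limits matter for long tokens.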
| Nested Class Summary |
|---|
| Nested classes/interfaces inherited from class org.apache.lucene.util.AttributeSource |
|---|
AttributeSource.AttributeFactory, AttributeSource.State |
| Field Summary |
|---|
| Fields inherited from class org.apache.lucene.analysis.compound.CompoundWordTokenFilterBase |
|---|
DEFAULT_MAX_SUBWORD_SIZE, DEFAULT_MIN_SUBWORD_SIZE, DEFAULT_MIN_WORD_SIZE, dictionary, maxSubwordSize, minSubwordSize, minWordSize, onlyLongestMatch, tokens |
| Fields inherited from class org.apache.lucene.analysis.TokenFilter |
|---|
input |
| Method Summary | |
|---|---|
| protected void | decomposeInternal(Token token) |
| Methods inherited from class org.apache.lucene.analysis.compound.CompoundWordTokenFilterBase |
|---|
addAllLowerCase, createToken, decompose, incrementToken, makeDictionary, makeDictionary, makeLowerCaseCopy, reset |
| Methods inherited from class org.apache.lucene.analysis.TokenFilter |
|---|
close, end |
| Methods inherited from class org.apache.lucene.util.AttributeSource |
|---|
addAttribute, addAttributeImpl, captureState, clearAttributes, cloneAttributes, copyTo, equals, getAttribute, getAttributeClassesIterator, getAttributeFactory, getAttributeImplsIterator, hasAttribute, hasAttributes, hashCode, reflectAsString, reflectWith, restoreState, toString |
| Methods inherited from class java.lang.Object |
|---|
clone, finalize, getClass, notify, notifyAll, wait, wait, wait |
| Constructor Detail |
|---|
@Deprecated
public DictionaryCompoundWordTokenFilter(TokenStream input,
String[] dictionary,
int minWordSize,
int minSubwordSize,
int maxSubwordSize,
boolean onlyLongestMatch)
Deprecated. Use DictionaryCompoundWordTokenFilter(Version, TokenStream, String[], int, int, int, boolean) instead.
Creates a new DictionaryCompoundWordTokenFilter.
Parameters:
input - the TokenStream to process
dictionary - the word dictionary to match against
minWordSize - only words longer than this get processed
minSubwordSize - only subwords longer than this get to the output stream
maxSubwordSize - only subwords shorter than this get to the output stream
onlyLongestMatch - add only the longest matching subword to the stream
@Deprecated
public DictionaryCompoundWordTokenFilter(TokenStream input,
String[] dictionary)
Deprecated. Use DictionaryCompoundWordTokenFilter(Version, TokenStream, String[]) instead.
Creates a new DictionaryCompoundWordTokenFilter.
Parameters:
input - the TokenStream to process
dictionary - the word dictionary to match against
@Deprecated
public DictionaryCompoundWordTokenFilter(TokenStream input,
Set dictionary)
Deprecated. Use DictionaryCompoundWordTokenFilter(Version, TokenStream, Set) instead.
Creates a new DictionaryCompoundWordTokenFilter.
Parameters:
input - the TokenStream to process
dictionary - the word dictionary to match against. If this is a CharArraySet, it must have been created with ignoreCase=false and contain only lower-case strings.
@Deprecated
public DictionaryCompoundWordTokenFilter(TokenStream input,
Set dictionary,
int minWordSize,
int minSubwordSize,
int maxSubwordSize,
boolean onlyLongestMatch)
Deprecated. Use DictionaryCompoundWordTokenFilter(Version, TokenStream, Set, int, int, int, boolean) instead.
Creates a new DictionaryCompoundWordTokenFilter.
Parameters:
input - the TokenStream to process
dictionary - the word dictionary to match against. If this is a CharArraySet, it must have been created with ignoreCase=false and contain only lower-case strings.
minWordSize - only words longer than this get processed
minSubwordSize - only subwords longer than this get to the output stream
maxSubwordSize - only subwords shorter than this get to the output stream
onlyLongestMatch - add only the longest matching subword to the stream
public DictionaryCompoundWordTokenFilter(Version matchVersion,
TokenStream input,
String[] dictionary,
int minWordSize,
int minSubwordSize,
int maxSubwordSize,
boolean onlyLongestMatch)
Creates a new DictionaryCompoundWordTokenFilter.
Parameters:
matchVersion - Lucene version to enable correct Unicode 4.0 behavior in the dictionaries if Version > 3.0. See CompoundWordTokenFilterBase for details.
input - the TokenStream to process
dictionary - the word dictionary to match against
minWordSize - only words longer than this get processed
minSubwordSize - only subwords longer than this get to the output stream
maxSubwordSize - only subwords shorter than this get to the output stream
onlyLongestMatch - add only the longest matching subword to the stream
public DictionaryCompoundWordTokenFilter(Version matchVersion,
TokenStream input,
String[] dictionary)
Creates a new DictionaryCompoundWordTokenFilter.
Parameters:
matchVersion - Lucene version to enable correct Unicode 4.0 behavior in the dictionaries if Version > 3.0. See CompoundWordTokenFilterBase for details.
input - the TokenStream to process
dictionary - the word dictionary to match against
public DictionaryCompoundWordTokenFilter(Version matchVersion,
TokenStream input,
Set dictionary)
Creates a new DictionaryCompoundWordTokenFilter.
Parameters:
matchVersion - Lucene version to enable correct Unicode 4.0 behavior in the dictionaries if Version > 3.0. See CompoundWordTokenFilterBase for details.
input - the TokenStream to process
dictionary - the word dictionary to match against. If this is a CharArraySet, it must have been created with ignoreCase=false and contain only lower-case strings.
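The lower-case requirement on the dictionary can be met by normalizing the entries before they are handed to the filter. The following is only a sketch of that preparation step: it uses a plain `java.util.HashSet` as a stand-in for CharArraySet, the class name `LowerCaseDictionary` is hypothetical, and lower-casing with `Locale.ROOT` is an assumption rather than something this page specifies.

```java
import java.util.HashSet;
import java.util.Locale;
import java.util.Set;

/** Hypothetical helper; a plain HashSet stands in for CharArraySet. */
public class LowerCaseDictionary {

    /** Builds a case-sensitive set holding lower-cased copies of the given words. */
    public static Set<String> of(String... words) {
        Set<String> dictionary = new HashSet<>();
        for (String word : words) {
            // Locale.ROOT avoids locale-specific case mappings (an assumption of this sketch).
            dictionary.add(word.toLowerCase(Locale.ROOT));
        }
        return dictionary;
    }

    public static void main(String[] args) {
        Set<String> dictionary = of("Donau", "Dampf", "Schiff");
        System.out.println(dictionary.contains("donau")); // prints true
        System.out.println(dictionary.contains("Donau")); // prints false
    }
}
```

The inherited makeDictionary and addAllLowerCase methods listed in the method summary appear to serve a similar purpose inside the library; check their own documentation before relying on them.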
public DictionaryCompoundWordTokenFilter(Version matchVersion,
TokenStream input,
Set dictionary,
int minWordSize,
int minSubwordSize,
int maxSubwordSize,
boolean onlyLongestMatch)
Creates a new DictionaryCompoundWordTokenFilter.
Parameters:
matchVersion - Lucene version to enable correct Unicode 4.0 behavior in the dictionaries if Version > 3.0. See CompoundWordTokenFilterBase for details.
input - the TokenStream to process
dictionary - the word dictionary to match against. If this is a CharArraySet, it must have been created with ignoreCase=false and contain only lower-case strings.
minWordSize - only words longer than this get processed
minSubwordSize - only subwords longer than this get to the output stream
maxSubwordSize - only subwords shorter than this get to the output stream
onlyLongestMatch - add only the longest matching subword to the stream
| Method Detail |
|---|
protected void decomposeInternal(Token token)
Specified by: decomposeInternal in class CompoundWordTokenFilterBase