Class DictionaryCompoundWordTokenFilter
- java.lang.Object
  - org.apache.lucene.util.AttributeSource
    - org.apache.lucene.analysis.TokenStream
      - org.apache.lucene.analysis.TokenFilter
        - org.apache.lucene.analysis.compound.CompoundWordTokenFilterBase
          - org.apache.lucene.analysis.compound.DictionaryCompoundWordTokenFilter

All Implemented Interfaces:
Closeable, AutoCloseable, Unwrappable<TokenStream>
 
public class DictionaryCompoundWordTokenFilter extends CompoundWordTokenFilterBase

A TokenFilter that decomposes compound words found in many Germanic languages. "Donaudampfschiff" becomes Donau, dampf, schiff so that you can find "Donaudampfschiff" even when you only enter "schiff". It uses a brute-force algorithm to achieve this, checking candidate substrings of each token against the supplied dictionary.
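
The brute-force idea described above can be sketched without any Lucene dependency. The following is a minimal illustration only, not the library's implementation: the class name BruteForceDecompounder, the lowercase matching, and the inclusive treatment of the size bounds are all assumptions made for the example.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import java.util.Set;

/** Illustrative sketch only; hypothetical class, not part of Lucene. */
final class BruteForceDecompounder {

    static List<String> decompose(String word, Set<String> dictionary,
                                  int minSubwordSize, int maxSubwordSize,
                                  boolean onlyLongestMatch) {
        // Assumption: matching is case-insensitive and the dictionary holds lowercase entries.
        String term = word.toLowerCase(Locale.ROOT);
        List<String> subwords = new ArrayList<>();
        // Try every start offset that still leaves room for a minimal subword.
        for (int start = 0; start <= term.length() - minSubwordSize; start++) {
            String longest = null;
            // Try every candidate length within the bounds (treated as inclusive here).
            for (int len = minSubwordSize; len <= maxSubwordSize; len++) {
                if (start + len > term.length()) {
                    break;
                }
                String candidate = term.substring(start, start + len);
                if (!dictionary.contains(candidate)) {
                    continue;
                }
                if (onlyLongestMatch) {
                    longest = candidate; // lengths increase, so the last hit is the longest
                } else {
                    subwords.add(candidate);
                }
            }
            if (longest != null) {
                subwords.add(longest);
            }
        }
        return subwords;
    }

    public static void main(String[] args) {
        Set<String> dictionary = Set.of("donau", "dampf", "schiff");
        System.out.println(decompose("Donaudampfschiff", dictionary, 2, 15, false));
    }
}
```

Every start offset is paired with every permitted length, so the cost grows with the token length times the subword size range. That is acceptable for single words but is the reason the size limits exist.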

Nested Class Summary

Nested classes/interfaces inherited from class org.apache.lucene.analysis.compound.CompoundWordTokenFilterBase:
CompoundWordTokenFilterBase.CompoundToken

Nested classes/interfaces inherited from class org.apache.lucene.util.AttributeSource:
AttributeSource.State

Field Summary

Fields inherited from class org.apache.lucene.analysis.compound.CompoundWordTokenFilterBase:
DEFAULT_MAX_SUBWORD_SIZE, DEFAULT_MIN_SUBWORD_SIZE, DEFAULT_MIN_WORD_SIZE, dictionary, maxSubwordSize, minSubwordSize, minWordSize, offsetAtt, onlyLongestMatch, termAtt, tokens

Fields inherited from class org.apache.lucene.analysis.TokenFilter:
input

Fields inherited from class org.apache.lucene.analysis.TokenStream:
DEFAULT_TOKEN_ATTRIBUTE_FACTORY

Constructor Summary

Constructor - Description
- DictionaryCompoundWordTokenFilter(TokenStream input, CharArraySet dictionary) - Creates a new DictionaryCompoundWordTokenFilter
- DictionaryCompoundWordTokenFilter(TokenStream input, CharArraySet dictionary, int minWordSize, int minSubwordSize, int maxSubwordSize, boolean onlyLongestMatch) - Creates a new DictionaryCompoundWordTokenFilter

Method Summary

Modifier and Type - Method - Description
- protected void - decompose() - Decomposes the current CompoundWordTokenFilterBase.termAtt and places CompoundWordTokenFilterBase.CompoundToken instances in the CompoundWordTokenFilterBase.tokens list.

Methods inherited from class org.apache.lucene.analysis.compound.CompoundWordTokenFilterBase:
incrementToken, reset

Methods inherited from class org.apache.lucene.analysis.TokenFilter:
close, end, unwrap

Methods inherited from class org.apache.lucene.util.AttributeSource:
addAttribute, addAttributeImpl, captureState, clearAttributes, cloneAttributes, copyTo, endAttributes, equals, getAttribute, getAttributeClassesIterator, getAttributeFactory, getAttributeImplsIterator, hasAttribute, hasAttributes, hashCode, reflectAsString, reflectWith, removeAllAttributes, restoreState, toString

Constructor Detail

DictionaryCompoundWordTokenFilter

public DictionaryCompoundWordTokenFilter(TokenStream input, CharArraySet dictionary)

Creates a new DictionaryCompoundWordTokenFilter.

Parameters:
- input - the TokenStream to process
- dictionary - the word dictionary to match against

DictionaryCompoundWordTokenFilter

public DictionaryCompoundWordTokenFilter(TokenStream input, CharArraySet dictionary, int minWordSize, int minSubwordSize, int maxSubwordSize, boolean onlyLongestMatch)

Creates a new DictionaryCompoundWordTokenFilter.

Parameters:
- input - the TokenStream to process
- dictionary - the word dictionary to match against
- minWordSize - only words longer than this get processed
- minSubwordSize - only subwords longer than this get to the output stream
- maxSubwordSize - only subwords shorter than this get to the output stream
- onlyLongestMatch - add only the longest matching subword to the stream

Method Detail

decompose

protected void decompose()

Description copied from class: CompoundWordTokenFilterBase

Decomposes the current CompoundWordTokenFilterBase.termAtt and places CompoundWordTokenFilterBase.CompoundToken instances in the CompoundWordTokenFilterBase.tokens list. The original token must not be placed in the list, as it is automatically passed through this filter.

Specified by:
decompose in class CompoundWordTokenFilterBase
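
The contract above (the subclass queues subwords, the base class passes the original token through) can be modelled in a few lines of plain Java. This is a hedged sketch of the pattern only: PassThroughDecompounder and ContainsDecompounder are hypothetical names, tokens are plain strings rather than attribute-based tokens, and the simple "contains" lookup stands in for real dictionary matching.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.Iterator;
import java.util.List;

/** Hypothetical stand-in for the base class: emits each input token, then its queued subwords. */
abstract class PassThroughDecompounder {
    protected final Deque<String> tokens = new ArrayDeque<>(); // subwords waiting to be emitted
    protected String term;                                      // token currently being decomposed
    private final Iterator<String> input;

    PassThroughDecompounder(Iterator<String> input) {
        this.input = input;
    }

    /** Subclasses add subwords of term to tokens; they must not add term itself. */
    protected abstract void decompose();

    /** Returns the next output token, or null once the input is exhausted. */
    String next() {
        if (!tokens.isEmpty()) {
            return tokens.removeFirst(); // drain queued subwords first
        }
        if (!input.hasNext()) {
            return null;
        }
        term = input.next();
        decompose();
        return term; // the original token is passed through by the base class
    }
}

/** Hypothetical subclass: queues every dictionary word contained in the term. */
final class ContainsDecompounder extends PassThroughDecompounder {
    private final List<String> dictionary;

    ContainsDecompounder(Iterator<String> input, List<String> dictionary) {
        super(input);
        this.dictionary = dictionary;
    }

    @Override
    protected void decompose() {
        for (String word : dictionary) {
            if (!word.equals(term) && term.contains(word)) {
                tokens.addLast(word);
            }
        }
    }

    public static void main(String[] args) {
        ContainsDecompounder filter = new ContainsDecompounder(
                List.of("dampfschiff", "haus").iterator(), List.of("dampf", "schiff"));
        for (String t = filter.next(); t != null; t = filter.next()) {
            System.out.println(t);
        }
    }
}
```

Keeping the pass-through in the base class means a subclass only has to answer one question, namely which subwords a term contains, and cannot accidentally drop or duplicate the original token.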
 
 