DictionaryCompoundWordTokenFilter (Lucene 4.10.2 API)

java.lang.Object
- org.apache.lucene.util.AttributeSource
  - org.apache.lucene.analysis.TokenStream
    - org.apache.lucene.analysis.TokenFilter
      - org.apache.lucene.analysis.compound.CompoundWordTokenFilterBase
        - org.apache.lucene.analysis.compound.DictionaryCompoundWordTokenFilter

All Implemented Interfaces:

Closeable, AutoCloseable
```
public class DictionaryCompoundWordTokenFilter
extends CompoundWordTokenFilterBase
```
A TokenFilter that decomposes compound words found in many Germanic languages.
"Donaudampfschiff" becomes Donau, dampf, schiff so that you can find "Donaudampfschiff" even when you only enter "schiff". It uses a brute-force algorithm to achieve this.
You may specify the Version compatibility when creating CompoundWordTokenFilterBase:
- As of 3.1, CompoundWordTokenFilterBase correctly handles Unicode 4.0 supplementary characters in strings and char arrays provided as compound word dictionaries.
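
The brute-force idea described above can be sketched in plain Java without any Lucene dependency. This is an illustration only, not the library's implementation: the class `BruteForceDecompounder` and its `analyze` method are hypothetical names, and the case-insensitive lookup against a lowercase dictionary is an assumption of the sketch. It tries every substring within the subword size limits and keeps those that appear in the dictionary, alongside the original word.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Locale;
import java.util.Set;

// Illustrative stand-in, not Lucene code: brute-force compound decomposition.
class BruteForceDecompounder {

    // Returns the original word followed by every dictionary subword found in it.
    static List<String> analyze(String word, Set<String> dictionary,
                                int minSubwordSize, int maxSubwordSize) {
        List<String> out = new ArrayList<>();
        out.add(word); // the original token is always kept
        for (int start = 0; start + minSubwordSize <= word.length(); start++) {
            for (int len = minSubwordSize; len <= maxSubwordSize; len++) {
                if (start + len > word.length()) {
                    break; // candidate would run past the end of the word
                }
                String candidate = word.substring(start, start + len);
                // Assumption: lookup ignores case; the dictionary holds lowercase entries.
                if (dictionary.contains(candidate.toLowerCase(Locale.ROOT))) {
                    out.add(candidate);
                }
            }
        }
        return out;
    }

    public static void main(String[] args) {
        Set<String> dict = new HashSet<>(Arrays.asList("donau", "dampf", "schiff"));
        List<String> tokens = analyze("Donaudampfschiff", dict, 2, 15);
        System.out.println(tokens); // [Donaudampfschiff, Donau, dampf, schiff]
    }
}
```

Because "schiff" ends up in the emitted token list, a query for "schiff" can match a document that only contains "Donaudampfschiff".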

Nested Class Summary
- Nested classes/interfaces inherited from class org.apache.lucene.analysis.compound.CompoundWordTokenFilterBase
  CompoundWordTokenFilterBase.CompoundToken
- Nested classes/interfaces inherited from class org.apache.lucene.util.AttributeSource
  AttributeSource.State

Field Summary
- Fields inherited from class org.apache.lucene.analysis.compound.CompoundWordTokenFilterBase
  DEFAULT_MAX_SUBWORD_SIZE, DEFAULT_MIN_SUBWORD_SIZE, DEFAULT_MIN_WORD_SIZE, dictionary, matchVersion, maxSubwordSize, minSubwordSize, minWordSize, offsetAtt, onlyLongestMatch, termAtt, tokens
- Fields inherited from class org.apache.lucene.analysis.TokenFilter
  input
- Fields inherited from class org.apache.lucene.analysis.TokenStream
  DEFAULT_TOKEN_ATTRIBUTE_FACTORY
- Fields inherited from class org.apache.lucene.util.AttributeSource
  DEFAULT_ATTRIBUTE_FACTORY

Constructor Summary

Constructors
Constructor and Description
`DictionaryCompoundWordTokenFilter(TokenStream input, CharArraySet dictionary)` Creates a new `DictionaryCompoundWordTokenFilter`
`DictionaryCompoundWordTokenFilter(TokenStream input, CharArraySet dictionary, int minWordSize, int minSubwordSize, int maxSubwordSize, boolean onlyLongestMatch)` Creates a new `DictionaryCompoundWordTokenFilter`
`DictionaryCompoundWordTokenFilter(Version matchVersion, TokenStream input, CharArraySet dictionary)` Deprecated. Use `DictionaryCompoundWordTokenFilter(TokenStream,CharArraySet)`
`DictionaryCompoundWordTokenFilter(Version matchVersion, TokenStream input, CharArraySet dictionary, int minWordSize, int minSubwordSize, int maxSubwordSize, boolean onlyLongestMatch)` Deprecated. Use `DictionaryCompoundWordTokenFilter(TokenStream,CharArraySet,int,int,int,boolean)`

Method Summary

Methods
Modifier and Type	Method and Description
`protected void`	`decompose()` Decomposes the current `CompoundWordTokenFilterBase.termAtt` and places `CompoundWordTokenFilterBase.CompoundToken` instances in the `CompoundWordTokenFilterBase.tokens` list.

Methods inherited from class org.apache.lucene.analysis.compound.CompoundWordTokenFilterBase
incrementToken, reset

Methods inherited from class org.apache.lucene.analysis.TokenFilter
close, end

Methods inherited from class org.apache.lucene.util.AttributeSource
addAttribute, addAttributeImpl, captureState, clearAttributes, cloneAttributes, copyTo, equals, getAttribute, getAttributeClassesIterator, getAttributeFactory, getAttributeImplsIterator, hasAttribute, hasAttributes, hashCode, reflectAsString, reflectWith, restoreState, toString

Methods inherited from class java.lang.Object
clone, finalize, getClass, notify, notifyAll, wait, wait, wait

Constructor Detail

DictionaryCompoundWordTokenFilter
```
public DictionaryCompoundWordTokenFilter(TokenStream input,
                                 CharArraySet dictionary)
```
Creates a new DictionaryCompoundWordTokenFilter

Parameters:
input - the TokenStream to process
dictionary - the word dictionary to match against.

DictionaryCompoundWordTokenFilter

```
@Deprecated
public DictionaryCompoundWordTokenFilter(Version matchVersion,
                                         TokenStream input,
                                         CharArraySet dictionary)
```

Deprecated. Use DictionaryCompoundWordTokenFilter(TokenStream,CharArraySet)

DictionaryCompoundWordTokenFilter
```
public DictionaryCompoundWordTokenFilter(TokenStream input,
                                 CharArraySet dictionary,
                                 int minWordSize,
                                 int minSubwordSize,
                                 int maxSubwordSize,
                                 boolean onlyLongestMatch)
```
Creates a new DictionaryCompoundWordTokenFilter

Parameters:
input - the TokenStream to process
dictionary - the word dictionary to match against.
minWordSize - only words longer than this get processed
minSubwordSize - only subwords longer than this get to the output stream
maxSubwordSize - only subwords shorter than this get to the output stream
onlyLongestMatch - Add only the longest matching subword to the stream
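
To make the effect of these four parameters concrete, here is a minimal plain-Java sketch. It is not Lucene code: `SubwordLimitsSketch` and `subwords` are hypothetical names, the size bounds are treated as inclusive, and "longest match" is taken per start position. Both of those choices are assumptions of the sketch, so check the library source for the exact comparisons.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;

// Hypothetical helper showing how the size limits and onlyLongestMatch interact.
class SubwordLimitsSketch {

    static List<String> subwords(String word, Set<String> dictionary,
                                 int minWordSize, int minSubwordSize,
                                 int maxSubwordSize, boolean onlyLongestMatch) {
        List<String> out = new ArrayList<>();
        if (word.length() < minWordSize) {
            return out; // word too short: not decomposed at all
        }
        for (int start = 0; start + minSubwordSize <= word.length(); start++) {
            String longest = null;
            for (int len = minSubwordSize;
                 len <= maxSubwordSize && start + len <= word.length(); len++) {
                String candidate = word.substring(start, start + len);
                if (!dictionary.contains(candidate)) {
                    continue;
                }
                if (onlyLongestMatch) {
                    longest = candidate; // lengths grow, so the last hit is the longest
                } else {
                    out.add(candidate);
                }
            }
            if (longest != null) {
                out.add(longest); // one subword per start position
            }
        }
        return out;
    }
}
```

With the dictionary {donau, dampf, dampfschiff, schiff} and the word "donaudampfschiff", the sketch yields donau, dampf, dampfschiff, schiff when `onlyLongestMatch` is false and donau, dampfschiff, schiff when it is true. Lowering `maxSubwordSize` to 6 drops dampfschiff, and a `minWordSize` larger than the word length yields no subwords.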

DictionaryCompoundWordTokenFilter

```
@Deprecated
public DictionaryCompoundWordTokenFilter(Version matchVersion,
                                         TokenStream input,
                                         CharArraySet dictionary,
                                         int minWordSize,
                                         int minSubwordSize,
                                         int maxSubwordSize,
                                         boolean onlyLongestMatch)
```

Deprecated. Use DictionaryCompoundWordTokenFilter(TokenStream,CharArraySet,int,int,int,boolean)

Method Detail
- decompose
```
protected void decompose()
```
  Description copied from class: CompoundWordTokenFilterBase
  
  Decomposes the current CompoundWordTokenFilterBase.termAtt and places CompoundWordTokenFilterBase.CompoundToken instances in the CompoundWordTokenFilterBase.tokens list. The original token must not be placed in the list, because this filter passes it through automatically.
  
  Specified by:
  
  decompose in class CompoundWordTokenFilterBase
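
The division of labor described for `decompose` can be sketched with a miniature base class in plain Java. Everything here is hypothetical and greatly simplified (`MiniCompoundFilterBase`, `SuffixFilter`, and the string-based `current` field standing in for `termAtt`); it only illustrates the contract that the base class emits the original token while the subclass queues subwords and nothing else.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;

// Hypothetical miniature of the base-class side of the contract.
abstract class MiniCompoundFilterBase {
    protected final Deque<String> tokens = new ArrayDeque<>(); // pending subwords
    protected String current;                                   // stands in for termAtt

    // Subclasses add subwords of `current` to `tokens`, never `current` itself.
    protected abstract void decompose();

    List<String> filter(List<String> input) {
        List<String> out = new ArrayList<>();
        for (String term : input) {
            current = term;
            out.add(term);          // the base class passes the original through
            decompose();
            while (!tokens.isEmpty()) {
                out.add(tokens.removeFirst()); // then drains the queued subwords
            }
        }
        return out;
    }
}

// Toy subclass: queues "schiff" when a longer word ends with it.
class SuffixFilter extends MiniCompoundFilterBase {
    @Override
    protected void decompose() {
        if (current.length() > 6 && current.endsWith("schiff")) {
            tokens.add("schiff");
        }
    }
}
```

Calling `new SuffixFilter().filter(Arrays.asList("dampfschiff", "haus"))` returns dampfschiff, schiff, haus. If `decompose` also queued the original term, it would appear twice in the output, which is why the contract forbids it.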
