DictionaryCompoundWordTokenFilter (Lucene 3.4.0 API)


org.apache.lucene.analysis.compound
Class DictionaryCompoundWordTokenFilter

java.lang.Object
  org.apache.lucene.util.AttributeSource
      org.apache.lucene.analysis.TokenStream
          org.apache.lucene.analysis.TokenFilter
              org.apache.lucene.analysis.compound.CompoundWordTokenFilterBase
                  org.apache.lucene.analysis.compound.DictionaryCompoundWordTokenFilter

All Implemented Interfaces: Closeable

public class DictionaryCompoundWordTokenFilter
extends CompoundWordTokenFilterBase

A TokenFilter that decomposes compound words found in many Germanic languages.

"Donaudampfschiff" becomes "Donau", "dampf" and "schiff", so that a search for "schiff" also finds "Donaudampfschiff". It uses a brute-force algorithm: substrings of each token, within the configured subword size limits, are looked up in the dictionary.

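The brute-force idea can be sketched without Lucene. The hypothetical `BruteForceDecomposer` below works on a plain String rather than a TokenStream: it lowercases the word, tries every start offset and every subword length between `minSubwordSize` and `maxSubwordSize` (both treated as inclusive, which is an assumption), and keeps the original word in front of the parts, which is our understanding of what the filter emits. It illustrates the technique only; it is not Lucene's implementation.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Locale;
import java.util.Set;

/**
 * Hypothetical, dependency-free sketch of brute-force dictionary decomposition.
 * Works on plain Strings instead of Lucene tokens.
 */
final class BruteForceDecomposer {

    /** Returns the original word followed by every dictionary subword found in it. */
    static List<String> decompose(String word, Set<String> dictionary,
                                  int minSubwordSize, int maxSubwordSize) {
        List<String> out = new ArrayList<>();
        out.add(word); // assumption: the original word is always kept
        String lower = word.toLowerCase(Locale.ROOT); // lookups use a lower-case copy
        // Try every start offset and every allowed length: O(n * maxSubwordSize) lookups.
        for (int start = 0; start + minSubwordSize <= lower.length(); start++) {
            for (int len = minSubwordSize;
                 len <= maxSubwordSize && start + len <= lower.length(); len++) {
                if (dictionary.contains(lower.substring(start, start + len))) {
                    // Emit the part with the casing of the original word.
                    out.add(word.substring(start, start + len));
                }
            }
        }
        return out;
    }

    public static void main(String[] args) {
        Set<String> dict = new HashSet<>(Arrays.asList("donau", "dampf", "schiff"));
        System.out.println(decompose("Donaudampfschiff", dict, 2, 15));
        // prints [Donaudampfschiff, Donau, dampf, schiff]
    }
}
```

Because parts are collected by start offset, they come out in reading order, and the cost grows with word length times the subword size range, which is why the size limits matter for performance.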
Nested Class Summary

Nested classes/interfaces inherited from class org.apache.lucene.util.AttributeSource
`AttributeSource.AttributeFactory, AttributeSource.State`

Field Summary

Fields inherited from class org.apache.lucene.analysis.compound.CompoundWordTokenFilterBase
`DEFAULT_MAX_SUBWORD_SIZE, DEFAULT_MIN_SUBWORD_SIZE, DEFAULT_MIN_WORD_SIZE, dictionary, maxSubwordSize, minSubwordSize, minWordSize, onlyLongestMatch, tokens`

Fields inherited from class org.apache.lucene.analysis.TokenFilter
`input`

Constructor Summary
`DictionaryCompoundWordTokenFilter(TokenStream input, Set dictionary)` Deprecated. Use `DictionaryCompoundWordTokenFilter(Version, TokenStream, Set)` instead.
`DictionaryCompoundWordTokenFilter(TokenStream input, Set dictionary, int minWordSize, int minSubwordSize, int maxSubwordSize, boolean onlyLongestMatch)` Deprecated. Use `DictionaryCompoundWordTokenFilter(Version, TokenStream, Set, int, int, int, boolean)` instead.
`DictionaryCompoundWordTokenFilter(TokenStream input, String[] dictionary)` Deprecated. Use `DictionaryCompoundWordTokenFilter(Version, TokenStream, String[])` instead.
`DictionaryCompoundWordTokenFilter(TokenStream input, String[] dictionary, int minWordSize, int minSubwordSize, int maxSubwordSize, boolean onlyLongestMatch)` Deprecated. Use `DictionaryCompoundWordTokenFilter(Version, TokenStream, String[], int, int, int, boolean)` instead.
`DictionaryCompoundWordTokenFilter(Version matchVersion, TokenStream input, Set dictionary)` Creates a new `DictionaryCompoundWordTokenFilter`.
`DictionaryCompoundWordTokenFilter(Version matchVersion, TokenStream input, Set dictionary, int minWordSize, int minSubwordSize, int maxSubwordSize, boolean onlyLongestMatch)` Creates a new `DictionaryCompoundWordTokenFilter`.
`DictionaryCompoundWordTokenFilter(Version matchVersion, TokenStream input, String[] dictionary)` Creates a new `DictionaryCompoundWordTokenFilter`.
`DictionaryCompoundWordTokenFilter(Version matchVersion, TokenStream input, String[] dictionary, int minWordSize, int minSubwordSize, int maxSubwordSize, boolean onlyLongestMatch)` Creates a new `DictionaryCompoundWordTokenFilter`.

Method Summary
`protected void`	`decomposeInternal(Token token)`

Methods inherited from class org.apache.lucene.analysis.compound.CompoundWordTokenFilterBase
`addAllLowerCase, createToken, decompose, incrementToken, makeDictionary, makeDictionary, makeLowerCaseCopy, reset`

Methods inherited from class org.apache.lucene.analysis.TokenFilter
`close, end`

Methods inherited from class org.apache.lucene.util.AttributeSource
`addAttribute, addAttributeImpl, captureState, clearAttributes, cloneAttributes, copyTo, equals, getAttribute, getAttributeClassesIterator, getAttributeFactory, getAttributeImplsIterator, hasAttribute, hasAttributes, hashCode, reflectAsString, reflectWith, restoreState, toString`

Methods inherited from class java.lang.Object
`clone, finalize, getClass, notify, notifyAll, wait, wait, wait`

Constructor Detail

DictionaryCompoundWordTokenFilter

@Deprecated
public DictionaryCompoundWordTokenFilter(TokenStream input,
                                         String[] dictionary,
                                         int minWordSize,
                                         int minSubwordSize,
                                         int maxSubwordSize,
                                         boolean onlyLongestMatch)

Deprecated. Use DictionaryCompoundWordTokenFilter(Version, TokenStream, String[], int, int, int, boolean) instead.

Creates a new DictionaryCompoundWordTokenFilter

Parameters:
input - the TokenStream to process
dictionary - the word dictionary to match against
minWordSize - only words longer than this get processed
minSubwordSize - only subwords longer than this get to the output stream
maxSubwordSize - only subwords shorter than this get to the output stream
onlyLongestMatch - add only the longest matching subword to the stream

DictionaryCompoundWordTokenFilter

@Deprecated
public DictionaryCompoundWordTokenFilter(TokenStream input,
                                         String[] dictionary)

Deprecated. Use DictionaryCompoundWordTokenFilter(Version, TokenStream, String[]) instead.

Creates a new DictionaryCompoundWordTokenFilter

Parameters:
input - the TokenStream to process
dictionary - the word dictionary to match against

DictionaryCompoundWordTokenFilter

@Deprecated
public DictionaryCompoundWordTokenFilter(TokenStream input,
                                         Set dictionary)

Deprecated. Use DictionaryCompoundWordTokenFilter(Version, TokenStream, Set) instead.

Creates a new DictionaryCompoundWordTokenFilter

Parameters:
input - the TokenStream to process
dictionary - the word dictionary to match against. If this is a CharArraySet, it must have ignoreCase=false and contain only lower-case strings.

DictionaryCompoundWordTokenFilter

@Deprecated
public DictionaryCompoundWordTokenFilter(TokenStream input,
                                         Set dictionary,
                                         int minWordSize,
                                         int minSubwordSize,
                                         int maxSubwordSize,
                                         boolean onlyLongestMatch)

Deprecated. Use DictionaryCompoundWordTokenFilter(Version, TokenStream, Set, int, int, int, boolean) instead.

Creates a new DictionaryCompoundWordTokenFilter

Parameters:
input - the TokenStream to process
dictionary - the word dictionary to match against. If this is a CharArraySet, it must have ignoreCase=false and contain only lower-case strings.
minWordSize - only words longer than this get processed
minSubwordSize - only subwords longer than this get to the output stream
maxSubwordSize - only subwords shorter than this get to the output stream
onlyLongestMatch - add only the longest matching subword to the stream

DictionaryCompoundWordTokenFilter

public DictionaryCompoundWordTokenFilter(Version matchVersion,
                                         TokenStream input,
                                         String[] dictionary,
                                         int minWordSize,
                                         int minSubwordSize,
                                         int maxSubwordSize,
                                         boolean onlyLongestMatch)

Creates a new DictionaryCompoundWordTokenFilter

Parameters:
matchVersion - Lucene version to enable correct Unicode 4.0 behavior in the dictionaries if Version > 3.0. See CompoundWordTokenFilterBase for details.
input - the TokenStream to process
dictionary - the word dictionary to match against
minWordSize - only words longer than this get processed
minSubwordSize - only subwords longer than this get to the output stream
maxSubwordSize - only subwords shorter than this get to the output stream
onlyLongestMatch - add only the longest matching subword to the stream
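The matchVersion parameter concerns Unicode 4.0 behavior, that is, supplementary characters (code points above U+FFFF) that Java stores as surrogate pairs. The plain-Java sketch below shows the kind of difference involved: lowercasing char by char leaves a supplementary letter unchanged, while lowercasing by code point maps it correctly. We assume, without quoting Lucene's source, that this is the sort of case handling the version switches in the dictionary; the class `CodePointLowercase` and its methods are hypothetical and do not use Lucene.

```java
/**
 * Hypothetical illustration: per-char vs per-code-point lowercasing
 * for supplementary characters (surrogate pairs).
 */
final class CodePointLowercase {

    /** Lowercases each UTF-16 char on its own; surrogate halves have no case mapping. */
    static String lowerPerChar(String s) {
        char[] buf = s.toCharArray();
        for (int i = 0; i < buf.length; i++) {
            buf[i] = Character.toLowerCase(buf[i]);
        }
        return new String(buf);
    }

    /** Lowercases whole code points, so surrogate pairs are mapped correctly. */
    static String lowerPerCodePoint(String s) {
        StringBuilder sb = new StringBuilder(s.length());
        for (int i = 0; i < s.length(); ) {
            int cp = s.codePointAt(i);
            sb.appendCodePoint(Character.toLowerCase(cp));
            i += Character.charCount(cp); // advance 1 or 2 chars
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // U+10400 DESERET CAPITAL LETTER LONG I; its lowercase form is U+10428.
        String upper = new String(Character.toChars(0x10400));
        System.out.println(lowerPerChar(upper).codePointAt(0) == 0x10400);      // true: unchanged
        System.out.println(lowerPerCodePoint(upper).codePointAt(0) == 0x10428); // true: lowered
    }
}
```

For ordinary BMP text such as German compounds both methods agree; the difference only shows up for characters outside the Basic Multilingual Plane.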

DictionaryCompoundWordTokenFilter

public DictionaryCompoundWordTokenFilter(Version matchVersion,
                                         TokenStream input,
                                         String[] dictionary)

Creates a new DictionaryCompoundWordTokenFilter

Parameters:
matchVersion - Lucene version to enable correct Unicode 4.0 behavior in the dictionaries if Version > 3.0. See CompoundWordTokenFilterBase for details.
input - the TokenStream to process
dictionary - the word dictionary to match against

DictionaryCompoundWordTokenFilter

public DictionaryCompoundWordTokenFilter(Version matchVersion,
                                         TokenStream input,
                                         Set dictionary)

Creates a new DictionaryCompoundWordTokenFilter

Parameters:
matchVersion - Lucene version to enable correct Unicode 4.0 behavior in the dictionaries if Version > 3.0. See CompoundWordTokenFilterBase for details.
input - the TokenStream to process
dictionary - the word dictionary to match against. If this is a CharArraySet, it must have ignoreCase=false and contain only lower-case strings.
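The lower-case requirement suggests that the filter lowercases candidate substrings before a case-sensitive dictionary lookup (the inherited makeLowerCaseCopy method points the same way). The hypothetical `LowercaseDictionaryCheck` below models that assumption with a plain HashSet standing in for a CharArraySet with ignoreCase=false: an entry stored as "Schiff" is never matched, while the same entry lowercased when the dictionary is built is.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Locale;
import java.util.Set;

/** Hypothetical model of a case-sensitive lookup on a lowercased candidate. */
final class LowercaseDictionaryCheck {

    /** Lowercases the candidate, then does an exact (case-sensitive) lookup. */
    static boolean matches(String candidate, Set<String> dictionary) {
        return dictionary.contains(candidate.toLowerCase(Locale.ROOT));
    }

    /** Builds a dictionary that satisfies the lower-case-only requirement. */
    static Set<String> lowerCaseDictionary(String... words) {
        Set<String> dict = new HashSet<>();
        for (String w : words) {
            dict.add(w.toLowerCase(Locale.ROOT));
        }
        return dict;
    }

    public static void main(String[] args) {
        Set<String> mixedCase = new HashSet<>(Arrays.asList("Schiff"));
        Set<String> lowerCase = lowerCaseDictionary("Schiff");
        System.out.println(matches("Schiff", mixedCase)); // false: "schiff" is not in the set
        System.out.println(matches("Schiff", lowerCase)); // true
    }
}
```

Normalizing the word list once, when the dictionary is built, is cheaper than case-insensitive comparison on every one of the many brute-force lookups.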

DictionaryCompoundWordTokenFilter

public DictionaryCompoundWordTokenFilter(Version matchVersion,
                                         TokenStream input,
                                         Set dictionary,
                                         int minWordSize,
                                         int minSubwordSize,
                                         int maxSubwordSize,
                                         boolean onlyLongestMatch)

Creates a new DictionaryCompoundWordTokenFilter

Parameters:
matchVersion - Lucene version to enable correct Unicode 4.0 behavior in the dictionaries if Version > 3.0. See CompoundWordTokenFilterBase for details.
input - the TokenStream to process
dictionary - the word dictionary to match against. If this is a CharArraySet, it must have ignoreCase=false and contain only lower-case strings.
minWordSize - only words longer than this get processed
minSubwordSize - only subwords longer than this get to the output stream
maxSubwordSize - only subwords shorter than this get to the output stream
onlyLongestMatch - add only the longest matching subword to the stream
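To see how minWordSize, minSubwordSize, maxSubwordSize and onlyLongestMatch interact, here is a hypothetical, dependency-free sketch over plain Strings (`DecompositionOptions` is not a Lucene class). Two assumptions should be checked against the Lucene source: the size bounds are treated as inclusive, and onlyLongestMatch keeps the longest dictionary match per start offset rather than per word. Only the subwords are returned; the original token is left out. The sample dictionary adds "dampfschiff" so that two matches share a start offset.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Locale;
import java.util.Set;

/** Hypothetical sketch of how the size limits and onlyLongestMatch shape the subwords. */
final class DecompositionOptions {

    static List<String> subwords(String word, Set<String> dictionary, int minWordSize,
                                 int minSubwordSize, int maxSubwordSize,
                                 boolean onlyLongestMatch) {
        List<String> out = new ArrayList<>();
        if (word.length() < minWordSize) {
            return out; // too short: not decomposed at all
        }
        String lower = word.toLowerCase(Locale.ROOT);
        for (int start = 0; start + minSubwordSize <= lower.length(); start++) {
            String longest = null;
            // Bounds are inclusive in this sketch (an assumption).
            for (int len = minSubwordSize;
                 len <= maxSubwordSize && start + len <= lower.length(); len++) {
                String candidate = lower.substring(start, start + len);
                if (!dictionary.contains(candidate)) {
                    continue;
                }
                if (onlyLongestMatch) {
                    longest = candidate; // lengths only grow, so the last match is the longest
                } else {
                    out.add(candidate);
                }
            }
            if (onlyLongestMatch && longest != null) {
                out.add(longest);
            }
        }
        return out;
    }

    public static void main(String[] args) {
        Set<String> dict = new HashSet<>(Arrays.asList("donau", "dampf", "dampfschiff", "schiff"));
        String w = "Donaudampfschiff";
        System.out.println(subwords(w, dict, 5, 2, 15, false));  // [donau, dampf, dampfschiff, schiff]
        System.out.println(subwords(w, dict, 5, 2, 15, true));   // [donau, dampfschiff, schiff]
        System.out.println(subwords(w, dict, 5, 2, 10, false));  // [donau, dampf, schiff]
        System.out.println(subwords(w, dict, 20, 2, 15, false)); // []
    }
}
```

Lowering maxSubwordSize to 10 drops the 11-letter "dampfschiff", and a minWordSize larger than the word switches decomposition off entirely.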

Method Detail

decomposeInternal

protected void decomposeInternal(Token token)

Specified by: decomposeInternal in class CompoundWordTokenFilterBase
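decomposeInternal is the hook that CompoundWordTokenFilterBase leaves to its subclasses; the inherited decompose method presumably calls it for each incoming token. The sketch below shows that template-method split in plain Java, under the assumption that the base class always queues the original token and skips words shorter than minWordSize before delegating. `SimpleCompoundBase` and `SimpleDictionaryDecomposer` are hypothetical names, not Lucene classes, and the naive all-substrings loop stands in for the real decomposeInternal.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Deque;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

/** Hypothetical base class mirroring a decompose/decomposeInternal split. */
abstract class SimpleCompoundBase {
    protected final Deque<String> tokens = new ArrayDeque<>(); // pending output
    protected final int minWordSize;

    SimpleCompoundBase(int minWordSize) {
        this.minWordSize = minWordSize;
    }

    /** Always queues the original token, then lets the subclass add parts. */
    final void decompose(String token) {
        tokens.add(token);
        if (token.length() < minWordSize) {
            return; // too short to be a compound
        }
        decomposeInternal(token);
    }

    /** Subclasses queue the subwords they find in the token. */
    protected abstract void decomposeInternal(String token);

    /** Returns the queued tokens in emission order and clears the queue. */
    List<String> drain() {
        List<String> out = new ArrayList<>(tokens);
        tokens.clear();
        return out;
    }
}

/** Hypothetical dictionary-based subclass: checks every substring. */
final class SimpleDictionaryDecomposer extends SimpleCompoundBase {
    private final Set<String> dictionary;

    SimpleDictionaryDecomposer(Set<String> dictionary, int minWordSize) {
        super(minWordSize);
        this.dictionary = dictionary;
    }

    @Override
    protected void decomposeInternal(String token) {
        for (int start = 0; start < token.length(); start++) {
            for (int end = start + 1; end <= token.length(); end++) {
                String part = token.substring(start, end);
                if (dictionary.contains(part)) {
                    tokens.add(part);
                }
            }
        }
    }

    public static void main(String[] args) {
        SimpleDictionaryDecomposer f = new SimpleDictionaryDecomposer(
                new HashSet<>(Arrays.asList("donau", "dampf", "schiff")), 5);
        f.decompose("donaudampfschiff");
        System.out.println(f.drain()); // [donaudampfschiff, donau, dampf, schiff]
        f.decompose("boot");
        System.out.println(f.drain()); // [boot]
    }
}
```

Keeping the "always emit the original" and minimum-length rules in the base class means each decomposition strategy only has to implement the search itself.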
