org.apache.lucene.analysis.compound
Class DictionaryCompoundWordTokenFilter
java.lang.Object
org.apache.lucene.util.AttributeSource
org.apache.lucene.analysis.TokenStream
org.apache.lucene.analysis.TokenFilter
org.apache.lucene.analysis.compound.CompoundWordTokenFilterBase
org.apache.lucene.analysis.compound.DictionaryCompoundWordTokenFilter
public class DictionaryCompoundWordTokenFilter extends CompoundWordTokenFilterBase
A TokenFilter that decomposes compound words found in many Germanic languages. For example, "Donaudampfschiff" is decomposed into Donau, dampf, and schiff, so that "Donaudampfschiff" is found even when the query only contains "schiff". The decomposition uses a brute-force algorithm.
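The brute-force approach can be sketched in plain Java (a simplified illustration of the idea, not the Lucene source; the class and method names here are invented for this example): at every start position in the compound, every substring within the subword size bounds is tested against the dictionary.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Simplified sketch of brute-force compound decomposition
// (illustrative only, not the Lucene implementation).
public class CompoundSketch {
    static List<String> decompose(String word, Set<String> dictionary,
                                  int minSubwordSize, int maxSubwordSize) {
        List<String> subwords = new ArrayList<>();
        String lower = word.toLowerCase();
        // Try every start position and every candidate length.
        for (int i = 0; i < lower.length(); i++) {
            for (int len = minSubwordSize;
                 len <= maxSubwordSize && i + len <= lower.length(); len++) {
                String candidate = lower.substring(i, i + len);
                if (dictionary.contains(candidate)) {
                    subwords.add(candidate);
                }
            }
        }
        return subwords;
    }

    public static void main(String[] args) {
        Set<String> dict = new HashSet<>(Arrays.asList("donau", "dampf", "schiff"));
        System.out.println(decompose("Donaudampfschiff", dict, 2, 15));
        // prints [donau, dampf, schiff]
    }
}
```

The real filter emits these subwords as additional tokens at the same position as the original compound, which is what makes a query for "schiff" match "Donaudampfschiff".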
Constructor Summary
DictionaryCompoundWordTokenFilter(TokenStream input, Set dictionary)
DictionaryCompoundWordTokenFilter(TokenStream input, Set dictionary, int minWordSize, int minSubwordSize, int maxSubwordSize, boolean onlyLongestMatch)
DictionaryCompoundWordTokenFilter(TokenStream input, String[] dictionary)
DictionaryCompoundWordTokenFilter(TokenStream input, String[] dictionary, int minWordSize, int minSubwordSize, int maxSubwordSize, boolean onlyLongestMatch)
Methods inherited from class org.apache.lucene.util.AttributeSource
addAttribute, addAttributeImpl, captureState, clearAttributes, cloneAttributes, equals, getAttribute, getAttributeClassesIterator, getAttributeFactory, getAttributeImplsIterator, hasAttribute, hasAttributes, hashCode, restoreState, toString
DictionaryCompoundWordTokenFilter
public DictionaryCompoundWordTokenFilter(TokenStream input,
String[] dictionary,
int minWordSize,
int minSubwordSize,
int maxSubwordSize,
boolean onlyLongestMatch)
Parameters:
input - the TokenStream to process
dictionary - the word dictionary to match against
minWordSize - only words longer than this get processed
minSubwordSize - only subwords longer than this get to the output stream
maxSubwordSize - only subwords shorter than this get to the output stream
onlyLongestMatch - add only the longest matching subword to the stream
DictionaryCompoundWordTokenFilter
public DictionaryCompoundWordTokenFilter(TokenStream input,
String[] dictionary)
Parameters:
input - the TokenStream to process
dictionary - the word dictionary to match against
DictionaryCompoundWordTokenFilter
public DictionaryCompoundWordTokenFilter(TokenStream input,
Set dictionary)
Parameters:
input - the TokenStream to process
dictionary - the word dictionary to match against. If this is a CharArraySet, it must have ignoreCase=false and contain only lower-case strings.
DictionaryCompoundWordTokenFilter
public DictionaryCompoundWordTokenFilter(TokenStream input,
Set dictionary,
int minWordSize,
int minSubwordSize,
int maxSubwordSize,
boolean onlyLongestMatch)
Parameters:
input - the TokenStream to process
dictionary - the word dictionary to match against. If this is a CharArraySet, it must have ignoreCase=false and contain only lower-case strings.
minWordSize - only words longer than this get processed
minSubwordSize - only subwords longer than this get to the output stream
maxSubwordSize - only subwords shorter than this get to the output stream
onlyLongestMatch - add only the longest matching subword to the stream
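How the size thresholds and the onlyLongestMatch flag interact can be illustrated with a standalone sketch (the semantics below are reconstructed from the parameter descriptions above, not taken from the Lucene source; class and method names are invented for this example):

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hedged sketch of the threshold parameters' assumed semantics:
// words at or below minWordSize are not decomposed, and with
// onlyLongestMatch only the longest dictionary hit per start
// position is kept.
public class ThresholdSketch {
    static List<String> decompose(String word, Set<String> dict,
                                  int minWordSize, int minSubwordSize,
                                  int maxSubwordSize, boolean onlyLongestMatch) {
        List<String> out = new ArrayList<>();
        if (word.length() <= minWordSize) {
            return out; // too short: no subwords are emitted
        }
        String lower = word.toLowerCase();
        for (int i = 0; i < lower.length(); i++) {
            String longest = null;
            for (int len = minSubwordSize;
                 len <= maxSubwordSize && i + len <= lower.length(); len++) {
                String cand = lower.substring(i, i + len);
                if (dict.contains(cand)) {
                    if (onlyLongestMatch) {
                        longest = cand; // longer matches overwrite shorter ones
                    } else {
                        out.add(cand);
                    }
                }
            }
            if (onlyLongestMatch && longest != null) {
                out.add(longest);
            }
        }
        return out;
    }

    public static void main(String[] args) {
        Set<String> dict = new HashSet<>(Arrays.asList("schiff", "schiffahrt"));
        // "schiff" and "schiffahrt" both match at the same start position;
        // with onlyLongestMatch=true only "schiffahrt" is kept.
        System.out.println(decompose("Dampfschiffahrt", dict, 5, 2, 15, true));
    }
}
```

Without onlyLongestMatch, overlapping dictionary entries produce one token each, which increases recall at the cost of noisier token streams.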
decomposeInternal
protected void decomposeInternal(Token token)
Specified by: decomposeInternal in class CompoundWordTokenFilterBase
Copyright © 2000-2010 Apache Software Foundation. All Rights Reserved.