DictionaryCompoundWordTokenFilter (Lucene 4.7.2 API)

org.apache.lucene.analysis.compound
Class DictionaryCompoundWordTokenFilter

java.lang.Object
  org.apache.lucene.util.AttributeSource
      org.apache.lucene.analysis.TokenStream
          org.apache.lucene.analysis.TokenFilter
              org.apache.lucene.analysis.compound.CompoundWordTokenFilterBase
                  org.apache.lucene.analysis.compound.DictionaryCompoundWordTokenFilter

All Implemented Interfaces: Closeable

public class DictionaryCompoundWordTokenFilter
extends CompoundWordTokenFilterBase

A TokenFilter that decomposes compound words found in many Germanic languages.

"Donaudampfschiff" becomes Donau, dampf, schiff, so a search for "schiff" alone still finds "Donaudampfschiff". The filter uses a brute-force algorithm to achieve this.

You must specify the required Version compatibility when creating a CompoundWordTokenFilterBase: as of 3.1, CompoundWordTokenFilterBase correctly handles Unicode 4.0 supplementary characters in strings and char arrays provided as compound word dictionaries.
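The following is a minimal usage sketch, not an official example. It assumes the Lucene 4.7.2 core and analyzers-common jars are on the classpath; the class name DecompoundExample, the three-word dictionary, and the choice of WhitespaceTokenizer are illustrative assumptions. It passes the same Version to every component and prints each token the filter emits.

```java
import java.io.IOException;
import java.io.StringReader;
import java.util.Arrays;

import org.apache.lucene.analysis.TokenStream;
import org.apache.lucene.analysis.compound.DictionaryCompoundWordTokenFilter;
import org.apache.lucene.analysis.core.WhitespaceTokenizer;
import org.apache.lucene.analysis.tokenattributes.CharTermAttribute;
import org.apache.lucene.analysis.util.CharArraySet;
import org.apache.lucene.util.Version;

// Hypothetical demo class; requires Lucene 4.7.x on the classpath.
public class DecompoundExample {
    public static void main(String[] args) throws IOException {
        Version matchVersion = Version.LUCENE_47;

        // Illustrative dictionary. ignoreCase = true, so "Donau" in the
        // input can match the lower-case entry "donau".
        CharArraySet dictionary = new CharArraySet(
                matchVersion, Arrays.asList("donau", "dampf", "schiff"), true);

        // Tokenize on whitespace, then wrap the stream in the compound filter.
        TokenStream stream =
                new WhitespaceTokenizer(matchVersion, new StringReader("Donaudampfschiff"));
        stream = new DictionaryCompoundWordTokenFilter(matchVersion, stream, dictionary);

        CharTermAttribute term = stream.addAttribute(CharTermAttribute.class);
        stream.reset();                       // must be called before incrementToken()
        while (stream.incrementToken()) {
            System.out.println(term.toString());
        }
        stream.end();
        stream.close();
    }
}
```

The original token is passed through by the filter, so the printed tokens should include the full compound as well as the dictionary subwords.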

Nested Class Summary

Nested classes/interfaces inherited from class org.apache.lucene.analysis.compound.CompoundWordTokenFilterBase
`CompoundWordTokenFilterBase.CompoundToken`

Nested classes/interfaces inherited from class org.apache.lucene.util.AttributeSource
`AttributeSource.AttributeFactory, AttributeSource.State`

Field Summary

Fields inherited from class org.apache.lucene.analysis.compound.CompoundWordTokenFilterBase
`DEFAULT_MAX_SUBWORD_SIZE, DEFAULT_MIN_SUBWORD_SIZE, DEFAULT_MIN_WORD_SIZE, dictionary, matchVersion, maxSubwordSize, minSubwordSize, minWordSize, offsetAtt, onlyLongestMatch, termAtt, tokens`

Fields inherited from class org.apache.lucene.analysis.TokenFilter
`input`

Constructor Summary
`DictionaryCompoundWordTokenFilter(Version matchVersion, TokenStream input, CharArraySet dictionary)` Creates a new `DictionaryCompoundWordTokenFilter`
`DictionaryCompoundWordTokenFilter(Version matchVersion, TokenStream input, CharArraySet dictionary, int minWordSize, int minSubwordSize, int maxSubwordSize, boolean onlyLongestMatch)` Creates a new `DictionaryCompoundWordTokenFilter`

Method Summary
`protected void`	`decompose()` Decomposes the current `CompoundWordTokenFilterBase.termAtt` and places `CompoundWordTokenFilterBase.CompoundToken` instances in the `CompoundWordTokenFilterBase.tokens` list.

Methods inherited from class org.apache.lucene.analysis.compound.CompoundWordTokenFilterBase
`incrementToken, reset`

Methods inherited from class org.apache.lucene.analysis.TokenFilter
`close, end`

Methods inherited from class org.apache.lucene.util.AttributeSource
`addAttribute, addAttributeImpl, captureState, clearAttributes, cloneAttributes, copyTo, equals, getAttribute, getAttributeClassesIterator, getAttributeFactory, getAttributeImplsIterator, hasAttribute, hasAttributes, hashCode, reflectAsString, reflectWith, restoreState, toString`

Methods inherited from class java.lang.Object
`clone, finalize, getClass, notify, notifyAll, wait, wait, wait`

Constructor Detail

DictionaryCompoundWordTokenFilter

public DictionaryCompoundWordTokenFilter(Version matchVersion,
                                         TokenStream input,
                                         CharArraySet dictionary)

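Creates a new DictionaryCompoundWordTokenFilter that uses the default size limits (DEFAULT_MIN_WORD_SIZE, DEFAULT_MIN_SUBWORD_SIZE, DEFAULT_MAX_SUBWORD_SIZE).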

Parameters:
    matchVersion - Lucene version to enable correct Unicode 4.0 behavior in the dictionaries if Version > 3.0. See CompoundWordTokenFilterBase for details.
    input - the TokenStream to process
    dictionary - the word dictionary to match against.

DictionaryCompoundWordTokenFilter

public DictionaryCompoundWordTokenFilter(Version matchVersion,
                                         TokenStream input,
                                         CharArraySet dictionary,
                                         int minWordSize,
                                         int minSubwordSize,
                                         int maxSubwordSize,
                                         boolean onlyLongestMatch)

Creates a new DictionaryCompoundWordTokenFilter with explicit word and subword size limits.

Parameters:
    matchVersion - Lucene version to enable correct Unicode 4.0 behavior in the dictionaries if Version > 3.0. See CompoundWordTokenFilterBase for details.
    input - the TokenStream to process
    dictionary - the word dictionary to match against.
    minWordSize - only words longer than this get processed
    minSubwordSize - only subwords longer than this get to the output stream
    maxSubwordSize - only subwords shorter than this get to the output stream
    onlyLongestMatch - add only the longest matching subword to the stream
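To illustrate how the size limits and onlyLongestMatch interact, here is a plain-Java sketch of a brute-force dictionary scan. It is not the Lucene source: the class BruteForceDecompounder and its decompose method are hypothetical, a HashSet of lower-case strings stands in for CharArraySet, and the size bounds are treated as inclusive, which is an assumption of this sketch. The argument values 5, 2 and 15 are illustrative.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Locale;
import java.util.Set;

// Hypothetical stand-alone sketch; does not depend on Lucene.
public class BruteForceDecompounder {

    // Tries every start offset and every length in [minSubwordSize, maxSubwordSize]
    // and collects the substrings that appear in the dictionary.
    static List<String> decompose(String word, Set<String> dictionary,
                                  int minWordSize, int minSubwordSize,
                                  int maxSubwordSize, boolean onlyLongestMatch) {
        List<String> subwords = new ArrayList<>();
        if (word.length() < minWordSize) {
            return subwords;                  // word too short: not decomposed
        }
        String lower = word.toLowerCase(Locale.ROOT);
        for (int start = 0; start <= lower.length() - minSubwordSize; start++) {
            String longest = null;
            for (int len = minSubwordSize; len <= maxSubwordSize; len++) {
                if (start + len > lower.length()) {
                    break;                    // candidate would run past the end
                }
                String candidate = lower.substring(start, start + len);
                if (dictionary.contains(candidate)) {
                    if (onlyLongestMatch) {
                        longest = candidate;  // lengths grow, so the last hit is the longest
                    } else {
                        subwords.add(candidate);
                    }
                }
            }
            if (onlyLongestMatch && longest != null) {
                subwords.add(longest);        // one subword per start offset
            }
        }
        return subwords;
    }

    public static void main(String[] args) {
        Set<String> dict = new HashSet<>(
                Arrays.asList("donau", "dampf", "dampfschiff", "schiff"));
        // prints [donau, dampf, dampfschiff, schiff]
        System.out.println(decompose("Donaudampfschiff", dict, 5, 2, 15, false));
        // prints [donau, dampfschiff, schiff]
        System.out.println(decompose("Donaudampfschiff", dict, 5, 2, 15, true));
    }
}
```

With onlyLongestMatch set, "dampf" is dropped because "dampfschiff" starts at the same offset and is longer; "schiff" survives because it starts at a different offset.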

Method Detail

decompose

protected void decompose()

Description copied from class: CompoundWordTokenFilterBase

Decomposes the current CompoundWordTokenFilterBase.termAtt and places CompoundWordTokenFilterBase.CompoundToken instances in the CompoundWordTokenFilterBase.tokens list. Do not place the original token in the list; it is automatically passed through this filter.

Specified by: decompose in class CompoundWordTokenFilterBase
