HyphenationCompoundWordTokenFilter (Lucene 3.0.3 API)

org.apache.lucene.analysis.compound
Class HyphenationCompoundWordTokenFilter

java.lang.Object
  org.apache.lucene.util.AttributeSource
      org.apache.lucene.analysis.TokenStream
          org.apache.lucene.analysis.TokenFilter
              org.apache.lucene.analysis.compound.CompoundWordTokenFilterBase
                  org.apache.lucene.analysis.compound.HyphenationCompoundWordTokenFilter

All Implemented Interfaces: Closeable

public class HyphenationCompoundWordTokenFilter
extends CompoundWordTokenFilterBase

A TokenFilter that decomposes compound words found in many Germanic languages.

For example, "Donaudampfschiff" is decomposed into "Donau", "dampf" and "schiff", so that a search for "schiff" alone also finds "Donaudampfschiff". The filter uses a hyphenation grammar together with a word dictionary to find the subwords.
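The idea of combining break points with a dictionary lookup can be sketched in plain Java. This is a simplified illustration and not Lucene's implementation: the class `DecompositionSketch`, its `decompose` method and the hand-picked break offsets are all hypothetical. In the real filter the candidate break points would come from the hyphenation grammar.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Locale;
import java.util.Set;

// Simplified illustration only; this is NOT Lucene's implementation.
class DecompositionSketch {

    /**
     * Returns every dictionary word that spans two candidate break points.
     * breakPoints must be ascending character offsets into word (hypothetical
     * input; a hyphenation grammar would normally supply them).
     */
    static List<String> decompose(String word, int[] breakPoints, Set<String> dictionary) {
        String lower = word.toLowerCase(Locale.ROOT);
        List<String> parts = new ArrayList<String>();
        for (int i = 0; i < breakPoints.length; i++) {
            for (int j = i + 1; j < breakPoints.length; j++) {
                // Candidate subword between break point i and break point j.
                String candidate = lower.substring(breakPoints[i], breakPoints[j]);
                if (dictionary.contains(candidate)) {
                    parts.add(candidate);
                }
            }
        }
        return parts;
    }

    public static void main(String[] args) {
        Set<String> dictionary =
                new HashSet<String>(Arrays.asList("donau", "dampf", "schiff"));
        // Hand-picked break offsets for "Donaudampfschiff" (assumption).
        int[] breakPoints = {0, 5, 10, 16};
        // prints [donau, dampf, schiff]
        System.out.println(decompose("Donaudampfschiff", breakPoints, dictionary));
    }
}
```

The dictionary is what rejects spans such as "donaudampf": a break point alone only proposes a split, and the lookup decides whether the piece is a real word.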

Nested Class Summary

Nested classes/interfaces inherited from class org.apache.lucene.util.AttributeSource
`org.apache.lucene.util.AttributeSource.AttributeFactory, org.apache.lucene.util.AttributeSource.State`

Field Summary

Fields inherited from class org.apache.lucene.analysis.compound.CompoundWordTokenFilterBase
`DEFAULT_MAX_SUBWORD_SIZE, DEFAULT_MIN_SUBWORD_SIZE, DEFAULT_MIN_WORD_SIZE, dictionary, maxSubwordSize, minSubwordSize, minWordSize, onlyLongestMatch, tokens`

Fields inherited from class org.apache.lucene.analysis.TokenFilter
`input`

Constructor Summary
`HyphenationCompoundWordTokenFilter(org.apache.lucene.analysis.TokenStream input, HyphenationTree hyphenator, Set dictionary)`
`HyphenationCompoundWordTokenFilter(org.apache.lucene.analysis.TokenStream input, HyphenationTree hyphenator, Set dictionary, int minWordSize, int minSubwordSize, int maxSubwordSize, boolean onlyLongestMatch)`
`HyphenationCompoundWordTokenFilter(org.apache.lucene.analysis.TokenStream input, HyphenationTree hyphenator, String[] dictionary)`
`HyphenationCompoundWordTokenFilter(org.apache.lucene.analysis.TokenStream input, HyphenationTree hyphenator, String[] dictionary, int minWordSize, int minSubwordSize, int maxSubwordSize, boolean onlyLongestMatch)`

Method Summary
`protected void`	`decomposeInternal(org.apache.lucene.analysis.Token token)`
`static HyphenationTree`	`getHyphenationTree(File hyphenationFile)` Create a hyphenator tree
`static HyphenationTree`	`getHyphenationTree(InputSource hyphenationSource)` Create a hyphenator tree
`static HyphenationTree`	`getHyphenationTree(Reader hyphenationReader)` Create a hyphenator tree
`static HyphenationTree`	`getHyphenationTree(String hyphenationFilename)` Create a hyphenator tree

Methods inherited from class org.apache.lucene.analysis.compound.CompoundWordTokenFilterBase
`addAllLowerCase, createToken, decompose, incrementToken, makeDictionary, makeLowerCaseCopy, reset`

Methods inherited from class org.apache.lucene.analysis.TokenFilter
`close, end`

Methods inherited from class org.apache.lucene.util.AttributeSource
`addAttribute, addAttributeImpl, captureState, clearAttributes, cloneAttributes, equals, getAttribute, getAttributeClassesIterator, getAttributeFactory, getAttributeImplsIterator, hasAttribute, hasAttributes, hashCode, restoreState, toString`

Methods inherited from class java.lang.Object
`clone, finalize, getClass, notify, notifyAll, wait, wait, wait`

Constructor Detail

HyphenationCompoundWordTokenFilter

public HyphenationCompoundWordTokenFilter(org.apache.lucene.analysis.TokenStream input,
                                          HyphenationTree hyphenator,
                                          String[] dictionary,
                                          int minWordSize,
                                          int minSubwordSize,
                                          int maxSubwordSize,
                                          boolean onlyLongestMatch)

Parameters:
    input - the TokenStream to process
    hyphenator - the hyphenation pattern tree to use for hyphenation
    dictionary - the word dictionary to match against
    minWordSize - only words longer than this get processed
    minSubwordSize - only subwords longer than this get to the output stream
    maxSubwordSize - only subwords shorter than this get to the output stream
    onlyLongestMatch - add only the longest matching subword to the stream

HyphenationCompoundWordTokenFilter

public HyphenationCompoundWordTokenFilter(org.apache.lucene.analysis.TokenStream input,
                                          HyphenationTree hyphenator,
                                          String[] dictionary)

Parameters:
    input - the TokenStream to process
    hyphenator - the hyphenation pattern tree to use for hyphenation
    dictionary - the word dictionary to match against

HyphenationCompoundWordTokenFilter

public HyphenationCompoundWordTokenFilter(org.apache.lucene.analysis.TokenStream input,
                                          HyphenationTree hyphenator,
                                          Set dictionary)

Parameters:
    input - the TokenStream to process
    hyphenator - the hyphenation pattern tree to use for hyphenation
    dictionary - the word dictionary to match against. If this is a CharArraySet, it must have been created with ignoreCase=false and must contain only lowercase strings.
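The lowercase requirement on the dictionary can be met by normalizing the word list before it is handed to the filter. The following is a minimal sketch using a plain `java.util.Set` rather than a CharArraySet; the class `LowerCaseDictionarySketch` and its `toLowerCaseSet` method are hypothetical names introduced only for illustration.

```java
import java.util.HashSet;
import java.util.Locale;
import java.util.Set;

// Illustration only: builds a lowercase-only dictionary from arbitrary input.
class LowerCaseDictionarySketch {

    /** Returns a new set holding the lowercase form of every input word. */
    static Set<String> toLowerCaseSet(String[] words) {
        Set<String> result = new HashSet<String>();
        for (String word : words) {
            // Locale.ROOT avoids locale-specific case mappings.
            result.add(word.toLowerCase(Locale.ROOT));
        }
        return result;
    }

    public static void main(String[] args) {
        Set<String> dictionary = toLowerCaseSet(new String[] {"Donau", "Dampf", "Schiff"});
        System.out.println(dictionary.contains("schiff")); // prints true
        System.out.println(dictionary.contains("Schiff")); // prints false
    }
}
```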

HyphenationCompoundWordTokenFilter

public HyphenationCompoundWordTokenFilter(org.apache.lucene.analysis.TokenStream input,
                                          HyphenationTree hyphenator,
                                          Set dictionary,
                                          int minWordSize,
                                          int minSubwordSize,
                                          int maxSubwordSize,
                                          boolean onlyLongestMatch)

Parameters:
    input - the TokenStream to process
    hyphenator - the hyphenation pattern tree to use for hyphenation
    dictionary - the word dictionary to match against. If this is a CharArraySet, it must have been created with ignoreCase=false and must contain only lowercase strings.
    minWordSize - only words longer than this get processed
    minSubwordSize - only subwords longer than this get to the output stream
    maxSubwordSize - only subwords shorter than this get to the output stream
    onlyLongestMatch - add only the longest matching subword to the stream
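How the subword size limits and onlyLongestMatch interact can be sketched as a filter over candidate subwords. This is a literal reading of the parameter descriptions and not Lucene's implementation: the class `SubwordLimitsSketch` and its `select` method are hypothetical, the strict comparisons are an assumption taken from the wording "longer than" and "shorter than", and the real filter may apply the longest-match choice differently (for example per start position), so check the boundary behavior against the source before relying on it.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Illustration of the documented parameter wording only; NOT Lucene's implementation.
class SubwordLimitsSketch {

    static List<String> select(List<String> candidates, int minSubwordSize,
                               int maxSubwordSize, boolean onlyLongestMatch) {
        List<String> kept = new ArrayList<String>();
        String longest = null;
        for (String candidate : candidates) {
            // Assumption: strict bounds, read literally from the parameter text.
            if (candidate.length() <= minSubwordSize
                    || candidate.length() >= maxSubwordSize) {
                continue;
            }
            kept.add(candidate);
            // Track the longest surviving candidate; the first one wins a tie.
            if (longest == null || candidate.length() > longest.length()) {
                longest = candidate;
            }
        }
        if (onlyLongestMatch && longest != null) {
            List<String> single = new ArrayList<String>();
            single.add(longest);
            return single;
        }
        return kept;
    }

    public static void main(String[] args) {
        List<String> candidates = Arrays.asList("donau", "dampf", "schiff", "au");
        // prints [donau, dampf, schiff]
        System.out.println(select(candidates, 2, 15, false));
        // prints [schiff]
        System.out.println(select(candidates, 2, 15, true));
    }
}
```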

Method Detail

getHyphenationTree

public static HyphenationTree getHyphenationTree(String hyphenationFilename)
                                          throws Exception

Create a hyphenator tree

Parameters:
    hyphenationFilename - the filename of the XML grammar to load
Returns:
    an object representing the hyphenation patterns
Throws:
    Exception

getHyphenationTree

public static HyphenationTree getHyphenationTree(File hyphenationFile)
                                          throws Exception

Create a hyphenator tree

Parameters:
    hyphenationFile - the file of the XML grammar to load
Returns:
    an object representing the hyphenation patterns
Throws:
    Exception

getHyphenationTree

public static HyphenationTree getHyphenationTree(Reader hyphenationReader)
                                          throws Exception

Create a hyphenator tree

Parameters:
    hyphenationReader - the reader of the XML grammar to load from
Returns:
    an object representing the hyphenation patterns
Throws:
    Exception

getHyphenationTree

public static HyphenationTree getHyphenationTree(InputSource hyphenationSource)
                                          throws Exception

Create a hyphenator tree

Parameters:
    hyphenationSource - the InputSource pointing to the XML grammar
Returns:
    an object representing the hyphenation patterns
Throws:
    Exception

decomposeInternal

protected void decomposeInternal(org.apache.lucene.analysis.Token token)

Specified by: decomposeInternal in class CompoundWordTokenFilterBase
