HyphenationCompoundWordTokenFilter (Lucene 3.4.0 API)

org.apache.lucene.analysis.compound
Class HyphenationCompoundWordTokenFilter

java.lang.Object
  org.apache.lucene.util.AttributeSource
      org.apache.lucene.analysis.TokenStream
          org.apache.lucene.analysis.TokenFilter
              org.apache.lucene.analysis.compound.CompoundWordTokenFilterBase
                  org.apache.lucene.analysis.compound.HyphenationCompoundWordTokenFilter

All Implemented Interfaces: Closeable

public class HyphenationCompoundWordTokenFilter
extends CompoundWordTokenFilterBase

A TokenFilter that decomposes compound words found in many Germanic languages.

"Donaudampfschiff" becomes "Donau", "dampf", and "schiff", so that you can find "Donaudampfschiff" even when you only search for "schiff". It uses a hyphenation grammar and a word dictionary to achieve this.
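The idea can be illustrated with a small standalone model. This is a sketch under stated assumptions, not Lucene's implementation: the class `CompoundSplitSketch` is hypothetical, and the hyphenation points are supplied by hand instead of coming from a `HyphenationTree`. A candidate subword is any span that starts and ends on a hyphenation point; it is kept only if its lowercased form is in the dictionary.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import java.util.Set;

/** Hypothetical model of hyphenation-based compound splitting (not Lucene code). */
final class CompoundSplitSketch {

    /**
     * Returns every dictionary word that starts and ends on a hyphenation point.
     *
     * @param word       the compound word
     * @param points     hyphenation points in ascending order, including 0 and word.length()
     * @param dictionary lowercase dictionary entries
     */
    static List<String> decompose(String word, int[] points, Set<String> dictionary) {
        String lower = word.toLowerCase(Locale.ROOT); // lookups use the lowercase copy
        List<String> parts = new ArrayList<>();
        for (int i = 0; i < points.length; i++) {
            for (int j = i + 1; j < points.length; j++) {
                if (dictionary.contains(lower.substring(points[i], points[j]))) {
                    // Emit the span with its original casing.
                    parts.add(word.substring(points[i], points[j]));
                }
            }
        }
        return parts;
    }

    public static void main(String[] args) {
        // Assumed syllable boundaries Do-nau-dampf-schiff, chosen by hand for illustration.
        int[] points = {0, 2, 5, 10, 16};
        Set<String> dict = Set.of("donau", "dampf", "schiff");
        System.out.println(decompose("Donaudampfschiff", points, dict)); // [Donau, dampf, schiff]
    }
}
```

Restricting candidates to hyphenation points is what keeps the search small: only syllable-aligned spans are looked up, rather than every substring of the word.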

Nested Class Summary

Nested classes/interfaces inherited from class org.apache.lucene.util.AttributeSource
`AttributeSource.AttributeFactory, AttributeSource.State`

Field Summary

Fields inherited from class org.apache.lucene.analysis.compound.CompoundWordTokenFilterBase
`DEFAULT_MAX_SUBWORD_SIZE, DEFAULT_MIN_SUBWORD_SIZE, DEFAULT_MIN_WORD_SIZE, dictionary, maxSubwordSize, minSubwordSize, minWordSize, onlyLongestMatch, tokens`

Fields inherited from class org.apache.lucene.analysis.TokenFilter
`input`

Constructor Summary
`HyphenationCompoundWordTokenFilter(TokenStream input, HyphenationTree hyphenator, Set<?> dictionary)` Deprecated. Use `HyphenationCompoundWordTokenFilter(Version, TokenStream, HyphenationTree, Set)` instead.
`HyphenationCompoundWordTokenFilter(TokenStream input, HyphenationTree hyphenator, Set<?> dictionary, int minWordSize, int minSubwordSize, int maxSubwordSize, boolean onlyLongestMatch)` Deprecated. Use `HyphenationCompoundWordTokenFilter(Version, TokenStream, HyphenationTree, Set, int, int, int, boolean)` instead.
`HyphenationCompoundWordTokenFilter(TokenStream input, HyphenationTree hyphenator, String[] dictionary)` Deprecated. Use `HyphenationCompoundWordTokenFilter(Version, TokenStream, HyphenationTree, String[])` instead.
`HyphenationCompoundWordTokenFilter(TokenStream input, HyphenationTree hyphenator, String[] dictionary, int minWordSize, int minSubwordSize, int maxSubwordSize, boolean onlyLongestMatch)` Deprecated. Use `HyphenationCompoundWordTokenFilter(Version, TokenStream, HyphenationTree, String[], int, int, int, boolean)` instead.
`HyphenationCompoundWordTokenFilter(Version matchVersion, TokenStream input, HyphenationTree hyphenator)` Creates a HyphenationCompoundWordTokenFilter with no dictionary.
`HyphenationCompoundWordTokenFilter(Version matchVersion, TokenStream input, HyphenationTree hyphenator, int minWordSize, int minSubwordSize, int maxSubwordSize)` Creates a HyphenationCompoundWordTokenFilter with no dictionary.
`HyphenationCompoundWordTokenFilter(Version matchVersion, TokenStream input, HyphenationTree hyphenator, Set<?> dictionary)` Creates a new `HyphenationCompoundWordTokenFilter` instance.
`HyphenationCompoundWordTokenFilter(Version matchVersion, TokenStream input, HyphenationTree hyphenator, Set<?> dictionary, int minWordSize, int minSubwordSize, int maxSubwordSize, boolean onlyLongestMatch)` Creates a new `HyphenationCompoundWordTokenFilter` instance.
`HyphenationCompoundWordTokenFilter(Version matchVersion, TokenStream input, HyphenationTree hyphenator, String[] dictionary)` Creates a new `HyphenationCompoundWordTokenFilter` instance.
`HyphenationCompoundWordTokenFilter(Version matchVersion, TokenStream input, HyphenationTree hyphenator, String[] dictionary, int minWordSize, int minSubwordSize, int maxSubwordSize, boolean onlyLongestMatch)` Creates a new `HyphenationCompoundWordTokenFilter` instance.

Method Summary
`protected void`	`decomposeInternal(Token token)`
`static HyphenationTree`	`getHyphenationTree(File hyphenationFile)` Creates a hyphenation tree.
`static HyphenationTree`	`getHyphenationTree(InputSource hyphenationSource)` Creates a hyphenation tree.
`static HyphenationTree`	`getHyphenationTree(Reader hyphenationReader)` Deprecated. Do not use Readers with a fixed charset to load XML files unless the XML was created programmatically. Use `getHyphenationTree(InputSource)` instead, where you can supply a default charset and an input stream if you like.
`static HyphenationTree`	`getHyphenationTree(String hyphenationFilename)` Creates a hyphenation tree.

Methods inherited from class org.apache.lucene.analysis.compound.CompoundWordTokenFilterBase
`addAllLowerCase, createToken, decompose, incrementToken, makeDictionary, makeDictionary, makeLowerCaseCopy, reset`

Methods inherited from class org.apache.lucene.analysis.TokenFilter
`close, end`

Methods inherited from class org.apache.lucene.util.AttributeSource
`addAttribute, addAttributeImpl, captureState, clearAttributes, cloneAttributes, copyTo, equals, getAttribute, getAttributeClassesIterator, getAttributeFactory, getAttributeImplsIterator, hasAttribute, hasAttributes, hashCode, reflectAsString, reflectWith, restoreState, toString`

Methods inherited from class java.lang.Object
`clone, finalize, getClass, notify, notifyAll, wait, wait, wait`

Constructor Detail

HyphenationCompoundWordTokenFilter

public HyphenationCompoundWordTokenFilter(Version matchVersion,
                                          TokenStream input,
                                          HyphenationTree hyphenator,
                                          String[] dictionary,
                                          int minWordSize,
                                          int minSubwordSize,
                                          int maxSubwordSize,
                                          boolean onlyLongestMatch)

Creates a new HyphenationCompoundWordTokenFilter instance.

Parameters:
  matchVersion - Lucene version to enable correct Unicode 4.0 behavior in the dictionaries if Version > 3.0. See CompoundWordTokenFilterBase for details.
  input - the TokenStream to process
  hyphenator - the hyphenation pattern tree to use for hyphenation
  dictionary - the word dictionary to match against
  minWordSize - only words longer than this get processed
  minSubwordSize - only subwords longer than this get to the output stream
  maxSubwordSize - only subwords shorter than this get to the output stream
  onlyLongestMatch - add only the longest matching subword to the stream
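To make the four tuning parameters concrete, here is a hypothetical model (`SubwordOptionsSketch`, not Lucene code). Two assumptions are built in: it reads "longer than" and "shorter than" in the descriptions above literally as strict comparisons, and it interprets `onlyLongestMatch` as "keep the longest dictionary match for each starting hyphenation point". The real filter may treat the boundaries or the longest-match rule differently, so check the Lucene source before relying on exact cut-offs. Hyphenation points are again supplied by hand.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import java.util.Set;

/** Hypothetical model of minWordSize, min/maxSubwordSize and onlyLongestMatch. */
final class SubwordOptionsSketch {

    static List<String> decompose(String word, int[] points, Set<String> dictionary,
                                  int minWordSize, int minSubwordSize,
                                  int maxSubwordSize, boolean onlyLongestMatch) {
        List<String> out = new ArrayList<>();
        // Assumption: "only words longer than this get processed" is a strict comparison.
        if (word.length() <= minWordSize) {
            return out;
        }
        String lower = word.toLowerCase(Locale.ROOT);
        for (int i = 0; i < points.length; i++) {
            String longest = null;
            for (int j = i + 1; j < points.length; j++) {
                int len = points[j] - points[i];
                // Points ascend, so every later candidate from this start is even longer.
                if (len >= maxSubwordSize) {
                    break;
                }
                if (len <= minSubwordSize
                        || !dictionary.contains(lower.substring(points[i], points[j]))) {
                    continue;
                }
                String part = word.substring(points[i], points[j]);
                if (onlyLongestMatch) {
                    longest = part; // a later match from the same start is always longer
                } else {
                    out.add(part);
                }
            }
            if (longest != null) {
                out.add(longest);
            }
        }
        return out;
    }
}
```

With the assumed points Hand-schuh-kas-ten (`{0, 4, 9, 12, 15}`) and the dictionary `{hand, handschuh, schuh, kasten}`, the model emits `[Hand, Handschuh, schuh, kasten]`; with `onlyLongestMatch = true` the shorter "Hand" is dropped in favor of "Handschuh".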

HyphenationCompoundWordTokenFilter

public HyphenationCompoundWordTokenFilter(Version matchVersion,
                                          TokenStream input,
                                          HyphenationTree hyphenator,
                                          String[] dictionary)

Creates a new HyphenationCompoundWordTokenFilter instance.

Parameters:
  matchVersion - Lucene version to enable correct Unicode 4.0 behavior in the dictionaries if Version > 3.0. See CompoundWordTokenFilterBase for details.
  input - the TokenStream to process
  hyphenator - the hyphenation pattern tree to use for hyphenation
  dictionary - the word dictionary to match against

HyphenationCompoundWordTokenFilter

public HyphenationCompoundWordTokenFilter(Version matchVersion,
                                          TokenStream input,
                                          HyphenationTree hyphenator,
                                          Set<?> dictionary)

Creates a new HyphenationCompoundWordTokenFilter instance.

Parameters:
  matchVersion - Lucene version to enable correct Unicode 4.0 behavior in the dictionaries if Version > 3.0. See CompoundWordTokenFilterBase for details.
  input - the TokenStream to process
  hyphenator - the hyphenation pattern tree to use for hyphenation
  dictionary - the word dictionary to match against. If this is a CharArraySet, it must have ignoreCase=false and contain only lowercase strings.
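The lowercase requirement suggests that lowercased text is looked up in a case-sensitive set, so a mixed-case entry such as "Donau" would never match. The sketch below assumes exactly that (the class and method names are hypothetical) and shows the safe habit of normalizing dictionary entries once, before passing them to the filter.

```java
import java.util.HashSet;
import java.util.Locale;
import java.util.Set;

/** Hypothetical helper illustrating why dictionary entries should be lowercase. */
final class LowercaseDictionarySketch {

    /** Copies the entries into a new set, lowercasing each one. */
    static Set<String> toLowerCaseDictionary(Iterable<String> words) {
        Set<String> out = new HashSet<>();
        for (String w : words) {
            out.add(w.toLowerCase(Locale.ROOT));
        }
        return out;
    }

    /** Assumed lookup behavior: the candidate is lowercased, the set itself is case-sensitive. */
    static boolean matches(Set<String> dictionary, String candidate) {
        return dictionary.contains(candidate.toLowerCase(Locale.ROOT));
    }
}
```

Under this assumption, `matches(Set.of("Donau"), "Donau")` is `false`, while the same lookup against `toLowerCaseDictionary(Set.of("Donau"))` is `true`.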

HyphenationCompoundWordTokenFilter

public HyphenationCompoundWordTokenFilter(Version matchVersion,
                                          TokenStream input,
                                          HyphenationTree hyphenator,
                                          Set<?> dictionary,
                                          int minWordSize,
                                          int minSubwordSize,
                                          int maxSubwordSize,
                                          boolean onlyLongestMatch)

Creates a new HyphenationCompoundWordTokenFilter instance.

Parameters:
  matchVersion - Lucene version to enable correct Unicode 4.0 behavior in the dictionaries if Version > 3.0. See CompoundWordTokenFilterBase for details.
  input - the TokenStream to process
  hyphenator - the hyphenation pattern tree to use for hyphenation
  dictionary - the word dictionary to match against. If this is a CharArraySet, it must have ignoreCase=false and contain only lowercase strings.
  minWordSize - only words longer than this get processed
  minSubwordSize - only subwords longer than this get to the output stream
  maxSubwordSize - only subwords shorter than this get to the output stream
  onlyLongestMatch - add only the longest matching subword to the stream

HyphenationCompoundWordTokenFilter

public HyphenationCompoundWordTokenFilter(Version matchVersion,
                                          TokenStream input,
                                          HyphenationTree hyphenator,
                                          int minWordSize,
                                          int minSubwordSize,
                                          int maxSubwordSize)

Creates a HyphenationCompoundWordTokenFilter with no dictionary.

Calls HyphenationCompoundWordTokenFilter(matchVersion, input, hyphenator, null, minWordSize, minSubwordSize, maxSubwordSize, false).
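Passing a null dictionary presumably means no dictionary check is applied, so every span between hyphenation points that fits the size limits becomes a subword. The hypothetical model below (`NoDictionarySketch`, not Lucene code) makes that assumption, uses the same strict reading of the size limits as the parameter descriptions, and takes hand-picked hyphenation points. It shows the trade-off: more recall, but also fragments such as "schuhkas" and "ten".

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical model of decomposition without a dictionary (not Lucene code). */
final class NoDictionarySketch {

    /** Emits every span between hyphenation points whose length is within the limits. */
    static List<String> parts(String word, int[] points, int minSubwordSize, int maxSubwordSize) {
        List<String> out = new ArrayList<>();
        for (int i = 0; i < points.length; i++) {
            for (int j = i + 1; j < points.length; j++) {
                int len = points[j] - points[i];
                if (len >= maxSubwordSize) {
                    break; // later spans from this start are longer still
                }
                if (len > minSubwordSize) {
                    out.add(word.substring(points[i], points[j]));
                }
            }
        }
        return out;
    }
}
```

For "Handschuhkasten" with assumed points `{0, 4, 9, 12, 15}`, `minSubwordSize = 2` and `maxSubwordSize = 10`, this yields `[Hand, Handschuh, schuh, schuhkas, kas, kasten, ten]`.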

HyphenationCompoundWordTokenFilter

public HyphenationCompoundWordTokenFilter(Version matchVersion,
                                          TokenStream input,
                                          HyphenationTree hyphenator)

Creates a HyphenationCompoundWordTokenFilter with no dictionary.

Calls HyphenationCompoundWordTokenFilter(matchVersion, input, hyphenator, DEFAULT_MIN_WORD_SIZE, DEFAULT_MIN_SUBWORD_SIZE, DEFAULT_MAX_SUBWORD_SIZE).

HyphenationCompoundWordTokenFilter

@Deprecated
public HyphenationCompoundWordTokenFilter(TokenStream input,
                                                     HyphenationTree hyphenator,
                                                     String[] dictionary,
                                                     int minWordSize,
                                                     int minSubwordSize,
                                                     int maxSubwordSize,
                                                     boolean onlyLongestMatch)

Deprecated. Use HyphenationCompoundWordTokenFilter(Version, TokenStream, HyphenationTree, String[], int, int, int, boolean) instead.

Creates a new HyphenationCompoundWordTokenFilter instance.

Parameters:
  input - the TokenStream to process
  hyphenator - the hyphenation pattern tree to use for hyphenation
  dictionary - the word dictionary to match against
  minWordSize - only words longer than this get processed
  minSubwordSize - only subwords longer than this get to the output stream
  maxSubwordSize - only subwords shorter than this get to the output stream
  onlyLongestMatch - add only the longest matching subword to the stream

HyphenationCompoundWordTokenFilter

@Deprecated
public HyphenationCompoundWordTokenFilter(TokenStream input,
                                                     HyphenationTree hyphenator,
                                                     String[] dictionary)

Deprecated. Use HyphenationCompoundWordTokenFilter(Version, TokenStream, HyphenationTree, String[]) instead.

Creates a new HyphenationCompoundWordTokenFilter instance.

Parameters:
  input - the TokenStream to process
  hyphenator - the hyphenation pattern tree to use for hyphenation
  dictionary - the word dictionary to match against

HyphenationCompoundWordTokenFilter

@Deprecated
public HyphenationCompoundWordTokenFilter(TokenStream input,
                                                     HyphenationTree hyphenator,
                                                     Set<?> dictionary)

Deprecated. Use HyphenationCompoundWordTokenFilter(Version, TokenStream, HyphenationTree, Set) instead.

Creates a new HyphenationCompoundWordTokenFilter instance.

Parameters:
  input - the TokenStream to process
  hyphenator - the hyphenation pattern tree to use for hyphenation
  dictionary - the word dictionary to match against. If this is a CharArraySet, it must have ignoreCase=false and contain only lowercase strings.

HyphenationCompoundWordTokenFilter

@Deprecated
public HyphenationCompoundWordTokenFilter(TokenStream input,
                                                     HyphenationTree hyphenator,
                                                     Set<?> dictionary,
                                                     int minWordSize,
                                                     int minSubwordSize,
                                                     int maxSubwordSize,
                                                     boolean onlyLongestMatch)

Deprecated. Use HyphenationCompoundWordTokenFilter(Version, TokenStream, HyphenationTree, Set, int, int, int, boolean) instead.

Creates a new HyphenationCompoundWordTokenFilter instance.

Parameters:
  input - the TokenStream to process
  hyphenator - the hyphenation pattern tree to use for hyphenation
  dictionary - the word dictionary to match against. If this is a CharArraySet, it must have ignoreCase=false and contain only lowercase strings.
  minWordSize - only words longer than this get processed
  minSubwordSize - only subwords longer than this get to the output stream
  maxSubwordSize - only subwords shorter than this get to the output stream
  onlyLongestMatch - add only the longest matching subword to the stream

Method Detail

getHyphenationTree

public static HyphenationTree getHyphenationTree(String hyphenationFilename)
                                          throws Exception

Creates a hyphenation tree.

Parameters:
  hyphenationFilename - the file name of the XML grammar to load
Returns: an object representing the hyphenation patterns
Throws: Exception
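The XML grammar holds hyphenation patterns. As far as we know, Lucene's hyphenation package is derived from Apache FOP, which uses TeX-style patterns in the spirit of Liang's algorithm: a digit between letters scores the gap, the highest score from all matching patterns wins, and an odd score allows a break. The toy sketch below (`LiangSketch` and its four patterns are hypothetical; it does not parse the XML grammar or build a real `HyphenationTree`) shows that scoring on "Donaudampfschiff".

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

/** Toy Liang-style pattern hyphenation; not Lucene's HyphenationTree. */
final class LiangSketch {

    /** Returns gap positions (letters before the gap) where a hyphen is allowed. */
    static List<Integer> hyphenationPoints(String word, List<String> patterns,
                                           int leftMin, int rightMin) {
        String dotted = "." + word.toLowerCase(Locale.ROOT) + ".";
        // values[k] holds the highest digit seen for the gap before dotted.charAt(k).
        int[] values = new int[dotted.length() + 1];
        for (String pattern : patterns) {
            StringBuilder letters = new StringBuilder();
            List<Integer> weights = new ArrayList<>();
            int pending = 0;
            for (char c : pattern.toCharArray()) {
                if (c >= '0' && c <= '9') {
                    pending = c - '0';
                } else {
                    weights.add(pending); // digit (or 0) in front of this letter
                    letters.append(c);
                    pending = 0;
                }
            }
            weights.add(pending); // digit after the last letter
            String key = letters.toString();
            for (int start = dotted.indexOf(key); start >= 0;
                 start = dotted.indexOf(key, start + 1)) {
                for (int k = 0; k < weights.size(); k++) {
                    values[start + k] = Math.max(values[start + k], weights.get(k));
                }
            }
        }
        List<Integer> points = new ArrayList<>();
        // The gap after p letters of the word is the gap before dotted index p + 1.
        for (int p = leftMin; p <= word.length() - rightMin; p++) {
            if (values[p + 1] % 2 == 1) {
                points.add(p);
            }
        }
        return points;
    }
}
```

With the patterns `u1d`, `f1s`, `1m` and `a2m`, the result is `[5, 10]` (Donau|dampf|schiff): `1m` proposes a break before "m", but the even `2` from `a2m` overrides it. Without `a2m` the result is `[5, 7, 10]`.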

getHyphenationTree

public static HyphenationTree getHyphenationTree(File hyphenationFile)
                                          throws Exception

Creates a hyphenation tree.

Parameters:
  hyphenationFile - the file of the XML grammar to load
Returns: an object representing the hyphenation patterns
Throws: Exception

getHyphenationTree

@Deprecated
public static HyphenationTree getHyphenationTree(Reader hyphenationReader)
                                          throws Exception

Deprecated. Do not use Readers with a fixed charset to load XML files unless the XML was created programmatically. Use getHyphenationTree(InputSource) instead, where you can supply a default charset and an input stream if you like.

Creates a hyphenation tree.

Parameters:
  hyphenationReader - the reader of the XML grammar to load from
Returns: an object representing the hyphenation patterns
Throws: Exception

getHyphenationTree

public static HyphenationTree getHyphenationTree(InputSource hyphenationSource)
                                          throws Exception

Creates a hyphenation tree.

Parameters:
  hyphenationSource - the InputSource pointing to the XML grammar
Returns: an object representing the hyphenation patterns
Throws: Exception
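Preparing the `InputSource` needs only the JDK's SAX classes: hand the parser raw bytes (not a `Reader`) and declare the charset with `setEncoding`, which is the approach the deprecation note on `getHyphenationTree(Reader)` recommends. The helper name `grammarSource` and the system ID below are hypothetical; the final Lucene call is shown only as a comment because it needs Lucene 3.4 on the classpath.

```java
import java.io.ByteArrayInputStream;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import org.xml.sax.InputSource;

final class GrammarSourceSketch {

    /** Wraps raw grammar bytes in an InputSource with an explicit charset. */
    static InputSource grammarSource(InputStream in, String encoding, String systemId) {
        InputSource source = new InputSource(in); // byte stream, so the parser decodes it
        source.setEncoding(encoding);             // charset to use for the bytes
        source.setSystemId(systemId);             // base URI for resolving relative references
        return source;
    }

    public static void main(String[] args) {
        // Placeholder bytes standing in for a real grammar file.
        byte[] xml = "<hyphenation-info/>".getBytes(StandardCharsets.UTF_8);
        InputSource source = grammarSource(new ByteArrayInputStream(xml), "UTF-8",
                "file:///tmp/grammar.xml"); // hypothetical location
        // With Lucene on the classpath:
        // HyphenationTree tree = HyphenationCompoundWordTokenFilter.getHyphenationTree(source);
        System.out.println(source.getEncoding()); // UTF-8
    }
}
```

In practice the byte stream would usually come from a `FileInputStream` or a classpath resource.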

decomposeInternal

protected void decomposeInternal(Token token)

Specified by: decomposeInternal in class CompoundWordTokenFilterBase
