org.apache.lucene.analysis.hunspell
Class HunspellStemFilter

java.lang.Object
  extended by org.apache.lucene.util.AttributeSource
      extended by org.apache.lucene.analysis.TokenStream
          extended by org.apache.lucene.analysis.TokenFilter
              extended by org.apache.lucene.analysis.hunspell.HunspellStemFilter
All Implemented Interfaces:
Closeable

public final class HunspellStemFilter
extends TokenFilter

TokenFilter that uses Hunspell affix rules and dictionary words to stem tokens. Because Hunspell allows a word to have multiple stems, this filter can emit multiple tokens for each token it consumes.

Note: This filter is aware of the KeywordAttribute. To prevent certain terms from being passed to the stemmer, mark them in a TokenStream earlier in the chain so that KeywordAttribute.isKeyword() returns true.

Note: To include the original term as well as the stemmed version, see KeywordRepeatFilterFactory.
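
The behavior described above (several output tokens per input token, keyword bypass, optional deduplication) can be sketched with a small stand-alone model. This is not Lucene code: the class MultiStemSketch, its stem table, and the keyword set are hypothetical stand-ins for HunspellDictionary and KeywordAttribute. The sketch also assumes that a token with no known stem is emitted unchanged, which this page does not state.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.HashMap;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

/** Toy model of multi-stem emission; a sketch, not the Lucene implementation. */
final class MultiStemSketch {

    /** Hypothetical stem table standing in for a HunspellDictionary. */
    private final Map<String, List<String>> stems;
    /** When true, repeated stems of one token are emitted only once. */
    private final boolean dedup;

    MultiStemSketch(Map<String, List<String>> stems, boolean dedup) {
        this.stems = stems;
        this.dedup = dedup;
    }

    /**
     * Returns the output terms for the given input tokens. Tokens listed in
     * keywords are passed through unchanged, modelling a token whose
     * KeywordAttribute.isKeyword() is true.
     */
    List<String> filter(List<String> tokens, Set<String> keywords) {
        List<String> out = new ArrayList<String>();
        for (String token : tokens) {
            List<String> found = stems.get(token);
            if (keywords.contains(token) || found == null || found.isEmpty()) {
                // Keyword, or no known stem (assumption): emit the token as-is.
                out.add(token);
                continue;
            }
            // One input token may expand into several output tokens.
            // LinkedHashSet drops duplicates while keeping first-seen order.
            out.addAll(dedup ? new ArrayList<String>(new LinkedHashSet<String>(found)) : found);
        }
        return out;
    }

    public static void main(String[] args) {
        Map<String, List<String>> table = new HashMap<String, List<String>>();
        table.put("leaves", Arrays.asList("leaf", "leave", "leaf"));
        MultiStemSketch sketch = new MultiStemSketch(table, true);
        // prints [leaf, leave, fall]
        System.out.println(sketch.filter(Arrays.asList("leaves", "fall"),
                Collections.<String>emptySet()));
    }
}
```

With dedup set to false the same input would yield leaf, leave, leaf, fall, and putting "leaves" in the keyword set would pass it through unstemmed.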


Nested Class Summary
 
Nested classes/interfaces inherited from class org.apache.lucene.util.AttributeSource
AttributeSource.AttributeFactory, AttributeSource.State
 
Field Summary
 
Fields inherited from class org.apache.lucene.analysis.TokenFilter
input
 
Constructor Summary
HunspellStemFilter(TokenStream input, HunspellDictionary dictionary)
          Creates a HunspellStemFilter which deduplicates stems and has a maximum recursion level of 2.
HunspellStemFilter(TokenStream input, HunspellDictionary dictionary, boolean dedup)
          Creates a HunspellStemFilter which has a maximum recursion level of 2.
HunspellStemFilter(TokenStream input, HunspellDictionary dictionary, boolean dedup, int recursionCap)
          Creates a new HunspellStemFilter that will stem tokens from the given TokenStream using affix rules in the provided HunspellDictionary.
HunspellStemFilter(TokenStream input, HunspellDictionary dictionary, int recursionCap)
          Creates a new HunspellStemFilter that will stem tokens from the given TokenStream using affix rules in the provided HunspellDictionary.
 
Method Summary
 boolean incrementToken()
          
 void reset()
          
 
Methods inherited from class org.apache.lucene.analysis.TokenFilter
close, end
 
Methods inherited from class org.apache.lucene.util.AttributeSource
addAttribute, addAttributeImpl, captureState, clearAttributes, cloneAttributes, copyTo, equals, getAttribute, getAttributeClassesIterator, getAttributeFactory, getAttributeImplsIterator, hasAttribute, hasAttributes, hashCode, reflectAsString, reflectWith, restoreState, toString
 
Methods inherited from class java.lang.Object
clone, finalize, getClass, notify, notifyAll, wait, wait, wait
 

Constructor Detail

HunspellStemFilter

public HunspellStemFilter(TokenStream input,
                          HunspellDictionary dictionary)
Creates a HunspellStemFilter which deduplicates stems and has a maximum recursion level of 2.

See Also:
HunspellStemFilter(TokenStream, HunspellDictionary, int)

HunspellStemFilter

public HunspellStemFilter(TokenStream input,
                          HunspellDictionary dictionary,
                          int recursionCap)
Creates a new HunspellStemFilter that will stem tokens from the given TokenStream using affix rules in the provided HunspellDictionary.

Parameters:
input - TokenStream whose tokens will be stemmed
dictionary - HunspellDictionary containing the affix rules and words that will be used to stem the tokens
recursionCap - maximum level of recursion the stemmer may reach; the constructors that omit this parameter use 2

HunspellStemFilter

public HunspellStemFilter(TokenStream input,
                          HunspellDictionary dictionary,
                          boolean dedup)
Creates a HunspellStemFilter which has a maximum recursion level of 2.

See Also:
HunspellStemFilter(TokenStream, HunspellDictionary, boolean, int)

HunspellStemFilter

public HunspellStemFilter(TokenStream input,
                          HunspellDictionary dictionary,
                          boolean dedup,
                          int recursionCap)
Creates a new HunspellStemFilter that will stem tokens from the given TokenStream using affix rules in the provided HunspellDictionary.

Parameters:
input - TokenStream whose tokens will be stemmed
dictionary - HunspellDictionary containing the affix rules and words that will be used to stem the tokens
dedup - true if only unique terms should be output.
recursionCap - maximum level of recursion the stemmer may reach; the constructors that omit this parameter use 2
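
To make the role of a recursion cap concrete, the following stand-alone sketch strips suffixes recursively and stops descending once a depth limit is reached. It is a simplified illustration only: RecursionCapSketch, its word set, and its suffix list are hypothetical, and the way Lucene's stemmer counts recursion levels may differ from this model.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

/** Toy model of a depth-capped recursive suffix stripper; not Lucene code. */
final class RecursionCapSketch {

    private final Set<String> words;      // hypothetical dictionary words
    private final List<String> suffixes;  // hypothetical suffix rules
    private final int recursionCap;       // deepest nested call allowed

    RecursionCapSketch(Set<String> words, List<String> suffixes, int recursionCap) {
        this.words = words;
        this.suffixes = suffixes;
        this.recursionCap = recursionCap;
    }

    /** Returns every dictionary word reachable by stripping suffixes. */
    List<String> stem(String word) {
        List<String> out = new ArrayList<String>();
        stem(word, 0, out);
        return out;
    }

    private void stem(String word, int depth, List<String> out) {
        for (String suffix : suffixes) {
            // Skip rules that do not apply or would consume the whole word.
            if (word.length() <= suffix.length() || !word.endsWith(suffix)) {
                continue;
            }
            String stripped = word.substring(0, word.length() - suffix.length());
            if (words.contains(stripped)) {
                out.add(stripped);
            }
            // Descend into the stripped form only while below the cap.
            if (depth < recursionCap) {
                stem(stripped, depth + 1, out);
            }
        }
    }

    public static void main(String[] args) {
        Set<String> words = new HashSet<String>(Arrays.asList("hope"));
        List<String> suffixes = Arrays.asList("ly", "less");
        // Cap 0: only one suffix is removed, so "hope" is never reached.
        System.out.println(new RecursionCapSketch(words, suffixes, 0).stem("hopelessly"));
        // Cap 1: "ly" and then "less" are removed, reaching "hope".
        System.out.println(new RecursionCapSketch(words, suffixes, 1).stem("hopelessly"));
    }
}
```

In this model a larger cap lets the stemmer undo more stacked affixes, at the cost of exploring more candidate forms per token.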
Method Detail

incrementToken

public boolean incrementToken()
                       throws IOException

Specified by:
incrementToken in class TokenStream
Throws:
IOException

reset

public void reset()
           throws IOException

Overrides:
reset in class TokenFilter
Throws:
IOException


Copyright © 2000-2014 Apache Software Foundation. All Rights Reserved.