org.apache.lucene.analysis
Class StopFilter

java.lang.Object
  extended by org.apache.lucene.util.AttributeSource
      extended by org.apache.lucene.analysis.TokenStream
          extended by org.apache.lucene.analysis.TokenFilter
              extended by org.apache.lucene.analysis.StopFilter
All Implemented Interfaces:
Closeable

public final class StopFilter
extends TokenFilter

Removes stop words from a token stream.


Nested Class Summary
 
Nested classes/interfaces inherited from class org.apache.lucene.util.AttributeSource
AttributeSource.AttributeFactory, AttributeSource.State
 
Field Summary
 
Fields inherited from class org.apache.lucene.analysis.TokenFilter
input
 
Constructor Summary
StopFilter(boolean enablePositionIncrements, TokenStream in, Set<?> stopWords)
          Constructs a filter which removes words from the input TokenStream that are named in the Set.
StopFilter(boolean enablePositionIncrements, TokenStream input, Set<?> stopWords, boolean ignoreCase)
          Construct a token stream filtering the given input.
 
Method Summary
 boolean getEnablePositionIncrements()
           
static boolean getEnablePositionIncrementsVersionDefault(Version matchVersion)
          Returns version-dependent default for enablePositionIncrements.
 boolean incrementToken()
          Returns the next input Token whose term() is not a stop word.
static Set<Object> makeStopSet(List<?> stopWords)
          Builds a Set from a list of stop words, appropriate for passing into the StopFilter constructor.
static Set<Object> makeStopSet(List<?> stopWords, boolean ignoreCase)
           
static Set<Object> makeStopSet(String... stopWords)
          Builds a Set from an array of stop words, appropriate for passing into the StopFilter constructor.
static Set<Object> makeStopSet(String[] stopWords, boolean ignoreCase)
           
 void setEnablePositionIncrements(boolean enable)
          If true, this StopFilter will preserve positions of the incoming tokens (i.e., accumulate and set position increments of the removed stop tokens).
 
Methods inherited from class org.apache.lucene.analysis.TokenFilter
close, end, reset
 
Methods inherited from class org.apache.lucene.util.AttributeSource
addAttribute, addAttributeImpl, captureState, clearAttributes, cloneAttributes, equals, getAttribute, getAttributeClassesIterator, getAttributeFactory, getAttributeImplsIterator, hasAttribute, hasAttributes, hashCode, restoreState, toString
 
Methods inherited from class java.lang.Object
clone, finalize, getClass, notify, notifyAll, wait, wait, wait
 

Constructor Detail

StopFilter

public StopFilter(boolean enablePositionIncrements,
                  TokenStream input,
                  Set<?> stopWords,
                  boolean ignoreCase)
Constructs a token stream that filters the given input. If stopWords is an instance of CharArraySet (which is the case when makeStopSet() was used to construct the set), it is used directly and ignoreCase is ignored, since the CharArraySet itself controls case sensitivity.

If stopWords is not an instance of CharArraySet, a new CharArraySet will be constructed and ignoreCase will be used to specify the case sensitivity of that set.

Parameters:
enablePositionIncrements - true if token positions should record the removed stop words
input - Input TokenStream
stopWords - A Set of Strings or char[] or any other toString()-able set representing the stopwords
ignoreCase - if true, all words are lower cased first
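
The rule above (an existing case-aware set wins over the ignoreCase flag) is easy to miss, so here is a minimal plain-Java sketch of it. It does not use Lucene: CaseAwareSet and resolve are hypothetical stand-ins for CharArraySet and the constructor's internal handling, and the sketch converts every entry with toString(), which is an assumption made for brevity.

```java
import java.util.Arrays;
import java.util.Collection;
import java.util.HashSet;
import java.util.Locale;
import java.util.Set;

final class StopSetResolutionSketch {

    /** Hypothetical stand-in for CharArraySet: the set itself fixes case sensitivity. */
    static final class CaseAwareSet extends HashSet<String> {
        private static final long serialVersionUID = 1L;
        private final boolean ignoreCase;

        CaseAwareSet(Collection<?> words, boolean ignoreCase) {
            this.ignoreCase = ignoreCase;
            for (Object word : words) {
                // Assumption: entries are stored via their toString() form.
                super.add(fold(String.valueOf(word)));
            }
        }

        private String fold(String s) {
            return ignoreCase ? s.toLowerCase(Locale.ROOT) : s;
        }

        @Override
        public boolean contains(Object o) {
            // Lookups are folded the same way as stored entries.
            return super.contains(fold(String.valueOf(o)));
        }
    }

    /** Models the documented rule: reuse a case-aware set as-is, otherwise build one. */
    static CaseAwareSet resolve(Set<?> stopWords, boolean ignoreCase) {
        if (stopWords instanceof CaseAwareSet) {
            return (CaseAwareSet) stopWords; // the ignoreCase argument has no effect here
        }
        return new CaseAwareSet(stopWords, ignoreCase);
    }

    public static void main(String[] args) {
        Set<String> plain = new HashSet<String>(Arrays.asList("The", "And"));
        System.out.println(resolve(plain, true).contains("THE"));     // true

        CaseAwareSet sensitive = new CaseAwareSet(Arrays.asList("The"), false);
        System.out.println(resolve(sensitive, true).contains("the")); // false
    }
}
```

In the second call the flag is true, yet the lookup stays case-sensitive because the set was already built with its own setting. This mirrors why a set obtained from makeStopSet() should be created with the intended ignoreCase value up front.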

StopFilter

public StopFilter(boolean enablePositionIncrements,
                  TokenStream in,
                  Set<?> stopWords)
Constructs a filter which removes words from the input TokenStream that are named in the Set.

Parameters:
enablePositionIncrements - true if token positions should record the removed stop words
in - Input stream
stopWords - A Set of Strings or char[] or any other toString()-able set representing the stopwords
See Also:
makeStopSet(java.lang.String[])
Method Detail

makeStopSet

public static final Set<Object> makeStopSet(String... stopWords)
Builds a Set from an array of stop words, appropriate for passing into the StopFilter constructor. This allows the stop set to be built once and cached when an Analyzer is constructed.

See Also:
makeStopSet(java.lang.String[], boolean), passing false to ignoreCase

makeStopSet

public static final Set<Object> makeStopSet(List<?> stopWords)
Builds a Set from a list of stop words, appropriate for passing into the StopFilter constructor. This allows the stop set to be built once and cached when an Analyzer is constructed.

Parameters:
stopWords - A List of Strings or char[] or any other toString()-able list representing the stopwords
Returns:
A Set (CharArraySet) containing the words
See Also:
makeStopSet(java.util.List, boolean), passing false to ignoreCase

makeStopSet

public static final Set<Object> makeStopSet(String[] stopWords,
                                            boolean ignoreCase)
Parameters:
stopWords - An array of stopwords
ignoreCase - If true, all words are lower cased first.
Returns:
a Set containing the words

makeStopSet

public static final Set<Object> makeStopSet(List<?> stopWords,
                                            boolean ignoreCase)
Parameters:
stopWords - A List of Strings or char[] or any other toString()-able list representing the stopwords
ignoreCase - if true, all words are lower cased first
Returns:
A Set (CharArraySet) containing the words

incrementToken

public final boolean incrementToken()
                             throws IOException
Returns the next input Token whose term() is not a stop word.

Specified by:
incrementToken in class TokenStream
Returns:
false for end of stream; true otherwise
Throws:
IOException
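
As an illustration of the contract described here (advance to the next non-stop-word term, return false at end of stream), the following is a minimal plain-Java sketch. It is not Lucene's TokenStream API: PullStopFilterSketch, its term() accessor, and the drain helper are hypothetical names, and the input is modeled as a simple Iterator of strings.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.Iterator;
import java.util.List;
import java.util.Set;

/** Plain-Java model of a pull-style stop filter; not Lucene's implementation. */
final class PullStopFilterSketch {
    private final Iterator<String> input;
    private final Set<String> stopWords;
    private String term; // valid only after incrementToken() has returned true

    PullStopFilterSketch(Iterator<String> input, Set<String> stopWords) {
        this.input = input;
        this.stopWords = stopWords;
    }

    /** Advances to the next term that is not a stop word; false means end of stream. */
    boolean incrementToken() {
        while (input.hasNext()) {
            String candidate = input.next();
            if (!stopWords.contains(candidate)) {
                term = candidate;
                return true;
            }
        }
        term = null;
        return false;
    }

    String term() {
        return term;
    }

    /** Consumes the whole stream the way a caller would: loop until false. */
    static List<String> drain(List<String> tokens, Set<String> stopWords) {
        PullStopFilterSketch filter = new PullStopFilterSketch(tokens.iterator(), stopWords);
        List<String> kept = new ArrayList<String>();
        while (filter.incrementToken()) {
            kept.add(filter.term());
        }
        return kept;
    }

    public static void main(String[] args) {
        Set<String> stops = new HashSet<String>(Arrays.asList("the", "and"));
        System.out.println(drain(Arrays.asList("the", "quick", "fox", "and"), stops)); // [quick, fox]
    }
}
```

Note that trailing stop words are consumed silently: the final call skips "and", finds the input exhausted, and returns false.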

getEnablePositionIncrementsVersionDefault

public static boolean getEnablePositionIncrementsVersionDefault(Version matchVersion)
Returns the version-dependent default for enablePositionIncrements. Analyzers that embed StopFilter use this method when creating the StopFilter. For a matchVersion prior to 2.9, this returns false. For 2.9 or later, it returns true.


getEnablePositionIncrements

public boolean getEnablePositionIncrements()
See Also:
setEnablePositionIncrements(boolean)

setEnablePositionIncrements

public void setEnablePositionIncrements(boolean enable)
If true, this StopFilter will preserve positions of the incoming tokens (i.e., accumulate and set position increments of the removed stop tokens). Generally, true is best as it does not lose information (positions of the original tokens) during indexing.

When enabled, each stopped (omitted) token's position increment is added to the position increment of the next token that is kept.

NOTE: be sure to also set QueryParser.setEnablePositionIncrements(boolean) if you use QueryParser to create queries.
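
To make the accumulation concrete, here is a minimal plain-Java sketch of the bookkeeping described above. It is not Lucene's implementation: PositionIncrementSketch, Token, and filter are hypothetical names, and the sketch assumes every incoming token carries a position increment of 1.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

/** Plain-Java model of position-increment accumulation; not Lucene's implementation. */
final class PositionIncrementSketch {

    /** Hypothetical token: a term plus its distance from the previously kept token. */
    static final class Token {
        final String term;
        final int positionIncrement;

        Token(String term, int positionIncrement) {
            this.term = term;
            this.positionIncrement = positionIncrement;
        }

        @Override
        public String toString() {
            return term + "+" + positionIncrement;
        }
    }

    static List<Token> filter(List<String> terms, Set<String> stopWords,
                              boolean enablePositionIncrements) {
        List<Token> kept = new ArrayList<Token>();
        int skipped = 0; // increments of stop words removed since the last kept token
        for (String term : terms) {
            if (stopWords.contains(term)) {
                skipped += 1; // assumption: each incoming token has increment 1
                continue;
            }
            int increment = enablePositionIncrements ? 1 + skipped : 1;
            kept.add(new Token(term, increment));
            skipped = 0;
        }
        return kept;
    }

    public static void main(String[] args) {
        List<String> terms = Arrays.asList("the", "quick", "brown", "fox", "of", "the", "north");
        Set<String> stops = new HashSet<String>(Arrays.asList("the", "of"));
        System.out.println(filter(terms, stops, true));  // [quick+2, brown+1, fox+1, north+3]
        System.out.println(filter(terms, stops, false)); // [quick+1, brown+1, fox+1, north+1]
    }
}
```

With the flag enabled, the gaps left by "the" and "of the" remain visible as increments of 2 and 3, so position-sensitive queries can still tell that "fox" and "north" were not adjacent. With the flag disabled, that information is lost.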



Copyright © 2000-2010 Apache Software Foundation. All Rights Reserved.