org.apache.lucene.analysis
Class StopFilter

java.lang.Object
  extended by org.apache.lucene.util.AttributeSource
      extended by org.apache.lucene.analysis.TokenStream
          extended by org.apache.lucene.analysis.TokenFilter
              extended by org.apache.lucene.analysis.StopFilter

public final class StopFilter
extends TokenFilter

Removes stop words from a token stream.


Nested Class Summary
 
Nested classes/interfaces inherited from class org.apache.lucene.util.AttributeSource
AttributeSource.AttributeFactory, AttributeSource.State
 
Field Summary
 
Fields inherited from class org.apache.lucene.analysis.TokenFilter
input
 
Constructor Summary
StopFilter(boolean enablePositionIncrements, TokenStream in, Set stopWords)
          Constructs a filter which removes words from the input TokenStream that are named in the Set.
StopFilter(boolean enablePositionIncrements, TokenStream input, Set stopWords, boolean ignoreCase)
          Construct a token stream filtering the given input.
StopFilter(boolean enablePositionIncrements, TokenStream input, String[] stopWords)
          Deprecated. Use StopFilter(boolean, TokenStream, Set) instead.
StopFilter(boolean enablePositionIncrements, TokenStream in, String[] stopWords, boolean ignoreCase)
          Deprecated. Use StopFilter(boolean, TokenStream, Set, boolean) instead.
StopFilter(TokenStream in, Set stopWords)
          Deprecated. Use StopFilter(boolean, TokenStream, Set) instead
StopFilter(TokenStream input, Set stopWords, boolean ignoreCase)
          Deprecated. Use StopFilter(boolean, TokenStream, Set, boolean) instead
StopFilter(TokenStream input, String[] stopWords)
          Deprecated. Use StopFilter(boolean, TokenStream, String[]) instead
StopFilter(TokenStream in, String[] stopWords, boolean ignoreCase)
          Deprecated. Use StopFilter(boolean, TokenStream, String[], boolean) instead
 
Method Summary
 boolean getEnablePositionIncrements()
           
static boolean getEnablePositionIncrementsDefault()
          Deprecated. Please specify this when you create the StopFilter
static boolean getEnablePositionIncrementsVersionDefault(Version matchVersion)
          Returns version-dependent default for enablePositionIncrements.
 boolean incrementToken()
          Returns the next input Token whose term() is not a stop word.
 void init()
           
static Set makeStopSet(List stopWords)
          Builds a Set from a List of stop words, appropriate for passing into the StopFilter constructor.
static Set makeStopSet(List stopWords, boolean ignoreCase)
           
static Set makeStopSet(String[] stopWords)
          Builds a Set from an array of stop words, appropriate for passing into the StopFilter constructor.
static Set makeStopSet(String[] stopWords, boolean ignoreCase)
           
 void setEnablePositionIncrements(boolean enable)
          If true, this StopFilter will preserve positions of the incoming tokens (i.e., accumulate and set position increments of the removed stop tokens).
static void setEnablePositionIncrementsDefault(boolean defaultValue)
          Deprecated. Please specify this when you create the StopFilter
 
Methods inherited from class org.apache.lucene.analysis.TokenFilter
close, end, reset
 
Methods inherited from class org.apache.lucene.analysis.TokenStream
getOnlyUseNewAPI, next, next, setOnlyUseNewAPI
 
Methods inherited from class org.apache.lucene.util.AttributeSource
addAttribute, addAttributeImpl, captureState, clearAttributes, cloneAttributes, equals, getAttribute, getAttributeClassesIterator, getAttributeFactory, getAttributeImplsIterator, hasAttribute, hasAttributes, hashCode, restoreState, toString
 
Methods inherited from class java.lang.Object
clone, finalize, getClass, notify, notifyAll, wait, wait, wait
 

Constructor Detail

StopFilter

public StopFilter(TokenStream input,
                  String[] stopWords)
Deprecated. Use StopFilter(boolean, TokenStream, String[]) instead

Construct a token stream filtering the given input.


StopFilter

public StopFilter(boolean enablePositionIncrements,
                  TokenStream input,
                  String[] stopWords)
Deprecated. Use StopFilter(boolean, TokenStream, Set) instead.

Construct a token stream filtering the given input.

Parameters:
enablePositionIncrements - true if token positions should record the removed stop words
input - input TokenStream
stopWords - array of stop words

StopFilter

public StopFilter(TokenStream in,
                  String[] stopWords,
                  boolean ignoreCase)
Deprecated. Use StopFilter(boolean, TokenStream, String[], boolean) instead

Constructs a filter which removes words from the input TokenStream that are named in the array of words.


StopFilter

public StopFilter(boolean enablePositionIncrements,
                  TokenStream in,
                  String[] stopWords,
                  boolean ignoreCase)
Deprecated. Use StopFilter(boolean, TokenStream, Set, boolean) instead.

Constructs a filter which removes words from the input TokenStream that are named in the array of words.

Parameters:
enablePositionIncrements - true if token positions should record the removed stop words
in - input TokenStream
stopWords - array of stop words
ignoreCase - true if case is ignored

StopFilter

public StopFilter(TokenStream input,
                  Set stopWords,
                  boolean ignoreCase)
Deprecated. Use StopFilter(boolean, TokenStream, Set, boolean) instead

Construct a token stream filtering the given input. If stopWords is an instance of CharArraySet (which is the case if makeStopSet() was used to construct the set), it is used directly and ignoreCase is ignored, since CharArraySet itself controls case sensitivity.

If stopWords is not an instance of CharArraySet, a new CharArraySet will be constructed and ignoreCase will be used to specify the case sensitivity of that set.

Parameters:
input - Input TokenStream
stopWords - The set of Stop Words.
ignoreCase - Ignore case when stopping.

StopFilter

public StopFilter(boolean enablePositionIncrements,
                  TokenStream input,
                  Set stopWords,
                  boolean ignoreCase)
Construct a token stream filtering the given input. If stopWords is an instance of CharArraySet (which is the case if makeStopSet() was used to construct the set), it is used directly and ignoreCase is ignored, since CharArraySet itself controls case sensitivity.

If stopWords is not an instance of CharArraySet, a new CharArraySet will be constructed and ignoreCase will be used to specify the case sensitivity of that set.

Parameters:
enablePositionIncrements - true if token positions should record the removed stop words
input - Input TokenStream
stopWords - The set of Stop Words.
ignoreCase - Ignore case when stopping.
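
The rule described for this constructor can be modelled in plain Java. This is only a sketch under stated assumptions: CaseAwareSet is a hypothetical stand-in for CharArraySet, and StopSetResolver.resolve is an illustrative helper, not a Lucene method.

```java
import java.util.AbstractSet;
import java.util.HashSet;
import java.util.Iterator;
import java.util.Locale;
import java.util.Set;

// Hypothetical stand-in for CharArraySet: the set itself decides case sensitivity.
final class CaseAwareSet extends AbstractSet<String> {
    private final Set<String> words = new HashSet<String>();
    private final boolean ignoreCase;

    CaseAwareSet(Set<String> source, boolean ignoreCase) {
        this.ignoreCase = ignoreCase;
        for (String w : source) {
            words.add(normalize(w));
        }
    }

    // Lowercases only when this set was built as case-insensitive.
    private String normalize(String w) {
        return ignoreCase ? w.toLowerCase(Locale.ROOT) : w;
    }

    @Override
    public boolean contains(Object o) {
        return o instanceof String && words.contains(normalize((String) o));
    }

    @Override
    public Iterator<String> iterator() {
        return words.iterator();
    }

    @Override
    public int size() {
        return words.size();
    }
}

final class StopSetResolver {
    // Illustrative helper mirroring the documented rule.
    static CaseAwareSet resolve(Set<String> stopWords, boolean ignoreCase) {
        if (stopWords instanceof CaseAwareSet) {
            // Used directly; the ignoreCase argument is not consulted.
            return (CaseAwareSet) stopWords;
        }
        // Otherwise a new case-aware set is built with the requested sensitivity.
        return new CaseAwareSet(stopWords, ignoreCase);
    }
}
```

In this model, passing an already case-sensitive CaseAwareSet together with ignoreCase = true still yields case-sensitive matching, which is the pitfall the constructor documentation warns about.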

StopFilter

public StopFilter(TokenStream in,
                  Set stopWords)
Deprecated. Use StopFilter(boolean, TokenStream, Set) instead

Constructs a filter which removes words from the input TokenStream that are named in the Set.

See Also:
makeStopSet(java.lang.String[])

StopFilter

public StopFilter(boolean enablePositionIncrements,
                  TokenStream in,
                  Set stopWords)
Constructs a filter which removes words from the input TokenStream that are named in the Set.

Parameters:
enablePositionIncrements - true if token positions should record the removed stop words
in - Input stream
stopWords - The set of Stop Words.
See Also:
makeStopSet(java.lang.String[])

Method Detail

init

public void init()

makeStopSet

public static final Set makeStopSet(String[] stopWords)
Builds a Set from an array of stop words, appropriate for passing into the StopFilter constructor. This allows the stop set to be built once, when an Analyzer is constructed, and then reused.

See Also:
makeStopSet(java.lang.String[], boolean), passing false to ignoreCase

makeStopSet

public static final Set makeStopSet(List stopWords)
Builds a Set from a List of stop words, appropriate for passing into the StopFilter constructor. This allows the stop set to be built once, when an Analyzer is constructed, and then reused.

See Also:
makeStopSet(java.util.List, boolean), passing false to ignoreCase

makeStopSet

public static final Set makeStopSet(String[] stopWords,
                                    boolean ignoreCase)
Parameters:
stopWords - An array of stopwords
ignoreCase - If true, all words are lower cased first.
Returns:
a Set containing the words

makeStopSet

public static final Set makeStopSet(List stopWords,
                                    boolean ignoreCase)
Parameters:
stopWords - A List of Strings representing the stopwords
ignoreCase - if true, all words are lower cased first
Returns:
A Set containing the words
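
The ignoreCase contract of the makeStopSet overloads can be approximated in plain Java as follows. This is a sketch, not Lucene's implementation: StopSetSketch is a hypothetical class, and it returns an ordinary HashSet rather than the set type Lucene builds.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Locale;
import java.util.Set;

final class StopSetSketch {
    // Builds the stop set; when ignoreCase is true, every word is lowercased first.
    static Set<String> makeStopSet(List<String> stopWords, boolean ignoreCase) {
        Set<String> result = new HashSet<String>();
        for (String word : stopWords) {
            result.add(ignoreCase ? word.toLowerCase(Locale.ROOT) : word);
        }
        return result;
    }

    // Array overload delegating to the List overload.
    static Set<String> makeStopSet(String[] stopWords, boolean ignoreCase) {
        return makeStopSet(Arrays.asList(stopWords), ignoreCase);
    }
}
```

Because the sketch returns a plain HashSet, only the stored words are lowercased; a caller would have to lowercase each lookup as well. According to the constructor documentation, a set built by the real makeStopSet() is a CharArraySet, which controls case sensitivity itself.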

incrementToken

public final boolean incrementToken()
                             throws IOException
Returns the next input Token whose term() is not a stop word.

Overrides:
incrementToken in class TokenStream
Returns:
false for end of stream; true otherwise

Note that this method will be defined abstract in Lucene 3.0.

Throws:
IOException
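
The skip-until-kept behaviour of incrementToken() can be illustrated without Lucene. The class below is a hypothetical plain-Java model that reads terms from a List instead of a TokenStream; it shows only the control flow and the meaning of the boolean return value.

```java
import java.util.Iterator;
import java.util.List;
import java.util.Set;

final class SkippingStream {
    private final Iterator<String> input;
    private final Set<String> stopWords;
    private String current; // term exposed after a successful advance

    SkippingStream(List<String> terms, Set<String> stopWords) {
        this.input = terms.iterator();
        this.stopWords = stopWords;
    }

    // Advances to the next term that is not a stop word.
    // Returns false at end of input, true otherwise.
    boolean incrementToken() {
        while (input.hasNext()) {
            String term = input.next();
            if (!stopWords.contains(term)) {
                current = term;
                return true;
            }
        }
        return false;
    }

    // The term selected by the last successful incrementToken() call.
    String term() {
        return current;
    }
}
```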

getEnablePositionIncrementsDefault

public static boolean getEnablePositionIncrementsDefault()
Deprecated. Please specify this when you create the StopFilter

See Also:
setEnablePositionIncrementsDefault(boolean).

getEnablePositionIncrementsVersionDefault

public static boolean getEnablePositionIncrementsVersionDefault(Version matchVersion)
Returns the version-dependent default for enablePositionIncrements. Analyzers that embed StopFilter use this method when creating the StopFilter. For a matchVersion prior to 2.9, this returns getEnablePositionIncrementsDefault(); for 2.9 or later, it returns true.
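
The version rule can be written as a small decision function. The sketch below is a plain-Java model: ReleaseLine is a hypothetical, simplified stand-in for Version (only its ordering matters here), and legacyDefault stands for the value that getEnablePositionIncrementsDefault() would return.

```java
final class PositionIncrementDefaults {
    // Hypothetical stand-in for Version, declared in release order.
    enum ReleaseLine { V2_4, V2_9, V3_0 }

    static boolean versionDefault(ReleaseLine matchVersion, boolean legacyDefault) {
        if (matchVersion.compareTo(ReleaseLine.V2_9) >= 0) {
            return true; // 2.9 or later: position increments are enabled
        }
        return legacyDefault; // before 2.9: fall back to the static default
    }
}
```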


setEnablePositionIncrementsDefault

public static void setEnablePositionIncrementsDefault(boolean defaultValue)
Deprecated. Please specify this when you create the StopFilter

Set the default position increments behavior of every StopFilter created from now on.

Note: the behavior of a single StopFilter instance can be modified with setEnablePositionIncrements(boolean). This static method controls the behavior of classes that use StopFilters internally, for example StandardAnalyzer when created with the no-argument constructor.

Default: false.

See Also:
setEnablePositionIncrements(boolean).

getEnablePositionIncrements

public boolean getEnablePositionIncrements()
See Also:
setEnablePositionIncrements(boolean).

setEnablePositionIncrements

public void setEnablePositionIncrements(boolean enable)
If true, this StopFilter will preserve positions of the incoming tokens (i.e., accumulate and set position increments of the removed stop tokens). Generally, true is best because it does not lose information (the positions of the original tokens) during indexing.

When enabled, each time a token is stopped (omitted), the position increment of the following token is increased.

NOTE: be sure to also set QueryParser.setEnablePositionIncrements(boolean) if you use QueryParser to create queries.
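
To make the effect of this flag concrete, here is a plain-Java model of the accumulation described for setEnablePositionIncrements. It is a sketch, not Lucene code: PositionSketch and Tok are hypothetical names, and the model assumes each removed token contributes its own increment to the next kept token.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;

final class PositionSketch {
    // Minimal token: a term plus its position increment (hypothetical type).
    static final class Tok {
        final String term;
        final int posInc;

        Tok(String term, int posInc) {
            this.term = term;
            this.posInc = posInc;
        }
    }

    static List<Tok> filter(List<Tok> input, Set<String> stopWords,
                            boolean enablePositionIncrements) {
        List<Tok> out = new ArrayList<Tok>();
        int skipped = 0; // increments of stop tokens removed since the last kept token
        for (Tok t : input) {
            if (stopWords.contains(t.term)) {
                skipped += t.posInc;
                continue;
            }
            // When enabled, the kept token absorbs the increments of removed tokens.
            int inc = enablePositionIncrements ? t.posInc + skipped : t.posInc;
            out.add(new Tok(t.term, inc));
            skipped = 0;
        }
        return out;
    }
}
```

For the input "the quick and the fox" with stop words "the" and "and", and every incoming increment equal to 1, the model gives "quick" an increment of 2 and "fox" an increment of 3 when the flag is on, and 1 for both when it is off. The gaps left by removed words are visible only in the first case.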



Copyright © 2000-2010 Apache Software Foundation. All Rights Reserved.