org.apache.lucene.analysis
Class StopFilter

java.lang.Object
  extended by org.apache.lucene.util.AttributeSource
      extended by org.apache.lucene.analysis.TokenStream
          extended by org.apache.lucene.analysis.TokenFilter
              extended by org.apache.lucene.analysis.FilteringTokenFilter
                  extended by org.apache.lucene.analysis.StopFilter
All Implemented Interfaces:
Closeable

public final class StopFilter
extends FilteringTokenFilter

Removes stop words from a token stream.

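You must specify the required Version compatibility when creating a StopFilter. The matchVersion argument enables correct Unicode 4.0 behavior in the stop set for versions later than 3.0.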


Nested Class Summary
 
Nested classes/interfaces inherited from class org.apache.lucene.util.AttributeSource
AttributeSource.AttributeFactory, AttributeSource.State
 
Field Summary
 
Fields inherited from class org.apache.lucene.analysis.TokenFilter
input
 
Constructor Summary
StopFilter(boolean enablePositionIncrements, TokenStream in, Set<?> stopWords)
          Deprecated. use StopFilter(Version, TokenStream, Set) instead
StopFilter(boolean enablePositionIncrements, TokenStream input, Set<?> stopWords, boolean ignoreCase)
          Deprecated. use StopFilter(Version, TokenStream, Set, boolean) instead
StopFilter(Version matchVersion, TokenStream in, Set<?> stopWords)
          Constructs a filter which removes words from the input TokenStream that are named in the Set.
StopFilter(Version matchVersion, TokenStream input, Set<?> stopWords, boolean ignoreCase)
          Construct a token stream filtering the given input.
 
Method Summary
protected  boolean accept()
          Returns true if the term of the current input token is not a stop word, so that the token is kept.
static boolean getEnablePositionIncrementsVersionDefault(Version matchVersion)
          Deprecated. use StopFilter(Version, TokenStream, Set) instead
static Set<Object> makeStopSet(List<?> stopWords)
          Deprecated. use makeStopSet(Version, List) instead
static Set<Object> makeStopSet(List<?> stopWords, boolean ignoreCase)
          Deprecated. use makeStopSet(Version, List, boolean) instead
static Set<Object> makeStopSet(String... stopWords)
          Deprecated. use makeStopSet(Version, String...) instead
static Set<Object> makeStopSet(String[] stopWords, boolean ignoreCase)
          Deprecated. use makeStopSet(Version, String[], boolean) instead
static Set<Object> makeStopSet(Version matchVersion, List<?> stopWords)
          Builds a Set from a list of stop words, appropriate for passing into the StopFilter constructor.
static Set<Object> makeStopSet(Version matchVersion, List<?> stopWords, boolean ignoreCase)
          Creates a stopword set from the given stopword list.
static Set<Object> makeStopSet(Version matchVersion, String... stopWords)
          Builds a Set from an array of stop words, appropriate for passing into the StopFilter constructor.
static Set<Object> makeStopSet(Version matchVersion, String[] stopWords, boolean ignoreCase)
          Creates a stopword set from the given stopword array.
 
Methods inherited from class org.apache.lucene.analysis.FilteringTokenFilter
getEnablePositionIncrements, incrementToken, setEnablePositionIncrements
 
Methods inherited from class org.apache.lucene.analysis.TokenFilter
close, end, reset
 
Methods inherited from class org.apache.lucene.util.AttributeSource
addAttribute, addAttributeImpl, captureState, clearAttributes, cloneAttributes, copyTo, equals, getAttribute, getAttributeClassesIterator, getAttributeFactory, getAttributeImplsIterator, hasAttribute, hasAttributes, hashCode, reflectAsString, reflectWith, restoreState, toString
 
Methods inherited from class java.lang.Object
clone, finalize, getClass, notify, notifyAll, wait, wait, wait
 

Constructor Detail

StopFilter

@Deprecated
public StopFilter(boolean enablePositionIncrements,
                  TokenStream input,
                  Set<?> stopWords,
                  boolean ignoreCase)
Deprecated. use StopFilter(Version, TokenStream, Set, boolean) instead

Construct a token stream filtering the given input. If stopWords is an instance of CharArraySet (true if makeStopSet() was used to construct the set) it will be directly used and ignoreCase will be ignored since CharArraySet directly controls case sensitivity.

If stopWords is not an instance of CharArraySet, a new CharArraySet will be constructed and ignoreCase will be used to specify the case sensitivity of that set.

Parameters:
enablePositionIncrements - true if token positions should record the removed stop words
input - Input TokenStream
stopWords - A Set of Strings or char[] or any other toString()-able set representing the stopwords
ignoreCase - if true, all words are lower cased first
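
The effect of enablePositionIncrements can be illustrated with a small plain-Java model. This is a sketch only, not Lucene code: the class PositionIncrementSketch, the Tok type and the filter method are hypothetical names, and every input token is assumed to advance the position by one.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Plain-Java model of stop-word removal; all names here are hypothetical.
class PositionIncrementSketch {

    // Stand-in for a token: its text plus its position increment.
    static final class Tok {
        final String term;
        final int posInc;

        Tok(String term, int posInc) {
            this.term = term;
            this.posInc = posInc;
        }

        @Override
        public String toString() {
            return term + "(+" + posInc + ")";
        }
    }

    // Drops stop words. When enablePositionIncrements is true, the positions
    // of the removed words are added to the increment of the next kept token.
    static List<Tok> filter(List<String> terms, Set<String> stopWords,
                            boolean enablePositionIncrements) {
        List<Tok> out = new ArrayList<Tok>();
        int skipped = 0;
        for (String term : terms) {
            if (stopWords.contains(term)) {
                skipped++; // assumption: each input token advances one position
                continue;
            }
            int inc = enablePositionIncrements ? 1 + skipped : 1;
            out.add(new Tok(term, inc));
            skipped = 0;
        }
        return out;
    }

    public static void main(String[] args) {
        List<String> terms = Arrays.asList("the", "quick", "and", "the", "dead");
        Set<String> stop = new HashSet<String>(Arrays.asList("the", "and"));
        System.out.println(filter(terms, stop, true));  // [quick(+2), dead(+3)]
        System.out.println(filter(terms, stop, false)); // [quick(+1), dead(+1)]
    }
}
```

With the flag enabled, the gaps left by removed stop words remain visible in the increments, which matters for position-sensitive queries such as phrase queries.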

StopFilter

public StopFilter(Version matchVersion,
                  TokenStream input,
                  Set<?> stopWords,
                  boolean ignoreCase)
Construct a token stream filtering the given input. If stopWords is an instance of CharArraySet (true if makeStopSet() was used to construct the set) it will be directly used and ignoreCase will be ignored since CharArraySet directly controls case sensitivity.

If stopWords is not an instance of CharArraySet, a new CharArraySet will be constructed and ignoreCase will be used to specify the case sensitivity of that set.

Parameters:
matchVersion - Lucene version to enable correct Unicode 4.0 behavior in the stop set if Version > 3.0. See above for details.
input - Input TokenStream
stopWords - A Set of Strings or char[] or any other toString()-able set representing the stopwords
ignoreCase - if true, all words are lower cased first
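
The rule for how stopWords and ignoreCase interact can be sketched in plain Java. The code below is an illustrative model under stated assumptions, not the Lucene implementation: CaseAwareSet is a hypothetical stand-in for CharArraySet, and resolve is a hypothetical helper that mirrors the constructor behavior described above.

```java
import java.util.AbstractSet;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collection;
import java.util.HashSet;
import java.util.Iterator;
import java.util.Locale;
import java.util.Set;

// Plain-Java model of stop-set resolution; all names here are hypothetical.
class StopSetResolutionSketch {

    // Stand-in for CharArraySet: the set itself fixes its case sensitivity.
    static final class CaseAwareSet extends AbstractSet<Object> {
        private final Set<String> words = new HashSet<String>();
        private final boolean ignoreCase;

        CaseAwareSet(Collection<?> source, boolean ignoreCase) {
            this.ignoreCase = ignoreCase;
            for (Object o : source) {
                add(o);
            }
        }

        // Entries are stored via toString(), lower-cased when ignoreCase is set.
        private String normalize(Object o) {
            String s = o.toString();
            return ignoreCase ? s.toLowerCase(Locale.ROOT) : s;
        }

        @Override
        public boolean add(Object o) {
            return words.add(normalize(o));
        }

        @Override
        public boolean contains(Object o) {
            return words.contains(normalize(o));
        }

        @Override
        public Iterator<Object> iterator() {
            return new ArrayList<Object>(words).iterator();
        }

        @Override
        public int size() {
            return words.size();
        }
    }

    // A CaseAwareSet is used directly and the ignoreCase argument is ignored;
    // any other Set is copied into a new CaseAwareSet that honors ignoreCase.
    static CaseAwareSet resolve(Set<?> stopWords, boolean ignoreCase) {
        if (stopWords instanceof CaseAwareSet) {
            return (CaseAwareSet) stopWords;
        }
        return new CaseAwareSet(stopWords, ignoreCase);
    }

    public static void main(String[] args) {
        Set<String> plain = new HashSet<String>(Arrays.asList("The", "And"));
        CaseAwareSet built = resolve(plain, true);
        System.out.println(built.contains("THE"));   // true

        CaseAwareSet strict = new CaseAwareSet(plain, false);
        CaseAwareSet reused = resolve(strict, true); // ignoreCase has no effect
        System.out.println(reused == strict);        // true
        System.out.println(reused.contains("the"));  // false
    }
}
```

The practical consequence is that case sensitivity should be chosen when the stop set is built, since passing ignoreCase later has no effect on a set that already controls its own case handling.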

StopFilter

@Deprecated
public StopFilter(boolean enablePositionIncrements,
                  TokenStream in,
                  Set<?> stopWords)
Deprecated. use StopFilter(Version, TokenStream, Set) instead

Constructs a filter which removes words from the input TokenStream that are named in the Set.

Parameters:
enablePositionIncrements - true if token positions should record the removed stop words
in - Input stream
stopWords - A Set of Strings or char[] or any other toString()-able set representing the stopwords
See Also:
makeStopSet(Version, java.lang.String[])

StopFilter

public StopFilter(Version matchVersion,
                  TokenStream in,
                  Set<?> stopWords)
Constructs a filter which removes words from the input TokenStream that are named in the Set.

Parameters:
matchVersion - Lucene version to enable correct Unicode 4.0 behavior in the stop set if Version > 3.0. See above for details.
in - Input stream
stopWords - A Set of Strings or char[] or any other toString()-able set representing the stopwords
See Also:
makeStopSet(Version, java.lang.String[])

Method Detail

makeStopSet

@Deprecated
public static final Set<Object> makeStopSet(String... stopWords)
Deprecated. use makeStopSet(Version, String...) instead

Builds a Set from an array of stop words, appropriate for passing into the StopFilter constructor. This permits this stopWords construction to be cached once when an Analyzer is constructed.

See Also:
makeStopSet(Version, String[], boolean), passing false to ignoreCase

makeStopSet

public static final Set<Object> makeStopSet(Version matchVersion,
                                            String... stopWords)
Builds a Set from an array of stop words, appropriate for passing into the StopFilter constructor. This permits this stopWords construction to be cached once when an Analyzer is constructed.

Parameters:
matchVersion - Lucene version to enable correct Unicode 4.0 behavior in the returned set if Version > 3.0
stopWords - An array of stopwords
See Also:
makeStopSet(Version, String[], boolean), passing false to ignoreCase

makeStopSet

@Deprecated
public static final Set<Object> makeStopSet(List<?> stopWords)
Deprecated. use makeStopSet(Version, List) instead

Builds a Set from a list of stop words, appropriate for passing into the StopFilter constructor. This permits the stopWords construction to be cached once when an Analyzer is constructed.

Parameters:
stopWords - A List of Strings or char[] or any other toString()-able list representing the stopwords
Returns:
A Set (CharArraySet) containing the words
See Also:
makeStopSet(Version, List, boolean), passing false to ignoreCase

makeStopSet

public static final Set<Object> makeStopSet(Version matchVersion,
                                            List<?> stopWords)
Builds a Set from a list of stop words, appropriate for passing into the StopFilter constructor. This permits the stopWords construction to be cached once when an Analyzer is constructed.

Parameters:
matchVersion - Lucene version to enable correct Unicode 4.0 behavior in the returned set if Version > 3.0
stopWords - A List of Strings or char[] or any other toString()-able list representing the stopwords
Returns:
A Set (CharArraySet) containing the words
See Also:
makeStopSet(Version, List, boolean), passing false to ignoreCase

makeStopSet

@Deprecated
public static final Set<Object> makeStopSet(String[] stopWords,
                                            boolean ignoreCase)
Deprecated. use makeStopSet(Version, String[], boolean) instead

Creates a stopword set from the given stopword array.

Parameters:
stopWords - An array of stopwords
ignoreCase - If true, all words are lower cased first.
Returns:
a Set containing the words

makeStopSet

public static final Set<Object> makeStopSet(Version matchVersion,
                                            String[] stopWords,
                                            boolean ignoreCase)
Creates a stopword set from the given stopword array.

Parameters:
matchVersion - Lucene version to enable correct Unicode 4.0 behavior in the returned set if Version > 3.0
stopWords - An array of stopwords
ignoreCase - If true, all words are lower cased first.
Returns:
a Set containing the words

makeStopSet

@Deprecated
public static final Set<Object> makeStopSet(List<?> stopWords,
                                            boolean ignoreCase)
Deprecated. use makeStopSet(Version, List, boolean) instead

Creates a stopword set from the given stopword list.

Parameters:
stopWords - A List of Strings or char[] or any other toString()-able list representing the stopwords
ignoreCase - if true, all words are lower cased first
Returns:
A Set (CharArraySet) containing the words

makeStopSet

public static final Set<Object> makeStopSet(Version matchVersion,
                                            List<?> stopWords,
                                            boolean ignoreCase)
Creates a stopword set from the given stopword list.

Parameters:
matchVersion - Lucene version to enable correct Unicode 4.0 behavior in the returned set if Version > 3.0
stopWords - A List of Strings or char[] or any other toString()-able list representing the stopwords
ignoreCase - if true, all words are lower cased first
Returns:
A Set (CharArraySet) containing the words

accept

protected boolean accept()
                  throws IOException
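Returns true if the term of the current input token is not a stop word, so that the token is kept. Tokens for which this method returns false are removed from the stream.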

Specified by:
accept in class FilteringTokenFilter
Throws:
IOException
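
The division of labor between accept() and the inherited incrementToken() can be modeled in plain Java. This is a hedged sketch rather than the Lucene source: FilteringSketch and StopSketch are hypothetical stand-ins for FilteringTokenFilter and StopFilter, and the token stream is reduced to an iterator of strings.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.Iterator;
import java.util.List;
import java.util.Set;

// Plain-Java model of the accept() contract; all names here are hypothetical.
class AcceptSketch {

    // Stand-in for FilteringTokenFilter: subclasses only decide whether the
    // current term is kept, and the base class drives the iteration.
    abstract static class FilteringSketch {
        private final Iterator<String> input;
        protected String term; // the current term, visible to accept()

        FilteringSketch(Iterator<String> input) {
            this.input = input;
        }

        protected abstract boolean accept();

        // Advances to the next accepted term; returns false at end of input.
        final boolean incrementToken() {
            while (input.hasNext()) {
                term = input.next();
                if (accept()) {
                    return true;
                }
            }
            return false;
        }
    }

    // Stand-in for StopFilter: a term is accepted when it is not a stop word.
    static final class StopSketch extends FilteringSketch {
        private final Set<String> stopWords;

        StopSketch(Iterator<String> input, Set<String> stopWords) {
            super(input);
            this.stopWords = stopWords;
        }

        @Override
        protected boolean accept() {
            return !stopWords.contains(term);
        }
    }

    // Collects every term that survives the filter, in input order.
    static List<String> keptTerms(List<String> terms, Set<String> stopWords) {
        StopSketch filter = new StopSketch(terms.iterator(), stopWords);
        List<String> kept = new ArrayList<String>();
        while (filter.incrementToken()) {
            kept.add(filter.term);
        }
        return kept;
    }

    public static void main(String[] args) {
        Set<String> stop = new HashSet<String>(Arrays.asList("to", "or"));
        System.out.println(keptTerms(Arrays.asList("to", "be", "or", "not"), stop)); // [be, not]
    }
}
```

In this model accept() never consumes input itself; it only answers a yes-or-no question about the token that the base class has already read.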

getEnablePositionIncrementsVersionDefault

@Deprecated
public static boolean getEnablePositionIncrementsVersionDefault(Version matchVersion)
Deprecated. use StopFilter(Version, TokenStream, Set) instead

Returns version-dependent default for enablePositionIncrements. Analyzers that embed StopFilter use this method when creating the StopFilter. Prior to 2.9, this returns false. On 2.9 or later, it returns true.



Copyright © 2000-2011 Apache Software Foundation. All Rights Reserved.