org.apache.lucene.analysis.core
Class StopFilter

java.lang.Object
  extended by org.apache.lucene.util.AttributeSource
      extended by org.apache.lucene.analysis.TokenStream
          extended by org.apache.lucene.analysis.TokenFilter
              extended by org.apache.lucene.analysis.util.FilteringTokenFilter
                  extended by org.apache.lucene.analysis.core.StopFilter
All Implemented Interfaces:
Closeable

public final class StopFilter
extends FilteringTokenFilter

Removes stop words from a token stream.

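You must specify the required Version compatibility when creating StopFilter. As the matchVersion parameter descriptions note, a version later than 3.0 enables correct Unicode 4.0 behavior in the stop set.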


Nested Class Summary
 
Nested classes/interfaces inherited from class org.apache.lucene.util.AttributeSource
AttributeSource.AttributeFactory, AttributeSource.State
 
Field Summary
 
Fields inherited from class org.apache.lucene.analysis.TokenFilter
input
 
Constructor Summary
StopFilter(Version matchVersion, TokenStream in, CharArraySet stopWords)
          Constructs a filter that removes from the input TokenStream any words contained in the given stop-word set.
 
Method Summary
protected  boolean accept()
          Returns true if the current token's term is not a stop word, so that the token is kept.
static CharArraySet makeStopSet(Version matchVersion, List<?> stopWords)
          Builds a Set from a list of stop words, appropriate for passing into the StopFilter constructor.
static CharArraySet makeStopSet(Version matchVersion, List<?> stopWords, boolean ignoreCase)
          Creates a stopword set from the given stopword list.
static CharArraySet makeStopSet(Version matchVersion, String... stopWords)
          Builds a Set from an array of stop words, appropriate for passing into the StopFilter constructor.
static CharArraySet makeStopSet(Version matchVersion, String[] stopWords, boolean ignoreCase)
          Creates a stopword set from the given stopword array.
 
Methods inherited from class org.apache.lucene.analysis.util.FilteringTokenFilter
getEnablePositionIncrements, incrementToken, reset, setEnablePositionIncrements
 
Methods inherited from class org.apache.lucene.analysis.TokenFilter
close, end
 
Methods inherited from class org.apache.lucene.util.AttributeSource
addAttribute, addAttributeImpl, captureState, clearAttributes, cloneAttributes, copyTo, equals, getAttribute, getAttributeClassesIterator, getAttributeFactory, getAttributeImplsIterator, hasAttribute, hasAttributes, hashCode, reflectAsString, reflectWith, restoreState
 
Methods inherited from class java.lang.Object
clone, finalize, getClass, notify, notifyAll, toString, wait, wait, wait
 

Constructor Detail

StopFilter

public StopFilter(Version matchVersion,
                  TokenStream in,
                  CharArraySet stopWords)
Constructs a filter that removes from the input TokenStream any words contained in the given stop-word set.

Parameters:
matchVersion - Lucene version to enable correct Unicode 4.0 behavior in the stop set if Version > 3.0. See above for details.
in - Input stream
stopWords - A CharArraySet representing the stopwords.
See Also:
makeStopSet(Version, java.lang.String...)
Method Detail

makeStopSet

public static CharArraySet makeStopSet(Version matchVersion,
                                       String... stopWords)
Builds a Set from an array of stop words, appropriate for passing into the StopFilter constructor. This allows the stop-word set to be built once and cached when an Analyzer is constructed.

Parameters:
matchVersion - Lucene version to enable correct Unicode 4.0 behavior in the returned set if Version > 3.0
stopWords - An array of stopwords
See Also:
makeStopSet(Version, java.lang.String[], boolean), passing false to ignoreCase
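
The caching note above can be illustrated with a minimal plain-Java sketch. It is not Lucene code: the class CachedStopSetSketch and its methods are hypothetical, and a java.util.Set<String> stands in for CharArraySet. It only shows the idea of building the stop set once, at construction time, with exact-case matching (the ignoreCase = false behavior), and reusing it for every lookup.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.HashSet;
import java.util.Set;

// Plain-Java model, not Lucene code: all names here are hypothetical.
final class CachedStopSetSketch {
    // Built once in the constructor and reused for every lookup.
    private final Set<String> stopWords;

    CachedStopSetSketch(String... words) {
        this.stopWords = makeStopSet(words);
    }

    // Stand-in for makeStopSet(Version, String...): exact-case matching only.
    static Set<String> makeStopSet(String... words) {
        return Collections.unmodifiableSet(new HashSet<>(Arrays.asList(words)));
    }

    boolean isStopWord(String term) {
        return stopWords.contains(term);
    }

    // Always returns the same cached instance.
    Set<String> stopWords() {
        return stopWords;
    }
}
```

In this model, new CachedStopSetSketch("the", "a") treats "the" as a stop word but not "The", because no case folding is applied.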

makeStopSet

public static CharArraySet makeStopSet(Version matchVersion,
                                       List<?> stopWords)
Builds a Set from a list of stop words, appropriate for passing into the StopFilter constructor. This allows the stop-word set to be built once and cached when an Analyzer is constructed.

Parameters:
matchVersion - Lucene version to enable correct Unicode 4.0 behavior in the returned set if Version > 3.0
stopWords - A List of Strings or char[] or any other toString()-able list representing the stopwords
Returns:
A Set (CharArraySet) containing the words
See Also:
makeStopSet(Version, java.lang.String[], boolean), passing false to ignoreCase

makeStopSet

public static CharArraySet makeStopSet(Version matchVersion,
                                       String[] stopWords,
                                       boolean ignoreCase)
Creates a stopword set from the given stopword array.

Parameters:
matchVersion - Lucene version to enable correct Unicode 4.0 behavior in the returned set if Version > 3.0
stopWords - An array of stopwords
ignoreCase - If true, all words are lowercased first.
Returns:
A Set (CharArraySet) containing the words

makeStopSet

public static CharArraySet makeStopSet(Version matchVersion,
                                       List<?> stopWords,
                                       boolean ignoreCase)
Creates a stopword set from the given stopword list.

Parameters:
matchVersion - Lucene version to enable correct Unicode 4.0 behavior in the returned set if Version > 3.0
stopWords - A List of Strings or char[] or any other toString()-able list representing the stopwords
ignoreCase - If true, all words are lowercased first.
Returns:
A Set (CharArraySet) containing the words
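
The effect of ignoreCase and of the "Strings or char[] or any other toString()-able" wording can be sketched in plain Java. This is a hypothetical model, not the Lucene implementation: IgnoreCaseStopSetSketch is an invented name, a Set<String> stands in for CharArraySet, and lowercasing the term at lookup time is an assumption made here so that the model matches case-insensitively.

```java
import java.util.HashSet;
import java.util.List;
import java.util.Locale;
import java.util.Set;

// Plain-Java model, not Lucene code: all names here are hypothetical.
final class IgnoreCaseStopSetSketch {
    // Accepts Strings, char[] values, or any object with a useful toString().
    static Set<String> makeStopSet(List<?> stopWords, boolean ignoreCase) {
        Set<String> set = new HashSet<>();
        for (Object word : stopWords) {
            String text = (word instanceof char[])
                    ? new String((char[]) word)
                    : String.valueOf(word);
            // With ignoreCase, every word is lowercased before being stored.
            set.add(ignoreCase ? text.toLowerCase(Locale.ROOT) : text);
        }
        return set;
    }

    // Assumption of this sketch: the lookup folds case the same way as the build step.
    static boolean contains(Set<String> set, String term, boolean ignoreCase) {
        return set.contains(ignoreCase ? term.toLowerCase(Locale.ROOT) : term);
    }
}
```

Built with ignoreCase set to true from the words "The", "AND" (as a char[]) and "Of" (as a StringBuilder), the model stores "the", "and" and "of". Built with false, it stores the words exactly as given.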

accept

protected boolean accept()
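Returns true if the current token's term is not a stop word, so that the token is kept; returns false for a stop word, so that the token is removed from the stream.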

Specified by:
accept in class FilteringTokenFilter
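
The relationship between accept() and the enclosing filter can be sketched as a template-method pattern in plain Java. This is a simplified, hypothetical model rather than the Lucene source: FilteringSketch and StopSketch are invented names, terms are plain Strings instead of token attributes, and position increments are not modeled.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;

// Plain-Java model of the accept() contract; not Lucene code.
abstract class FilteringSketch {
    // The term currently under inspection, playing the role of the term attribute.
    protected String currentTerm;

    // Return true to keep the current term, false to drop it.
    protected abstract boolean accept();

    // Walks the input and keeps only the terms that accept() approves.
    final List<String> filter(List<String> terms) {
        List<String> kept = new ArrayList<>();
        for (String term : terms) {
            currentTerm = term;
            if (accept()) {
                kept.add(term);
            }
        }
        return kept;
    }
}

// Hypothetical stop-word variant: a term is accepted when it is not in the set.
final class StopSketch extends FilteringSketch {
    private final Set<String> stopWords;

    StopSketch(Set<String> stopWords) {
        this.stopWords = stopWords;
    }

    @Override
    protected boolean accept() {
        return !stopWords.contains(currentTerm);
    }
}
```

With the stop words "the" and "a", filtering the terms "the", "quick", "a", "fox" in this model keeps "quick" and "fox".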


Copyright © 2000-2013 Apache Software Foundation. All Rights Reserved.