org.apache.lucene.analysis.core
Class StopFilter
java.lang.Object
org.apache.lucene.util.AttributeSource
org.apache.lucene.analysis.TokenStream
org.apache.lucene.analysis.TokenFilter
org.apache.lucene.analysis.util.FilteringTokenFilter
org.apache.lucene.analysis.core.StopFilter
- All Implemented Interfaces:
- Closeable
public final class StopFilter
- extends FilteringTokenFilter
Removes stop words from a token stream.
You must specify the required Version compatibility when creating a StopFilter:
- As of 3.1, StopFilter correctly handles Unicode 4.0 supplementary characters in stopwords, and position increments are preserved.
Method Summary
protected boolean accept()
- Returns true if the current token's term is not a stop word.
static CharArraySet makeStopSet(Version matchVersion, List<?> stopWords)
- Builds a Set from a list of stop words, appropriate for passing into the StopFilter constructor.
static CharArraySet makeStopSet(Version matchVersion, List<?> stopWords, boolean ignoreCase)
- Creates a stopword set from the given stopword list.
static CharArraySet makeStopSet(Version matchVersion, String... stopWords)
- Builds a Set from an array of stop words, appropriate for passing into the StopFilter constructor.
static CharArraySet makeStopSet(Version matchVersion, String[] stopWords, boolean ignoreCase)
- Creates a stopword set from the given stopword array.
Methods inherited from class org.apache.lucene.util.AttributeSource
addAttribute, addAttributeImpl, captureState, clearAttributes, cloneAttributes, copyTo, equals, getAttribute, getAttributeClassesIterator, getAttributeFactory, getAttributeImplsIterator, hasAttribute, hasAttributes, hashCode, reflectAsString, reflectWith, restoreState, toString
StopFilter
public StopFilter(Version matchVersion,
TokenStream in,
CharArraySet stopWords)
- Constructs a filter that removes from the input TokenStream every word contained in the given Set.
- Parameters:
matchVersion - Lucene version to enable correct Unicode 4.0 behavior in the stop set if Version > 3.0. See above for details.
in - Input stream
stopWords - A CharArraySet representing the stopwords.
- See Also:
makeStopSet(Version, java.lang.String...)
makeStopSet
public static CharArraySet makeStopSet(Version matchVersion,
String... stopWords)
- Builds a Set from an array of stop words, appropriate for passing into the StopFilter constructor. This allows the stop set to be built once and cached when an Analyzer is constructed.
- Parameters:
matchVersion - Lucene version to enable correct Unicode 4.0 behavior in the returned set if Version > 3.0
stopWords - An array of stopwords
- See Also:
the makeStopSet overload that takes ignoreCase; this method corresponds to passing false to ignoreCase
makeStopSet
public static CharArraySet makeStopSet(Version matchVersion,
List<?> stopWords)
- Builds a Set from a list of stop words, appropriate for passing into the StopFilter constructor. This allows the stop set to be built once and cached when an Analyzer is constructed.
- Parameters:
matchVersion - Lucene version to enable correct Unicode 4.0 behavior in the returned set if Version > 3.0
stopWords - A List of Strings or char[] or any other toString()-able list representing the stopwords
- Returns:
- A Set (CharArraySet) containing the words
- See Also:
the makeStopSet overload that takes ignoreCase; this method corresponds to passing false to ignoreCase
makeStopSet
public static CharArraySet makeStopSet(Version matchVersion,
String[] stopWords,
boolean ignoreCase)
- Creates a stopword set from the given stopword array.
- Parameters:
matchVersion - Lucene version to enable correct Unicode 4.0 behavior in the returned set if Version > 3.0
stopWords - An array of stopwords
ignoreCase - If true, all words are lower cased first.
- Returns:
- A Set (CharArraySet) containing the words
makeStopSet
public static CharArraySet makeStopSet(Version matchVersion,
List<?> stopWords,
boolean ignoreCase)
- Creates a stopword set from the given stopword list.
- Parameters:
matchVersion - Lucene version to enable correct Unicode 4.0 behavior in the returned set if Version > 3.0
stopWords - A List of Strings or char[] or any other toString()-able list representing the stopwords
ignoreCase - If true, all words are lower cased first.
- Returns:
- A Set (CharArraySet) containing the words
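The effect of ignoreCase can be illustrated with a standalone sketch that uses only java.util and does not depend on Lucene. It is a simplified model, not the CharArraySet implementation: the class StopSetSketch and its methods makeStopSet and isStopWord are hypothetical names chosen for illustration, the matchVersion parameter is omitted, and lower casing is done with Locale.ROOT as an assumption.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Locale;
import java.util.Set;

// Standalone model only; Lucene's CharArraySet is NOT used here.
public class StopSetSketch {

    // Hypothetical stand-in for makeStopSet(matchVersion, List<?>, ignoreCase).
    public static Set<String> makeStopSet(List<?> stopWords, boolean ignoreCase) {
        Set<String> set = new HashSet<>();
        for (Object word : stopWords) {
            // char[] entries are converted by content; everything else via toString().
            String s = (word instanceof char[])
                    ? new String((char[]) word)
                    : String.valueOf(word);
            // With ignoreCase, every entry is lower cased before it is stored.
            set.add(ignoreCase ? s.toLowerCase(Locale.ROOT) : s);
        }
        return set;
    }

    // Hypothetical lookup: folds the term the same way the set was built.
    public static boolean isStopWord(Set<String> set, String term, boolean ignoreCase) {
        return set.contains(ignoreCase ? term.toLowerCase(Locale.ROOT) : term);
    }

    public static void main(String[] args) {
        List<String> words = Arrays.asList("The", "AND");
        Set<String> folded = makeStopSet(words, true);
        Set<String> exact = makeStopSet(words, false);
        System.out.println(isStopWord(folded, "the", true));  // true
        System.out.println(isStopWord(exact, "the", false));  // false
    }
}
```

In this model, a set built with ignoreCase set to false only matches terms with the exact original casing, which is why stop sets are often paired with a lower casing step earlier in the analysis chain.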
accept
protected boolean accept()
- Returns true if the current token's term is not a stop word, in which case the token is kept; returns false if the token should be dropped.
- Specified by:
accept in class FilteringTokenFilter
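How an accept() predicate interacts with preserved position increments can be shown with a standalone sketch that uses only java.util. It is a simplified model under stated assumptions, not Lucene's code: the class AcceptSketch, its Token type, and the filter method are hypothetical, and every input token is assumed to arrive with a position increment of 1.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Standalone model only; no Lucene classes are used.
public class AcceptSketch {

    // Hypothetical token: a term plus its increment relative to the previous kept token.
    public static final class Token {
        public final String term;
        public final int positionIncrement;

        public Token(String term, int positionIncrement) {
            this.term = term;
            this.positionIncrement = positionIncrement;
        }

        @Override
        public String toString() {
            return term + "(+" + positionIncrement + ")";
        }
    }

    // Models accept(): true means "keep this token".
    public static boolean accept(String term, Set<String> stopWords) {
        return !stopWords.contains(term);
    }

    // Drops rejected tokens and adds their positions to the next kept token.
    public static List<Token> filter(List<String> terms, Set<String> stopWords) {
        List<Token> out = new ArrayList<>();
        int skipped = 0;
        for (String term : terms) {
            if (accept(term, stopWords)) {
                out.add(new Token(term, 1 + skipped));
                skipped = 0;
            } else {
                skipped++; // remember the gap left by the removed stop word
            }
        }
        return out;
    }

    public static void main(String[] args) {
        Set<String> stop = new HashSet<>(Arrays.asList("the", "and"));
        List<String> terms = Arrays.asList("the", "quick", "fox", "and", "the", "dog");
        System.out.println(filter(terms, stop)); // [quick(+2), fox(+1), dog(+3)]
    }
}
```

Carrying the skipped count forward keeps the gaps left by removed words visible to position-sensitive consumers such as phrase queries.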
Copyright © 2000-2014 Apache Software Foundation. All Rights Reserved.