java.lang.Object
  org.apache.lucene.util.AttributeSource
    org.apache.lucene.analysis.TokenStream
      org.apache.lucene.analysis.TokenFilter
        org.apache.lucene.analysis.StopFilter
public final class StopFilter
Removes stop words from a token stream.
Nested Class Summary |
---|
Nested classes/interfaces inherited from class org.apache.lucene.util.AttributeSource |
---|
AttributeSource.AttributeFactory, AttributeSource.State |
Field Summary |
---|
Fields inherited from class org.apache.lucene.analysis.TokenFilter |
---|
input |
Constructor Summary | |
---|---|
StopFilter(boolean enablePositionIncrements,
TokenStream in,
Set<?> stopWords)
Constructs a filter which removes words from the input TokenStream that are named in the Set. |
|
StopFilter(boolean enablePositionIncrements,
TokenStream input,
Set<?> stopWords,
boolean ignoreCase)
Construct a token stream filtering the given input. |
Method Summary | |
---|---|
boolean |
getEnablePositionIncrements()
|
static boolean |
getEnablePositionIncrementsVersionDefault(Version matchVersion)
Returns version-dependent default for enablePositionIncrements. |
boolean |
incrementToken()
Returns the next input Token whose term() is not a stop word. |
static Set<Object> |
makeStopSet(List<?> stopWords)
Builds a Set from a list of stop words, appropriate for passing into the StopFilter constructor. |
static Set<Object> |
makeStopSet(List<?> stopWords,
boolean ignoreCase)
|
static Set<Object> |
makeStopSet(String... stopWords)
Builds a Set from an array of stop words, appropriate for passing into the StopFilter constructor. |
static Set<Object> |
makeStopSet(String[] stopWords,
boolean ignoreCase)
|
void |
setEnablePositionIncrements(boolean enable)
If true, this StopFilter will preserve positions of the incoming tokens (i.e., accumulate and set position increments of the removed stop tokens). |
Methods inherited from class org.apache.lucene.analysis.TokenFilter |
---|
close, end, reset |
Methods inherited from class org.apache.lucene.util.AttributeSource |
---|
addAttribute, addAttributeImpl, captureState, clearAttributes, cloneAttributes, equals, getAttribute, getAttributeClassesIterator, getAttributeFactory, getAttributeImplsIterator, hasAttribute, hasAttributes, hashCode, restoreState, toString |
Methods inherited from class java.lang.Object |
---|
clone, finalize, getClass, notify, notifyAll, wait, wait, wait |
Constructor Detail |
---|
public StopFilter(boolean enablePositionIncrements, TokenStream input, Set<?> stopWords, boolean ignoreCase)
Construct a token stream filtering the given input. If stopWords is an instance of CharArraySet (true if makeStopSet() was used to construct the set), it will be used directly and ignoreCase will be ignored, since CharArraySet directly controls case sensitivity.
If stopWords is not an instance of CharArraySet, a new CharArraySet will be constructed and ignoreCase will be used to specify the case sensitivity of that set.
Parameters:
enablePositionIncrements - true if token positions should record the removed stop words
input - Input TokenStream
stopWords - A Set of Strings or char[] or any other toString()-able set representing the stopwords
ignoreCase - if true, all words are lower cased first

public StopFilter(boolean enablePositionIncrements, TokenStream in, Set<?> stopWords)
Constructs a filter which removes words from the input TokenStream that are named in the Set.
Parameters:
enablePositionIncrements - true if token positions should record the removed stop words
in - Input stream
stopWords - A Set of Strings or char[] or any other toString()-able set representing the stopwords
See Also:
makeStopSet(java.lang.String[])
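The set-handling rule of the four-argument constructor can be illustrated with a small model. This is a minimal sketch in plain Java, not Lucene code: `CaseAwareSet` is a hypothetical stand-in for CharArraySet, `resolve` is a hypothetical helper that mirrors the decision described above, and the sketch assumes that case-insensitive matching lower-cases both the stored words and the looked-up term. It handles only elements with a meaningful toString(); the char[] case is left out.

```java
import java.util.AbstractSet;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.Iterator;
import java.util.Locale;
import java.util.Set;

/** Plain-Java model of the constructor's stop-set handling; not Lucene code. */
public class StopSetReuseSketch {

    /** Hypothetical stand-in for CharArraySet: a set that fixes its own case sensitivity. */
    public static final class CaseAwareSet extends AbstractSet<Object> {
        private final Set<String> words = new HashSet<String>();
        public final boolean ignoreCase;

        public CaseAwareSet(Iterable<?> source, boolean ignoreCase) {
            this.ignoreCase = ignoreCase;
            for (Object o : source) {
                words.add(normalize(o));
            }
        }

        // Assumption: ignoring case means lower-casing stored words and lookups alike.
        private String normalize(Object o) {
            String s = o.toString();
            return ignoreCase ? s.toLowerCase(Locale.ROOT) : s;
        }

        @Override public boolean contains(Object o) {
            return o != null && words.contains(normalize(o));
        }

        @Override public Iterator<Object> iterator() {
            return new ArrayList<Object>(words).iterator();
        }

        @Override public int size() {
            return words.size();
        }
    }

    /** Reuse a case-aware set as-is (the flag is ignored); otherwise build one from the flag. */
    public static CaseAwareSet resolve(Set<?> stopWords, boolean ignoreCase) {
        if (stopWords instanceof CaseAwareSet) {
            return (CaseAwareSet) stopWords;
        }
        return new CaseAwareSet(stopWords, ignoreCase);
    }

    public static void main(String[] args) {
        // Prebuilt, case-sensitive set: passing ignoreCase=true changes nothing.
        Set<Object> prebuilt = new CaseAwareSet(Arrays.asList("The"), false);
        CaseAwareSet reused = resolve(prebuilt, true);
        System.out.println(reused == prebuilt);      // true
        System.out.println(reused.contains("the"));  // false

        // Ordinary set: a new case-aware set is built and the flag takes effect.
        Set<String> plain = new HashSet<String>(Arrays.asList("The"));
        System.out.println(resolve(plain, true).contains("THE"));  // true
    }
}
```

The practical consequence is that a set built once with the intended case sensitivity can be shared across filters, and the ignoreCase argument only matters for sets that do not already carry that setting.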
Method Detail |
---|
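public static final Set<Object> makeStopSet(String... stopWords)
Builds a Set from an array of stop words, appropriate for passing into the StopFilter constructor.
See Also:
makeStopSet(String[], boolean), passing false to ignoreCase

public static final Set<Object> makeStopSet(List<?> stopWords)
Builds a Set from a list of stop words, appropriate for passing into the StopFilter constructor.
Parameters:
stopWords - A List of Strings or char[] or any other toString()-able list representing the stopwords
Returns:
A Set (CharArraySet) containing the words
See Also:
makeStopSet(List, boolean), passing false to ignoreCase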
public static final Set<Object> makeStopSet(String[] stopWords, boolean ignoreCase)
Parameters:
stopWords - An array of stopwords
ignoreCase - If true, all words are lower cased first.
public static final Set<Object> makeStopSet(List<?> stopWords, boolean ignoreCase)
Parameters:
stopWords - A List of Strings or char[] or any other toString()-able list representing the stopwords
ignoreCase - if true, all words are lower cased first
Returns:
A Set (CharArraySet) containing the words

public final boolean incrementToken() throws IOException
Returns the next input Token whose term() is not a stop word.
Specified by:
incrementToken in class TokenStream
Throws:
IOException
public static boolean getEnablePositionIncrementsVersionDefault(Version matchVersion)
Returns version-dependent default for enablePositionIncrements.

public boolean getEnablePositionIncrements()
See Also:
setEnablePositionIncrements(boolean)
public void setEnablePositionIncrements(boolean enable)
If true, this StopFilter will preserve positions of the incoming tokens (i.e., accumulate and set position increments of the removed stop tokens). Generally, true is best as it does not lose information (positions of the original tokens) during indexing.
When set, each token that is stopped (omitted) increases the position increment of the following token.
NOTE: be sure to also set QueryParser.setEnablePositionIncrements(boolean) if you use QueryParser to create queries.
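The effect of this flag can be seen in a small model of the filtering loop. This is a sketch in plain Java, not the Lucene implementation: `Tok` and `filter` are hypothetical names, and every incoming token is assumed to carry a position increment of 1.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

/** Plain-Java model of stop-word removal with position increments; not Lucene code. */
public class PositionIncrementSketch {

    /** Hypothetical stand-in for a token: its text plus its position increment. */
    public static final class Tok {
        final String term;
        final int posInc;

        Tok(String term, int posInc) {
            this.term = term;
            this.posInc = posInc;
        }

        @Override public String toString() {
            return term + "(+" + posInc + ")";
        }
    }

    /** Drops stop words; when enabled, adds the skipped positions to the next kept token. */
    public static List<Tok> filter(List<String> terms, Set<String> stopWords,
                                   boolean enablePositionIncrements) {
        List<Tok> out = new ArrayList<Tok>();
        int skipped = 0;
        for (String term : terms) {
            int posInc = 1; // assumption: each incoming token advances by one position
            if (stopWords.contains(term)) {
                skipped += posInc; // remember the position taken by the removed token
                continue;
            }
            out.add(new Tok(term, enablePositionIncrements ? posInc + skipped : posInc));
            skipped = 0;
        }
        return out;
    }

    public static void main(String[] args) {
        Set<String> stop = new HashSet<String>(Arrays.asList("the", "of"));
        List<String> terms = Arrays.asList("the", "lord", "of", "the", "rings");
        System.out.println(filter(terms, stop, true));   // [lord(+2), rings(+3)]
        System.out.println(filter(terms, stop, false));  // [lord(+1), rings(+1)]
    }
}
```

With the flag enabled, the gap between "lord" and "rings" stays three positions wide, so the original token positions remain recoverable; with it disabled the two terms become adjacent.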