Package | Description |
---|---|
org.apache.lucene.analysis | Text analysis. |
org.apache.lucene.analysis.standard | Fast, general-purpose grammar-based tokenizer. StandardTokenizer implements the Word Break rules from the Unicode Text Segmentation algorithm, as specified in Unicode Standard Annex #29. |
Modifier and Type | Field and Description |
---|---|
static CharArraySet | CharArraySet.EMPTY_SET: An empty CharArraySet. |
protected CharArraySet | StopwordAnalyzerBase.stopwords: An immutable stopword set. |
Modifier and Type | Method and Description |
---|---|
static CharArraySet | CharArraySet.copy(Set<?> set): Returns a copy of the given set as a CharArraySet. |
static CharArraySet | WordlistLoader.getSnowballWordSet(Reader reader): Reads stopwords from a stopword list in Snowball format. |
static CharArraySet | WordlistLoader.getSnowballWordSet(Reader reader, CharArraySet result): Reads stopwords from a stopword list in Snowball format. |
CharArraySet | StopwordAnalyzerBase.getStopwordSet(): Returns the analyzer's stopword set, or an empty set if the analyzer has no stopwords. |
static CharArraySet | WordlistLoader.getWordSet(Reader reader): Reads lines from a Reader and adds every line as an entry to a CharArraySet (omitting leading and trailing whitespace). |
static CharArraySet | WordlistLoader.getWordSet(Reader reader, CharArraySet result): Reads lines from a Reader and adds every line as an entry to a CharArraySet (omitting leading and trailing whitespace). |
static CharArraySet | WordlistLoader.getWordSet(Reader reader, String comment): Reads lines from a Reader and adds every non-comment line as an entry to a CharArraySet (omitting leading and trailing whitespace). |
static CharArraySet | WordlistLoader.getWordSet(Reader reader, String comment, CharArraySet result): Reads lines from a Reader and adds every non-comment line as an entry to a CharArraySet (omitting leading and trailing whitespace). |
CharArraySet | CharArrayMap.keySet(): Returns a CharArraySet view on the map's keys. |
protected static CharArraySet | StopwordAnalyzerBase.loadStopwordSet(boolean ignoreCase, Class<? extends Analyzer> aClass, String resource, String comment): Creates a CharArraySet from a file resource associated with a class. |
protected static CharArraySet | StopwordAnalyzerBase.loadStopwordSet(Path stopwords): Creates a CharArraySet from a path. |
protected static CharArraySet | StopwordAnalyzerBase.loadStopwordSet(Reader stopwords): Creates a CharArraySet from a Reader. |
static CharArraySet | StopFilter.makeStopSet(List<?> stopWords): Builds a Set from a list of stop words, appropriate for passing into the StopFilter constructor. |
static CharArraySet | StopFilter.makeStopSet(List<?> stopWords, boolean ignoreCase): Creates a stopword set from the given stopword list. |
static CharArraySet | StopFilter.makeStopSet(String... stopWords): Builds a Set from an array of stop words, appropriate for passing into the StopFilter constructor. |
static CharArraySet | StopFilter.makeStopSet(String[] stopWords, boolean ignoreCase): Creates a stopword set from the given stopword array. |
static CharArraySet | CharArraySet.unmodifiableSet(CharArraySet set): Returns an unmodifiable CharArraySet. |
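
The comment-aware getWordSet variants are described as adding every non-comment line, with leading and trailing whitespace removed. The plain-Java sketch below illustrates that behavior only. It is not Lucene's implementation: the class WordListSketch and the method readWordSet are hypothetical names, and a java.util.Set<String> stands in for CharArraySet. It also assumes a line counts as a comment when it starts with the comment prefix.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.Reader;
import java.io.UncheckedIOException;
import java.util.LinkedHashSet;
import java.util.Set;

/** Hypothetical stand-in for a comment-aware word-list loader (not Lucene code). */
class WordListSketch {

    /**
     * Reads one entry per line. Lines that start with the comment prefix are
     * skipped (an assumption of this sketch); every other line is added after
     * trimming leading and trailing whitespace.
     */
    static Set<String> readWordSet(Reader reader, String comment) {
        Set<String> result = new LinkedHashSet<>();
        try (BufferedReader br = new BufferedReader(reader)) {
            String line;
            while ((line = br.readLine()) != null) {
                if (!line.startsWith(comment)) {
                    result.add(line.trim());
                }
            }
        } catch (IOException e) {
            // Wrapped so callers of the sketch need no checked-exception handling.
            throw new UncheckedIOException(e);
        }
        return result;
    }
}
```

Calling readWordSet with a StringReader over a small list and "#" as the comment prefix returns only the trimmed, non-comment entries.
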
Modifier and Type | Method and Description |
---|---|
static CharArraySet | WordlistLoader.getSnowballWordSet(Reader reader, CharArraySet result): Reads stopwords from a stopword list in Snowball format. |
static CharArraySet | WordlistLoader.getWordSet(Reader reader, CharArraySet result): Reads lines from a Reader and adds every line as an entry to a CharArraySet (omitting leading and trailing whitespace). |
static CharArraySet | WordlistLoader.getWordSet(Reader reader, String comment, CharArraySet result): Reads lines from a Reader and adds every non-comment line as an entry to a CharArraySet (omitting leading and trailing whitespace). |
static CharArraySet | CharArraySet.unmodifiableSet(CharArraySet set): Returns an unmodifiable CharArraySet. |
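
The loader methods in this table take an existing set as the result parameter and fill it. As a rough illustration for the Snowball variant, the plain-Java sketch below assumes the usual Snowball stopword-list conventions: several whitespace-separated words may share a line, and a vertical bar (|) starts a comment that runs to the end of the line. The class SnowballListSketch and the method readSnowballWordSet are hypothetical, and a java.util.Set<String> stands in for CharArraySet. This is not Lucene's code.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.Reader;
import java.io.UncheckedIOException;
import java.util.Set;

/** Hypothetical stand-in for a Snowball-format stopword reader (not Lucene code). */
class SnowballListSketch {

    /**
     * Adds every word found in the Snowball-style list to {@code result}
     * and returns that same set, mirroring the "result" parameter pattern.
     */
    static Set<String> readSnowballWordSet(Reader reader, Set<String> result) {
        try (BufferedReader br = new BufferedReader(reader)) {
            String line;
            while ((line = br.readLine()) != null) {
                // Drop a trailing comment, if any (assumed to start at '|').
                int bar = line.indexOf('|');
                if (bar >= 0) {
                    line = line.substring(0, bar);
                }
                // A line may hold several words separated by whitespace.
                for (String word : line.trim().split("\\s+")) {
                    if (!word.isEmpty()) {
                        result.add(word);
                    }
                }
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return result;
    }
}
```

Passing the same set to several calls accumulates words from multiple lists, which is the main reason a caller might prefer an overload that accepts a result set.
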
Constructor and Description |
---|
StopFilter(TokenStream in, CharArraySet stopWords): Constructs a filter which removes words from the input TokenStream that are named in the Set. |
StopwordAnalyzerBase(CharArraySet stopwords): Creates a new instance initialized with the given stopword set. |
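
To make the filtering idea concrete, here is a plain-Java sketch of removing tokens that are named in a stop set, with an optional case-insensitive mode comparable in spirit to the ignoreCase flag of makeStopSet. It is an illustration under stated assumptions, not Lucene's StopFilter: the class StopFilterSketch and both of its methods are hypothetical, tokens are plain strings rather than a TokenStream, and case folding uses Locale.ROOT lowercasing.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Locale;
import java.util.Set;

/** Hypothetical stand-in for stop-word filtering over a token list (not Lucene code). */
class StopFilterSketch {

    /** Builds a stop set; entries are lowercased when ignoreCase is true. */
    static Set<String> makeStopSet(boolean ignoreCase, String... stopWords) {
        Set<String> set = new HashSet<>();
        for (String word : stopWords) {
            set.add(ignoreCase ? word.toLowerCase(Locale.ROOT) : word);
        }
        return set;
    }

    /** Returns the tokens that are not in the stop set, in their original order. */
    static List<String> removeStopWords(List<String> tokens, Set<String> stopSet, boolean ignoreCase) {
        List<String> kept = new ArrayList<>();
        for (String token : tokens) {
            // Use the same folding for lookups as was used to build the set.
            String key = ignoreCase ? token.toLowerCase(Locale.ROOT) : token;
            if (!stopSet.contains(key)) {
                kept.add(token);
            }
        }
        return kept;
    }
}
```

The ignoreCase flag has to match between building the set and filtering; a case-sensitive lookup against a lowercased set would let capitalized stop words through.
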
Modifier and Type | Field and Description |
---|---|
static CharArraySet | StandardAnalyzer.ENGLISH_STOP_WORDS_SET: An unmodifiable set containing some common English words that are not usually useful for searching. |
static CharArraySet | StandardAnalyzer.STOP_WORDS_SET: An unmodifiable set containing some common English words that are usually not useful for searching. |
Constructor and Description |
---|
StandardAnalyzer(CharArraySet stopWords): Builds an analyzer with the given stop words. |
Copyright © 2000-2017 Apache Software Foundation. All Rights Reserved.