java.lang.Object
  org.apache.lucene.analysis.Analyzer
    org.apache.lucene.analysis.standard.StandardAnalyzer

public class StandardAnalyzer extends Analyzer

Filters StandardTokenizer with StandardFilter, LowerCaseFilter and StopFilter, using a list of English stop words.

You must specify the required Version compatibility when creating a StandardAnalyzer.

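The filter chain described above can be pictured with a small JDK-only model. This is a sketch under stated assumptions, not Lucene code: the class `AnalyzerChainSketch`, the method `analyze` and the stop list `SAMPLE_STOP_WORDS` are hypothetical names, the regular-expression split is only a rough stand-in for the grammar-based StandardTokenizer, and the StandardFilter stage is left out entirely.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Locale;
import java.util.Set;

// Simplified model of a tokenize -> lower-case -> stop-filter chain.
// Not Lucene code; all names here are hypothetical.
class AnalyzerChainSketch {

    // Illustrative stop list only; this is not Lucene's STOP_WORDS_SET.
    static final Set<String> SAMPLE_STOP_WORDS =
            new HashSet<String>(Arrays.asList("a", "and", "of", "the"));

    static List<String> analyze(String text, Set<String> stopWords) {
        List<String> terms = new ArrayList<String>();
        // Stage 1 (stand-in for StandardTokenizer): split on runs of
        // characters that are neither letters nor digits.
        for (String token : text.split("[^\\p{L}\\p{N}]+")) {
            if (token.isEmpty()) {
                continue; // split() can yield an empty leading element
            }
            // Stage 2 (role of LowerCaseFilter): normalize case.
            String lower = token.toLowerCase(Locale.ROOT);
            // Stage 3 (role of StopFilter): drop terms found in the stop set.
            if (!stopWords.contains(lower)) {
                terms.add(lower);
            }
        }
        return terms;
    }

    public static void main(String[] args) {
        // prints [quick, brown, fox, lazy, dog]
        System.out.println(analyze("The Quick Brown Fox and the Lazy Dog", SAMPLE_STOP_WORDS));
    }
}
```

Lower-casing happens before the stop-word lookup, so a lower-case stop list also removes capitalized occurrences such as "The".
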
Field Summary | |
---|---|
static int | DEFAULT_MAX_TOKEN_LENGTH: Default maximum allowed token length. |
static String[] | STOP_WORDS: Deprecated. Use STOP_WORDS_SET instead. |
static Set | STOP_WORDS_SET: An unmodifiable set containing some common English words that are usually not useful for searching. |

Fields inherited from class org.apache.lucene.analysis.Analyzer |
---|
overridesTokenStreamMethod |

Constructor Summary | |
---|---|
StandardAnalyzer() | Deprecated. Use StandardAnalyzer(Version) instead. |
StandardAnalyzer(boolean replaceInvalidAcronym) | Deprecated. Remove in 3.X and make true the only valid value. |
StandardAnalyzer(File stopwords) | Deprecated. Use StandardAnalyzer(Version, File) instead. |
StandardAnalyzer(File stopwords, boolean replaceInvalidAcronym) | Deprecated. Remove in 3.X and make true the only valid value. |
StandardAnalyzer(Reader stopwords) | Deprecated. Use StandardAnalyzer(Version, Reader) instead. |
StandardAnalyzer(Reader stopwords, boolean replaceInvalidAcronym) | Deprecated. Remove in 3.X and make true the only valid value. |
StandardAnalyzer(Set stopWords) | Deprecated. Use StandardAnalyzer(Version, Set) instead. |
StandardAnalyzer(Set stopwords, boolean replaceInvalidAcronym) | Deprecated. Remove in 3.X and make true the only valid value. |
StandardAnalyzer(String[] stopWords) | Deprecated. Use StandardAnalyzer(Version, Set) instead. |
StandardAnalyzer(String[] stopwords, boolean replaceInvalidAcronym) | Deprecated. Remove in 3.X and make true the only valid value. |
StandardAnalyzer(Version matchVersion) | Builds an analyzer with the default stop words (STOP_WORDS). |
StandardAnalyzer(Version matchVersion, File stopwords) | Builds an analyzer with the stop words from the given file. |
StandardAnalyzer(Version matchVersion, Reader stopwords) | Builds an analyzer with the stop words from the given reader. |
StandardAnalyzer(Version matchVersion, Set stopWords) | Builds an analyzer with the given stop words. |

Method Summary | |
---|---|
static boolean | getDefaultReplaceInvalidAcronym(): Deprecated. This will be removed (hardwired to true) in 3.0. |
int | getMaxTokenLength() |
boolean | isReplaceInvalidAcronym(): Deprecated. This will be removed (hardwired to true) in 3.0. |
TokenStream | reusableTokenStream(String fieldName, Reader reader): Deprecated. Use tokenStream(java.lang.String, java.io.Reader) instead. |
static void | setDefaultReplaceInvalidAcronym(boolean replaceInvalidAcronym): Deprecated. This will be removed (hardwired to true) in 3.0. |
void | setMaxTokenLength(int length): Set maximum allowed token length. |
void | setReplaceInvalidAcronym(boolean replaceInvalidAcronym): Deprecated. This will be removed (hardwired to true) in 3.0. |
TokenStream | tokenStream(String fieldName, Reader reader): Constructs a StandardTokenizer filtered by a StandardFilter, a LowerCaseFilter and a StopFilter. |

Methods inherited from class org.apache.lucene.analysis.Analyzer |
---|
close, getOffsetGap, getPositionIncrementGap, getPreviousTokenStream, setOverridesTokenStreamMethod, setPreviousTokenStream |

Methods inherited from class java.lang.Object |
---|
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait |

Field Detail |
---|
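public static final String[] STOP_WORDS
Deprecated. Use STOP_WORDS_SET instead.

public static final Set STOP_WORDS_SET
An unmodifiable set containing some common English words that are usually not useful for searching.

public static final int DEFAULT_MAX_TOKEN_LENGTH
Default maximum allowed token length.
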
Constructor Detail |
---|
public StandardAnalyzer()
Deprecated. Use StandardAnalyzer(Version) instead.
Builds an analyzer with the default stop words (STOP_WORDS_SET).

public StandardAnalyzer(Version matchVersion)
Builds an analyzer with the default stop words (STOP_WORDS).
Parameters:
matchVersion - Lucene version to match. See above.

public StandardAnalyzer(Set stopWords)
Deprecated. Use StandardAnalyzer(Version, Set) instead.

public StandardAnalyzer(Version matchVersion, Set stopWords)
Builds an analyzer with the given stop words.
Parameters:
matchVersion - Lucene version to match. See above.
stopWords - stop words

public StandardAnalyzer(String[] stopWords)
Deprecated. Use StandardAnalyzer(Version, Set) instead.

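Code that still passes a String[] of stop words needs a Set before it can call the StandardAnalyzer(Version, Set) form. The sketch below shows one way to make that conversion using only the JDK; the class `StopWordSets` and the method `toUnmodifiableSet` are hypothetical helpers, not part of Lucene, and the read-only wrapper is a design choice modeled on the description of STOP_WORDS_SET rather than a requirement of the constructor.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.HashSet;
import java.util.Set;

// Hypothetical helper for migrating from String[] stop lists to Set stop lists.
class StopWordSets {

    // Copies the array into a HashSet (dropping duplicates) and wraps it so
    // that callers cannot modify it afterwards.
    static Set<String> toUnmodifiableSet(String[] words) {
        return Collections.unmodifiableSet(new HashSet<String>(Arrays.asList(words)));
    }

    public static void main(String[] args) {
        Set<String> stopWords = toUnmodifiableSet(new String[] {"the", "a", "the"});
        System.out.println(stopWords.contains("the")); // prints true
        System.out.println(stopWords.size());          // prints 2
    }
}
```

The resulting set can then be handed to whichever constructor takes a Set of stop words.
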
public StandardAnalyzer(File stopwords) throws IOException
Deprecated. Use StandardAnalyzer(Version, File) instead.
Throws:
IOException
See Also:
WordlistLoader.getWordSet(File)

public StandardAnalyzer(Version matchVersion, File stopwords) throws IOException
Builds an analyzer with the stop words from the given file.
Parameters:
matchVersion - Lucene version to match. See above.
stopwords - File to read stop words from
Throws:
IOException
See Also:
WordlistLoader.getWordSet(File)

public StandardAnalyzer(Reader stopwords) throws IOException
Deprecated. Use StandardAnalyzer(Version, Reader) instead.
Throws:
IOException
See Also:
WordlistLoader.getWordSet(Reader)

public StandardAnalyzer(Version matchVersion, Reader stopwords) throws IOException
Builds an analyzer with the stop words from the given reader.
Parameters:
matchVersion - Lucene version to match. See above.
stopwords - Reader to read stop words from
Throws:
IOException
See Also:
WordlistLoader.getWordSet(Reader)

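The File and Reader constructors delegate the loading of stop words to WordlistLoader. As a rough illustration of what reading a stop list from a Reader involves, the sketch below assumes a one-word-per-line format with surrounding whitespace ignored. The class `StopWordReaderSketch` and the method `readWordSet` are hypothetical, and skipping blank lines is a choice made for this sketch, not documented WordlistLoader behavior.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.util.HashSet;
import java.util.Set;

// JDK-only sketch of loading a stop-word list; not Lucene's WordlistLoader.
class StopWordReaderSketch {

    // Assumed format: one stop word per line. Lines are trimmed and blank
    // lines are skipped. The reader is closed when loading finishes.
    static Set<String> readWordSet(Reader reader) throws IOException {
        Set<String> words = new HashSet<String>();
        BufferedReader buffered = new BufferedReader(reader);
        try {
            String line;
            while ((line = buffered.readLine()) != null) {
                String word = line.trim();
                if (!word.isEmpty()) {
                    words.add(word);
                }
            }
        } finally {
            buffered.close();
        }
        return words;
    }

    public static void main(String[] args) throws IOException {
        Set<String> words = readWordSet(new StringReader("the\n  and \n\nof\n"));
        System.out.println(words.size());          // prints 3
        System.out.println(words.contains("and")); // prints true
    }
}
```
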
public StandardAnalyzer(boolean replaceInvalidAcronym)
Deprecated. Remove in 3.X and make true the only valid value.
Parameters:
replaceInvalidAcronym - Set to true if this analyzer should replace mischaracterized acronyms in the StandardTokenizer. See https://issues.apache.org/jira/browse/LUCENE-1068

public StandardAnalyzer(Reader stopwords, boolean replaceInvalidAcronym) throws IOException
Deprecated. Remove in 3.X and make true the only valid value.
Parameters:
stopwords - The stopwords to use
replaceInvalidAcronym - Set to true if this analyzer should replace mischaracterized acronyms in the StandardTokenizer. See https://issues.apache.org/jira/browse/LUCENE-1068
Throws:
IOException

public StandardAnalyzer(File stopwords, boolean replaceInvalidAcronym) throws IOException
Deprecated. Remove in 3.X and make true the only valid value.
Parameters:
stopwords - The stopwords to use
replaceInvalidAcronym - Set to true if this analyzer should replace mischaracterized acronyms in the StandardTokenizer. See https://issues.apache.org/jira/browse/LUCENE-1068
Throws:
IOException

public StandardAnalyzer(String[] stopwords, boolean replaceInvalidAcronym) throws IOException
Deprecated. Remove in 3.X and make true the only valid value.
Parameters:
stopwords - The stopwords to use
replaceInvalidAcronym - Set to true if this analyzer should replace mischaracterized acronyms in the StandardTokenizer. See https://issues.apache.org/jira/browse/LUCENE-1068
Throws:
IOException

public StandardAnalyzer(Set stopwords, boolean replaceInvalidAcronym) throws IOException
Deprecated. Remove in 3.X and make true the only valid value.
Parameters:
stopwords - The stopwords to use
replaceInvalidAcronym - Set to true if this analyzer should replace mischaracterized acronyms in the StandardTokenizer. See https://issues.apache.org/jira/browse/LUCENE-1068
Throws:
IOException

Method Detail |
---|
public static boolean getDefaultReplaceInvalidAcronym()
Deprecated. This will be removed (hardwired to true) in 3.0.

public static void setDefaultReplaceInvalidAcronym(boolean replaceInvalidAcronym)
Deprecated. This will be removed (hardwired to true) in 3.0.
Parameters:
replaceInvalidAcronym - Set to true to have new instances of StandardTokenizer replace mischaracterized acronyms by default. Set to false to preserve the previous (before 2.4) buggy behavior. Alternatively, set the system property org.apache.lucene.analysis.standard.StandardAnalyzer.replaceInvalidAcronym to false. See https://issues.apache.org/jira/browse/LUCENE-1068

public TokenStream tokenStream(String fieldName, Reader reader)
Constructs a StandardTokenizer filtered by a StandardFilter, a LowerCaseFilter and a StopFilter.
Specified by:
tokenStream in class Analyzer

public void setMaxTokenLength(int length)
Set maximum allowed token length.

public int getMaxTokenLength()
See Also:
setMaxTokenLength(int)

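To make the effect of a maximum token length concrete, here is a minimal JDK-only sketch. It rests on two assumptions that this page does not state: that DEFAULT_MAX_TOKEN_LENGTH is 255, and that tokens longer than the limit are skipped rather than truncated. The class `MaxTokenLengthSketch` and its `keepAllowed` method are hypothetical and only mirror the getter and setter names documented above.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Models a configurable maximum token length; not Lucene code.
class MaxTokenLengthSketch {

    // Assumption: mirrors DEFAULT_MAX_TOKEN_LENGTH, whose value is not given on this page.
    static final int ASSUMED_DEFAULT_MAX_TOKEN_LENGTH = 255;

    private int maxTokenLength = ASSUMED_DEFAULT_MAX_TOKEN_LENGTH;

    void setMaxTokenLength(int length) {
        this.maxTokenLength = length;
    }

    int getMaxTokenLength() {
        return maxTokenLength;
    }

    // Assumption: over-long tokens are dropped entirely, not cut down to size.
    List<String> keepAllowed(List<String> tokens) {
        List<String> kept = new ArrayList<String>();
        for (String token : tokens) {
            if (token.length() <= maxTokenLength) {
                kept.add(token);
            }
        }
        return kept;
    }

    public static void main(String[] args) {
        MaxTokenLengthSketch sketch = new MaxTokenLengthSketch();
        sketch.setMaxTokenLength(5);
        // prints [short, ok]
        System.out.println(sketch.keepAllowed(Arrays.asList("short", "toolongtoken", "ok")));
    }
}
```
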
public TokenStream reusableTokenStream(String fieldName, Reader reader) throws IOException
Deprecated. Use tokenStream(java.lang.String, java.io.Reader) instead.
Overrides:
reusableTokenStream in class Analyzer
Throws:
IOException

public boolean isReplaceInvalidAcronym()
Deprecated. This will be removed (hardwired to true) in 3.0.

public void setReplaceInvalidAcronym(boolean replaceInvalidAcronym)
Deprecated. This will be removed (hardwired to true) in 3.0.
Parameters:
replaceInvalidAcronym - Set to true if this Analyzer is replacing mischaracterized acronyms in the StandardTokenizer. See https://issues.apache.org/jira/browse/LUCENE-1068

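The description of setDefaultReplaceInvalidAcronym mentions that the default can also be switched off through a system property. The sketch below shows one plausible way such a default could be resolved. It assumes that an unset property or the literal value "true" means true and that any other value means false; the class `AcronymDefaultSketch` and both `resolveDefault` methods are hypothetical and are not taken from the Lucene source.

```java
// Hypothetical resolution of a boolean default from a system property.
class AcronymDefaultSketch {

    // Property name as given in the setDefaultReplaceInvalidAcronym description.
    static final String PROPERTY =
            "org.apache.lucene.analysis.standard.StandardAnalyzer.replaceInvalidAcronym";

    // Assumption: unset or "true" enables replacement; anything else disables it.
    static boolean resolveDefault(String propertyValue) {
        return propertyValue == null || propertyValue.equals("true");
    }

    // Reads the property from the running JVM.
    static boolean resolveDefault() {
        return resolveDefault(System.getProperty(PROPERTY));
    }

    public static void main(String[] args) {
        System.out.println(resolveDefault(null));    // prints true
        System.out.println(resolveDefault("false")); // prints false
    }
}
```

Under these assumptions, starting the JVM with `-Dorg.apache.lucene.analysis.standard.StandardAnalyzer.replaceInvalidAcronym=false` would make the no-argument `resolveDefault()` return false.
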