org.apache.lucene.analysis.standard
Class StandardAnalyzer

java.lang.Object
  extended by org.apache.lucene.analysis.Analyzer
      extended by org.apache.lucene.analysis.standard.StandardAnalyzer

public class StandardAnalyzer
extends Analyzer

Filters StandardTokenizer with StandardFilter, LowerCaseFilter and StopFilter, using a list of English stop words.

You must specify the required Version compatibility when creating StandardAnalyzer.

Version:
$Id: StandardAnalyzer.java 829134 2009-10-23 17:18:53Z mikemccand $

Field Summary
static int DEFAULT_MAX_TOKEN_LENGTH
          Default maximum allowed token length
static String[] STOP_WORDS
          Deprecated. Use STOP_WORDS_SET instead
static Set STOP_WORDS_SET
          An unmodifiable set containing some common English words that are usually not useful for searching.
 
Fields inherited from class org.apache.lucene.analysis.Analyzer
overridesTokenStreamMethod
 
Constructor Summary
StandardAnalyzer()
          Deprecated. Use StandardAnalyzer(Version) instead.
StandardAnalyzer(boolean replaceInvalidAcronym)
          Deprecated. Remove in 3.X and make true the only valid value
StandardAnalyzer(File stopwords)
          Deprecated. Use StandardAnalyzer(Version, File) instead
StandardAnalyzer(File stopwords, boolean replaceInvalidAcronym)
          Deprecated. Remove in 3.X and make true the only valid value
StandardAnalyzer(Reader stopwords)
          Deprecated. Use StandardAnalyzer(Version, Reader) instead
StandardAnalyzer(Reader stopwords, boolean replaceInvalidAcronym)
          Deprecated. Remove in 3.X and make true the only valid value
StandardAnalyzer(Set stopWords)
          Deprecated. Use StandardAnalyzer(Version, Set) instead
StandardAnalyzer(Set stopwords, boolean replaceInvalidAcronym)
          Deprecated. Remove in 3.X and make true the only valid value
StandardAnalyzer(String[] stopWords)
          Deprecated. Use StandardAnalyzer(Version, Set) instead
StandardAnalyzer(String[] stopwords, boolean replaceInvalidAcronym)
          Deprecated. Remove in 3.X and make true the only valid value
StandardAnalyzer(Version matchVersion)
          Builds an analyzer with the default stop words (STOP_WORDS_SET).
StandardAnalyzer(Version matchVersion, File stopwords)
          Builds an analyzer with the stop words from the given file.
StandardAnalyzer(Version matchVersion, Reader stopwords)
          Builds an analyzer with the stop words from the given reader.
StandardAnalyzer(Version matchVersion, Set stopWords)
          Builds an analyzer with the given stop words.
 
Method Summary
static boolean getDefaultReplaceInvalidAcronym()
          Deprecated. This will be removed (hardwired to true) in 3.0
 int getMaxTokenLength()
           
 boolean isReplaceInvalidAcronym()
          Deprecated. This will be removed (hardwired to true) in 3.0
 TokenStream reusableTokenStream(String fieldName, Reader reader)
          Deprecated. Use tokenStream(java.lang.String, java.io.Reader) instead
static void setDefaultReplaceInvalidAcronym(boolean replaceInvalidAcronym)
          Deprecated. This will be removed (hardwired to true) in 3.0
 void setMaxTokenLength(int length)
          Set maximum allowed token length.
 void setReplaceInvalidAcronym(boolean replaceInvalidAcronym)
          Deprecated. This will be removed (hardwired to true) in 3.0
 TokenStream tokenStream(String fieldName, Reader reader)
          Constructs a StandardTokenizer filtered by a StandardFilter, a LowerCaseFilter and a StopFilter.
 
Methods inherited from class org.apache.lucene.analysis.Analyzer
close, getOffsetGap, getPositionIncrementGap, getPreviousTokenStream, setOverridesTokenStreamMethod, setPreviousTokenStream
 
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
 

Field Detail

STOP_WORDS

public static final String[] STOP_WORDS
Deprecated. Use STOP_WORDS_SET instead
An array containing some common English words that are usually not useful for searching.


STOP_WORDS_SET

public static final Set STOP_WORDS_SET
An unmodifiable set containing some common English words that are usually not useful for searching.
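
Because the set is unmodifiable, a caller who wants additional stop words needs a private copy. The following is a minimal sketch using only JDK collections. DEFAULT_STOP_WORDS is a hypothetical three-word stand-in, not the actual contents of STOP_WORDS_SET, and the class and method names are invented for illustration.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.HashSet;
import java.util.Set;

public class StopWordSetSketch {
    // Hypothetical stand-in for an unmodifiable stop-word set; not Lucene's actual list.
    public static final Set<String> DEFAULT_STOP_WORDS =
        Collections.unmodifiableSet(new HashSet<String>(Arrays.asList("a", "the", "of")));

    // Returns a fresh, modifiable set: the defaults plus the caller's extra words.
    public static Set<String> withExtraWords(String... extra) {
        Set<String> copy = new HashSet<String>(DEFAULT_STOP_WORDS);
        copy.addAll(Arrays.asList(extra));
        return copy;
    }

    // Returns true if an attempt to add to the shared default set is rejected.
    public static boolean defaultSetRejectsAdd() {
        try {
            DEFAULT_STOP_WORDS.add("lucene");
            return false;
        } catch (UnsupportedOperationException expected) {
            return true;
        }
    }

    public static void main(String[] args) {
        System.out.println(defaultSetRejectsAdd());                       // true
        System.out.println(withExtraWords("lucene").contains("lucene")); // true
        System.out.println(DEFAULT_STOP_WORDS.contains("lucene"));       // false
    }
}
```

A copy built this way could then be passed to StandardAnalyzer(Version, Set), leaving the shared default untouched.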


DEFAULT_MAX_TOKEN_LENGTH

public static final int DEFAULT_MAX_TOKEN_LENGTH
Default maximum allowed token length

See Also:
Constant Field Values

Constructor Detail

StandardAnalyzer

public StandardAnalyzer()
Deprecated. Use StandardAnalyzer(Version) instead.

Builds an analyzer with the default stop words (STOP_WORDS_SET).


StandardAnalyzer

public StandardAnalyzer(Version matchVersion)
Builds an analyzer with the default stop words (STOP_WORDS_SET).

Parameters:
matchVersion - Lucene version to match. See the Version compatibility note above.

StandardAnalyzer

public StandardAnalyzer(Set stopWords)
Deprecated. Use StandardAnalyzer(Version, Set) instead

Builds an analyzer with the given stop words.


StandardAnalyzer

public StandardAnalyzer(Version matchVersion,
                        Set stopWords)
Builds an analyzer with the given stop words.

Parameters:
matchVersion - Lucene version to match. See the Version compatibility note above.
stopWords - stop words

StandardAnalyzer

public StandardAnalyzer(String[] stopWords)
Deprecated. Use StandardAnalyzer(Version, Set) instead

Builds an analyzer with the given stop words.


StandardAnalyzer

public StandardAnalyzer(File stopwords)
                 throws IOException
Deprecated. Use StandardAnalyzer(Version, File) instead

Builds an analyzer with the stop words from the given file.

Throws:
IOException
See Also:
WordlistLoader.getWordSet(File)

StandardAnalyzer

public StandardAnalyzer(Version matchVersion,
                        File stopwords)
                 throws IOException
Builds an analyzer with the stop words from the given file.

Parameters:
matchVersion - Lucene version to match. See the Version compatibility note above.
stopwords - File to read stop words from
Throws:
IOException
See Also:
WordlistLoader.getWordSet(File)

StandardAnalyzer

public StandardAnalyzer(Reader stopwords)
                 throws IOException
Deprecated. Use StandardAnalyzer(Version, Reader) instead

Builds an analyzer with the stop words from the given reader.

Throws:
IOException
See Also:
WordlistLoader.getWordSet(Reader)

StandardAnalyzer

public StandardAnalyzer(Version matchVersion,
                        Reader stopwords)
                 throws IOException
Builds an analyzer with the stop words from the given reader.

Parameters:
matchVersion - Lucene version to match. See the Version compatibility note above.
stopwords - Reader to read stop words from
Throws:
IOException
See Also:
WordlistLoader.getWordSet(Reader)
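
This page does not describe the stop word format the reader must supply. The following is a minimal sketch that assumes a plain-text list with one word per line. That format, the trimming of whitespace, and the skipping of blank lines are assumptions of this sketch; WordlistLoader.getWordSet(Reader) remains the authoritative behavior. The class and method names are hypothetical and only the JDK is used.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.util.HashSet;
import java.util.Set;

public class StopWordReaderSketch {
    // Reads one word per line, trimming surrounding whitespace and skipping blank lines.
    public static Set<String> readWords(Reader reader) throws IOException {
        Set<String> words = new HashSet<String>();
        BufferedReader buffered = new BufferedReader(reader);
        try {
            String line;
            while ((line = buffered.readLine()) != null) {
                String word = line.trim();
                if (word.length() > 0) {
                    words.add(word);
                }
            }
        } finally {
            buffered.close(); // also closes the wrapped reader
        }
        return words;
    }

    public static void main(String[] args) throws IOException {
        Set<String> words = readWords(new StringReader("the\n  of \n\nand\n"));
        System.out.println(words.size());         // 3
        System.out.println(words.contains("of")); // true
    }
}
```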

StandardAnalyzer

public StandardAnalyzer(boolean replaceInvalidAcronym)
Deprecated. Remove in 3.X and make true the only valid value

Parameters:
replaceInvalidAcronym - Set to true if this analyzer should replace mischaracterized acronyms in the StandardTokenizer. See https://issues.apache.org/jira/browse/LUCENE-1068

StandardAnalyzer

public StandardAnalyzer(Reader stopwords,
                        boolean replaceInvalidAcronym)
                 throws IOException
Deprecated. Remove in 3.X and make true the only valid value

Parameters:
stopwords - The stopwords to use
replaceInvalidAcronym - Set to true if this analyzer should replace mischaracterized acronyms in the StandardTokenizer. See https://issues.apache.org/jira/browse/LUCENE-1068
Throws:
IOException

StandardAnalyzer

public StandardAnalyzer(File stopwords,
                        boolean replaceInvalidAcronym)
                 throws IOException
Deprecated. Remove in 3.X and make true the only valid value

Parameters:
stopwords - The stopwords to use
replaceInvalidAcronym - Set to true if this analyzer should replace mischaracterized acronyms in the StandardTokenizer. See https://issues.apache.org/jira/browse/LUCENE-1068
Throws:
IOException

StandardAnalyzer

public StandardAnalyzer(String[] stopwords,
                        boolean replaceInvalidAcronym)
                 throws IOException
Deprecated. Remove in 3.X and make true the only valid value

Parameters:
stopwords - The stopwords to use
replaceInvalidAcronym - Set to true if this analyzer should replace mischaracterized acronyms in the StandardTokenizer. See https://issues.apache.org/jira/browse/LUCENE-1068
Throws:
IOException

StandardAnalyzer

public StandardAnalyzer(Set stopwords,
                        boolean replaceInvalidAcronym)
                 throws IOException
Deprecated. Remove in 3.X and make true the only valid value

Parameters:
stopwords - The stopwords to use
replaceInvalidAcronym - Set to true if this analyzer should replace mischaracterized acronyms in the StandardTokenizer. See https://issues.apache.org/jira/browse/LUCENE-1068
Throws:
IOException

Method Detail

getDefaultReplaceInvalidAcronym

public static boolean getDefaultReplaceInvalidAcronym()
Deprecated. This will be removed (hardwired to true) in 3.0

Returns:
true if new instances of StandardTokenizer will replace mischaracterized acronyms. See https://issues.apache.org/jira/browse/LUCENE-1068

setDefaultReplaceInvalidAcronym

public static void setDefaultReplaceInvalidAcronym(boolean replaceInvalidAcronym)
Deprecated. This will be removed (hardwired to true) in 3.0

Parameters:
replaceInvalidAcronym - Set to true to have new instances of StandardTokenizer replace mischaracterized acronyms by default. Set to false to preserve the previous (before 2.4) buggy behavior. Alternatively, set the system property org.apache.lucene.analysis.standard.StandardAnalyzer.replaceInvalidAcronym to false. See https://issues.apache.org/jira/browse/LUCENE-1068
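
The system property mentioned above can be read with plain JDK calls. The following is a minimal sketch of one way such a default could be resolved. The rule that anything other than the literal string "false" yields true is an assumption of this sketch, not a statement of how Lucene parses the value, and the class and method names are hypothetical.

```java
public class AcronymDefaultSketch {
    // Property name quoted in the documentation above.
    public static final String PROPERTY =
        "org.apache.lucene.analysis.standard.StandardAnalyzer.replaceInvalidAcronym";

    // Assumed rule: the default is true unless the property is exactly "false".
    public static boolean resolveDefault() {
        String value = System.getProperty(PROPERTY);
        return !"false".equals(value);
    }

    public static void main(String[] args) {
        System.clearProperty(PROPERTY);
        System.out.println(resolveDefault()); // true: property not set

        System.setProperty(PROPERTY, "false");
        System.out.println(resolveDefault()); // false: explicitly disabled

        System.clearProperty(PROPERTY);
    }
}
```

On the command line the same property would be supplied with the JVM's standard -D option, for example -Dorg.apache.lucene.analysis.standard.StandardAnalyzer.replaceInvalidAcronym=false.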

tokenStream

public TokenStream tokenStream(String fieldName,
                               Reader reader)
Constructs a StandardTokenizer filtered by a StandardFilter, a LowerCaseFilter and a StopFilter.

Specified by:
tokenStream in class Analyzer
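
The filter chain described above can be pictured as three stages applied in order. The following is a simplified, JDK-only model of that idea and not Lucene code. The tokenizer here is a crude split on non-alphanumeric characters rather than StandardTokenizer's grammar, StandardFilter's normalization is omitted, and the stop list is a hypothetical three-word set rather than STOP_WORDS_SET.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Locale;
import java.util.Set;

public class AnalysisChainSketch {
    // Small illustrative stop list; not the contents of STOP_WORDS_SET.
    public static final Set<String> STOP =
        new HashSet<String>(Arrays.asList("the", "a", "of"));

    // Stage 1: crude tokenizer that splits on anything that is not a letter or digit.
    public static List<String> tokenize(String text) {
        List<String> tokens = new ArrayList<String>();
        for (String piece : text.split("[^\\p{L}\\p{N}]+")) {
            if (piece.length() > 0) {
                tokens.add(piece);
            }
        }
        return tokens;
    }

    // Stage 2: lowercase every token.
    public static List<String> lowerCase(List<String> tokens) {
        List<String> out = new ArrayList<String>();
        for (String token : tokens) {
            out.add(token.toLowerCase(Locale.ROOT));
        }
        return out;
    }

    // Stage 3: drop tokens that appear in the stop set.
    public static List<String> removeStopWords(List<String> tokens, Set<String> stop) {
        List<String> out = new ArrayList<String>();
        for (String token : tokens) {
            if (!stop.contains(token)) {
                out.add(token);
            }
        }
        return out;
    }

    // Runs the three stages in the order tokenizer, lowercase, stop filter.
    public static List<String> analyze(String text) {
        return removeStopWords(lowerCase(tokenize(text)), STOP);
    }

    public static void main(String[] args) {
        System.out.println(analyze("The Quick Brown Fox of a Forest"));
        // [quick, brown, fox, forest]
    }
}
```

Lowercasing before stop word removal is what lets "The" match the lowercase entry "the" in this model.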

setMaxTokenLength

public void setMaxTokenLength(int length)
Sets the maximum allowed token length. Any token longer than this length is discarded. The setting takes effect only the next time tokenStream or reusableTokenStream is called.
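
The rule for over-long tokens can be illustrated with a small JDK-only model. This is a sketch of the documented behavior (tokens longer than the limit are dropped, not truncated) and not the tokenizer's actual implementation. The class name, method name, and the limit of 6 are hypothetical.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class MaxTokenLengthSketch {
    // Keeps tokens whose length is at most maxLength; longer tokens are dropped entirely.
    public static List<String> discardLongTokens(List<String> tokens, int maxLength) {
        List<String> kept = new ArrayList<String>();
        for (String token : tokens) {
            if (token.length() <= maxLength) {
                kept.add(token);
            }
        }
        return kept;
    }

    public static void main(String[] args) {
        List<String> tokens = Arrays.asList("id", "supercalifragilistic", "lucene");
        System.out.println(discardLongTokens(tokens, 6)); // [id, lucene]
    }
}
```

A token whose length equals the limit is kept, since only tokens that exceed the limit are discarded.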


getMaxTokenLength

public int getMaxTokenLength()
Returns the maximum allowed token length.

See Also:
setMaxTokenLength(int)

reusableTokenStream

public TokenStream reusableTokenStream(String fieldName,
                                       Reader reader)
                                throws IOException
Deprecated. Use tokenStream(java.lang.String, java.io.Reader) instead

Description copied from class: Analyzer
Creates a TokenStream that is allowed to be re-used from the previous time that the same thread called this method. Callers that do not need to use more than one TokenStream at the same time from this analyzer should use this method for better performance.

Overrides:
reusableTokenStream in class Analyzer
Throws:
IOException
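
The per-thread reuse described above can be modeled with a ThreadLocal. The following is a JDK-only sketch of the general pattern and not the Analyzer implementation. ReusableStream is a hypothetical stand-in for a token stream, and all names are invented for illustration.

```java
public class PerThreadReuseSketch {
    // Hypothetical stand-in for a token stream that can be pointed at new input.
    public static final class ReusableStream {
        public String input;

        void reset(String newInput) {
            this.input = newInput;
        }
    }

    // One cached stream per thread, created lazily on first use.
    private static final ThreadLocal<ReusableStream> CACHE =
        new ThreadLocal<ReusableStream>();

    // Returns this thread's cached stream, re-pointed at the new input.
    public static ReusableStream reusableStream(String input) {
        ReusableStream stream = CACHE.get();
        if (stream == null) {
            stream = new ReusableStream();
            CACHE.set(stream);
        }
        stream.reset(input);
        return stream;
    }

    public static void main(String[] args) {
        ReusableStream first = reusableStream("one");
        ReusableStream second = reusableStream("two");
        System.out.println(first == second); // true: same object on the same thread
        System.out.println(first.input);     // two: the earlier reference now sees the new input
    }
}
```

The second call overwrites the state seen through the first reference, which is why this style of reuse only suits callers that consume one stream at a time per thread.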

isReplaceInvalidAcronym

public boolean isReplaceInvalidAcronym()
Deprecated. This will be removed (hardwired to true) in 3.0

Returns:
true if this Analyzer is replacing mischaracterized acronyms in the StandardTokenizer. See https://issues.apache.org/jira/browse/LUCENE-1068

setReplaceInvalidAcronym

public void setReplaceInvalidAcronym(boolean replaceInvalidAcronym)
Deprecated. This will be removed (hardwired to true) in 3.0

Parameters:
replaceInvalidAcronym - Set to true to have this Analyzer replace mischaracterized acronyms in the StandardTokenizer. See https://issues.apache.org/jira/browse/LUCENE-1068


Copyright © 2000-2010 Apache Software Foundation. All Rights Reserved.