org.apache.lucene.analysis
Class WordlistLoader

java.lang.Object
  extended by org.apache.lucene.analysis.WordlistLoader

public class WordlistLoader
extends Object

Loader for text files that represent a list of stopwords.


Constructor Summary
WordlistLoader()
           
 
Method Summary
static Set<String> getSnowballWordSet(Class<?> aClass, String stopwordResource)
          Loads a text file in Snowball format associated with a given class (See Class.getResourceAsStream(String)) and adds all words as entries to a Set.
static Set<String> getSnowballWordSet(Reader reader)
          Reads stopwords from a stopword list in Snowball format.
static HashMap<String,String> getStemDict(File wordstemfile)
          Reads a stem dictionary.
static Set<String> getWordSet(Class<?> aClass, String stopwordResource)
          Loads a text file associated with a given class (See Class.getResourceAsStream(String)) and adds every line as an entry to a Set (omitting leading and trailing whitespace).
static Set<String> getWordSet(Class<?> aClass, String stopwordResource, String comment)
          Loads a text file associated with a given class (See Class.getResourceAsStream(String)) and adds every non-comment line as an entry to a Set (omitting leading and trailing whitespace).
static HashSet<String> getWordSet(File wordfile)
          Loads a text file and adds every line as an entry to a HashSet (omitting leading and trailing whitespace).
static HashSet<String> getWordSet(File wordfile, String comment)
          Loads a text file and adds every non-comment line as an entry to a HashSet (omitting leading and trailing whitespace).
static HashSet<String> getWordSet(Reader reader)
          Reads lines from a Reader and adds every line as an entry to a HashSet (omitting leading and trailing whitespace).
static HashSet<String> getWordSet(Reader reader, String comment)
          Reads lines from a Reader and adds every non-comment line as an entry to a HashSet (omitting leading and trailing whitespace).
 
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
 

Constructor Detail

WordlistLoader

public WordlistLoader()

Method Detail

getWordSet

public static Set<String> getWordSet(Class<?> aClass,
                                     String stopwordResource)
                              throws IOException
Loads a text file associated with a given class (See Class.getResourceAsStream(String)) and adds every line as an entry to a Set (omitting leading and trailing whitespace). Every line of the file should contain only one word. The words need to be in lowercase if you make use of an Analyzer which uses LowerCaseFilter (like StandardAnalyzer).

Parameters:
aClass - a class that is associated with the given stopwordResource
stopwordResource - name of the resource file associated with the given class
Returns:
a Set with the file's words
Throws:
IOException
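
As a rough illustration of what "associated with a given class" means, the following sketch resolves a resource name through Class.getResourceAsStream and reads one trimmed word per line. It uses only the JDK and is not Lucene's implementation: the class name ResourceWordSetSketch, the method name loadWordSet, the UTF-8 decoding, and the choice to reject a missing resource with IllegalArgumentException are all assumptions made for this example.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.HashSet;
import java.util.Set;

// Hypothetical helper for illustration only; not part of Lucene.
final class ResourceWordSetSketch {

    // Resolves resourceName the way Class.getResourceAsStream does:
    // relative to aClass's package unless the name starts with '/'.
    static Set<String> loadWordSet(Class<?> aClass, String resourceName) {
        InputStream in = aClass.getResourceAsStream(resourceName);
        if (in == null) {
            // getResourceAsStream reports a missing resource by returning null.
            throw new IllegalArgumentException("Resource not found: " + resourceName);
        }
        Set<String> words = new HashSet<>();
        // UTF-8 is an assumption of this sketch.
        try (BufferedReader reader = new BufferedReader(
                new InputStreamReader(in, StandardCharsets.UTF_8))) {
            // One entry per line, with leading and trailing whitespace removed.
            reader.lines().map(String::trim).forEach(words::add);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return words;
    }
}
```

A relative name such as "stopwords.txt" is looked up in the package of aClass, so the word list would normally sit next to that class on the classpath.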

getWordSet

public static Set<String> getWordSet(Class<?> aClass,
                                     String stopwordResource,
                                     String comment)
                              throws IOException
Loads a text file associated with a given class (See Class.getResourceAsStream(String)) and adds every non-comment line as an entry to a Set (omitting leading and trailing whitespace). Every line of the file should contain only one word. The words need to be in lowercase if you make use of an Analyzer which uses LowerCaseFilter (like StandardAnalyzer).

Parameters:
aClass - a class that is associated with the given stopwordResource
stopwordResource - name of the resource file associated with the given class
comment - the comment string; lines that start with this string are ignored
Returns:
a Set with the file's words
Throws:
IOException

getWordSet

public static HashSet<String> getWordSet(File wordfile)
                                  throws IOException
Loads a text file and adds every line as an entry to a HashSet (omitting leading and trailing whitespace). Every line of the file should contain only one word. The words need to be in lowercase if you make use of an Analyzer which uses LowerCaseFilter (like StandardAnalyzer).

Parameters:
wordfile - File containing the wordlist
Returns:
A HashSet with the file's words
Throws:
IOException
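
The lowercase requirement follows from set lookups being case-sensitive: once a filter has lowercased the tokens, an entry such as "The" can never match. The sketch below demonstrates this with plain JDK collections. The class LowercaseStopwordSketch and its lowercased method are hypothetical names, and lowercasing with Locale.ROOT is an assumption that may differ in detail from what LowerCaseFilter does.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Locale;
import java.util.Set;

// Hypothetical helper for illustration only; not part of Lucene.
final class LowercaseStopwordSketch {

    // Returns a copy of the word set with every entry lowercased.
    static Set<String> lowercased(Set<String> words) {
        Set<String> result = new HashSet<>();
        for (String word : words) {
            result.add(word.toLowerCase(Locale.ROOT)); // Locale.ROOT is an assumption
        }
        return result;
    }

    public static void main(String[] args) {
        Set<String> raw = new HashSet<>(Arrays.asList("The", "AND"));
        String token = "the"; // what a lowercasing filter would emit for "The"
        System.out.println(raw.contains(token));             // prints false
        System.out.println(lowercased(raw).contains(token)); // prints true
    }
}
```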

getWordSet

public static HashSet<String> getWordSet(File wordfile,
                                         String comment)
                                  throws IOException
Loads a text file and adds every non-comment line as an entry to a HashSet (omitting leading and trailing whitespace). Every line of the file should contain only one word. The words need to be in lowercase if you make use of an Analyzer which uses LowerCaseFilter (like StandardAnalyzer).

Parameters:
wordfile - File containing the wordlist
comment - The comment string; lines that start with this string are ignored
Returns:
A HashSet with the file's words
Throws:
IOException

getWordSet

public static HashSet<String> getWordSet(Reader reader)
                                  throws IOException
Reads lines from a Reader and adds every line as an entry to a HashSet (omitting leading and trailing whitespace). Every line of the Reader should contain only one word. The words need to be in lowercase if you make use of an Analyzer which uses LowerCaseFilter (like StandardAnalyzer).

Parameters:
reader - Reader containing the wordlist
Returns:
A HashSet with the reader's words
Throws:
IOException
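
The one-word-per-line behavior described above can be approximated with the JDK alone. This is a minimal sketch, not Lucene's code: ReaderWordSetSketch and readWordSet are hypothetical names, read errors surface as UncheckedIOException rather than IOException, and the sketch leaves closing the Reader to the caller.

```java
import java.io.BufferedReader;
import java.io.Reader;
import java.io.StringReader;
import java.util.HashSet;

// Hypothetical helper for illustration only; not part of Lucene.
final class ReaderWordSetSketch {

    // Adds every line of the reader to a HashSet after trimming whitespace.
    static HashSet<String> readWordSet(Reader reader) {
        HashSet<String> words = new HashSet<>();
        BufferedReader br = new BufferedReader(reader);
        // lines() wraps read failures in UncheckedIOException.
        br.lines().forEach(line -> words.add(line.trim()));
        return words;
    }

    public static void main(String[] args) {
        HashSet<String> words = readWordSet(new StringReader("  a \nan\nthe\t\n"));
        System.out.println(words.size()); // prints 3
    }
}
```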

getWordSet

public static HashSet<String> getWordSet(Reader reader,
                                         String comment)
                                  throws IOException
Reads lines from a Reader and adds every non-comment line as an entry to a HashSet (omitting leading and trailing whitespace). Every line of the Reader should contain only one word. The words need to be in lowercase if you make use of an Analyzer which uses LowerCaseFilter (like StandardAnalyzer).

Parameters:
reader - Reader containing the wordlist
comment - The comment string; lines that start with this string are ignored
Returns:
A HashSet with the reader's words
Throws:
IOException
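
Comment handling can be sketched in the same way. The example assumes that a line counts as a comment when it starts with the comment string, checked before trimming. CommentWordSetSketch and readWordSet are hypothetical names, and the code is an illustration rather than Lucene's implementation.

```java
import java.io.BufferedReader;
import java.io.Reader;
import java.io.StringReader;
import java.util.HashSet;

// Hypothetical helper for illustration only; not part of Lucene.
final class CommentWordSetSketch {

    // Skips lines that start with the comment string (an assumption of this
    // sketch) and adds every other line, trimmed, to the set.
    static HashSet<String> readWordSet(Reader reader, String comment) {
        HashSet<String> words = new HashSet<>();
        new BufferedReader(reader).lines()
                .filter(line -> !line.startsWith(comment))
                .forEach(line -> words.add(line.trim()));
        return words;
    }

    public static void main(String[] args) {
        String list = "# English stopwords\na\n  the  \n";
        System.out.println(readWordSet(new StringReader(list), "#").size()); // prints 2
    }
}
```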

getSnowballWordSet

public static Set<String> getSnowballWordSet(Class<?> aClass,
                                             String stopwordResource)
                                      throws IOException
Loads a text file in Snowball format associated with a given class (See Class.getResourceAsStream(String)) and adds all words as entries to a Set. The words need to be in lowercase if you make use of an Analyzer which uses LowerCaseFilter (like StandardAnalyzer).

Parameters:
aClass - a class that is associated with the given stopwordResource
stopwordResource - name of the resource file associated with the given class
Returns:
a Set with the file's words
Throws:
IOException
See Also:
getSnowballWordSet(Reader)

getSnowballWordSet

public static Set<String> getSnowballWordSet(Reader reader)
                                      throws IOException
Reads stopwords from a stopword list in Snowball format.

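The Snowball format is the following:
Lines may contain multiple words separated by whitespace.
The comment character is the vertical line (|).
Lines may contain trailing comments.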

Parameters:
reader - Reader containing a Snowball stopword list
Returns:
A Set with the reader's words
Throws:
IOException
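
A parser for Snowball-style stopword lists might look like the sketch below. It assumes the usual conventions of such lists: the vertical line (|) starts a comment that runs to the end of the line, and a line may hold several whitespace-separated words. SnowballWordSetSketch and readSnowball are hypothetical names, and the code is not taken from Lucene.

```java
import java.io.BufferedReader;
import java.io.Reader;
import java.io.StringReader;
import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;

// Hypothetical helper for illustration only; not part of Lucene.
final class SnowballWordSetSketch {

    static Set<String> readSnowball(Reader reader) {
        Set<String> words = new HashSet<>();
        new BufferedReader(reader).lines().forEach(line -> {
            // Drop everything from the first '|' onward (assumed comment marker).
            int bar = line.indexOf('|');
            String content = (bar >= 0 ? line.substring(0, bar) : line).trim();
            if (!content.isEmpty()) {
                // A line may carry several words separated by whitespace.
                words.addAll(Arrays.asList(content.split("\\s+")));
            }
        });
        return words;
    }

    public static void main(String[] args) {
        String list = " | A Snowball-style list\ni me | first person\nmy\n\n";
        System.out.println(readSnowball(new StringReader(list)).size()); // prints 3
    }
}
```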

getStemDict

public static HashMap<String,String> getStemDict(File wordstemfile)
                                          throws IOException
Reads a stem dictionary. Each line contains:
word\tstem
(i.e. two tab-separated words)

Parameters:
wordstemfile - File containing the stem dictionary
Returns:
stem dictionary that overrules the stemming algorithm
Throws:
IOException
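
The word/stem line format can be parsed as in the following sketch. To keep the example self-contained it reads from a Reader instead of a File, and it silently skips lines that contain no tab; both choices are assumptions, as are the names StemDictSketch and readStemDict. It is not Lucene's implementation.

```java
import java.io.BufferedReader;
import java.io.Reader;
import java.io.StringReader;
import java.util.HashMap;

// Hypothetical helper for illustration only; not part of Lucene.
final class StemDictSketch {

    // Maps each word to its stem, one "word<TAB>stem" pair per line.
    static HashMap<String, String> readStemDict(Reader reader) {
        HashMap<String, String> dict = new HashMap<>();
        new BufferedReader(reader).lines().forEach(line -> {
            String[] parts = line.split("\t", 2);
            if (parts.length == 2) { // lines without a tab are skipped (assumption)
                dict.put(parts[0], parts[1]);
            }
        });
        return dict;
    }

    public static void main(String[] args) {
        HashMap<String, String> dict =
                readStemDict(new StringReader("running\trun\nmice\tmouse\n"));
        System.out.println(dict.get("mice")); // prints mouse
    }
}
```

A caller that wants the dictionary to take precedence over an algorithmic stemmer could consult the map first and fall back to the stemmer only when the word is absent.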


Copyright © 2000-2011 Apache Software Foundation. All Rights Reserved.