java.lang.Object
  org.apache.lucene.analysis.WordlistLoader
public class WordlistLoader
Loader for text files that represent a list of stopwords.
Constructor Summary | |
---|---|
WordlistLoader() | |
Method Summary | |
---|---|
static Set<String> | getSnowballWordSet(Class<?> aClass, String stopwordResource): Loads a text file in Snowball format associated with a given class (see Class.getResourceAsStream(String)) and adds all words as entries to a Set. |
static Set<String> | getSnowballWordSet(Reader reader): Reads stopwords from a stopword list in Snowball format. |
static HashMap<String,String> | getStemDict(File wordstemfile): Reads a stem dictionary. |
static Set<String> | getWordSet(Class<?> aClass, String stopwordResource): Loads a text file associated with a given class (see Class.getResourceAsStream(String)) and adds every line as an entry to a Set (omitting leading and trailing whitespace). |
static Set<String> | getWordSet(Class<?> aClass, String stopwordResource, String comment): Loads a text file associated with a given class (see Class.getResourceAsStream(String)) and adds every non-comment line as an entry to a Set (omitting leading and trailing whitespace). |
static HashSet<String> | getWordSet(File wordfile): Loads a text file and adds every line as an entry to a HashSet (omitting leading and trailing whitespace). |
static HashSet<String> | getWordSet(File wordfile, String comment): Loads a text file and adds every non-comment line as an entry to a HashSet (omitting leading and trailing whitespace). |
static HashSet<String> | getWordSet(Reader reader): Reads lines from a Reader and adds every line as an entry to a HashSet (omitting leading and trailing whitespace). |
static HashSet<String> | getWordSet(Reader reader, String comment): Reads lines from a Reader and adds every non-comment line as an entry to a HashSet (omitting leading and trailing whitespace). |
Methods inherited from class java.lang.Object |
---|
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait |
Constructor Detail |
---|
public WordlistLoader()
Method Detail |
---|
public static Set<String> getWordSet(Class<?> aClass, String stopwordResource) throws IOException
Loads a text file associated with a given class (see Class.getResourceAsStream(String)) and adds every line as an entry to a Set (omitting leading and trailing whitespace). Every line of the file should contain only one word. The words need to be in lower case if you use an Analyzer that applies LowerCaseFilter (like StandardAnalyzer).
Parameters:
aClass - a class that is associated with the given stopwordResource
stopwordResource - name of the resource file associated with the given class
Returns:
a Set with the file's words
Throws:
IOException
public static Set<String> getWordSet(Class<?> aClass, String stopwordResource, String comment) throws IOException
Loads a text file associated with a given class (see Class.getResourceAsStream(String)) and adds every non-comment line as an entry to a Set (omitting leading and trailing whitespace). Every line of the file should contain only one word. The words need to be in lower case if you use an Analyzer that applies LowerCaseFilter (like StandardAnalyzer).
Parameters:
aClass - a class that is associated with the given stopwordResource
stopwordResource - name of the resource file associated with the given class
comment - the comment string to ignore
Returns:
a Set with the file's words
Throws:
IOException
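The class-relative lookup described above can be sketched with the standard library alone. This is a minimal illustration, not the WordlistLoader implementation: the class ResourceWordSetSketch and its methods loadWords and readWords are hypothetical names, UTF-8 decoding is an assumption, and throwing IOException for a missing resource is a choice made for this sketch.

```java
import java.io.BufferedReader;
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.nio.charset.StandardCharsets;
import java.util.HashSet;
import java.util.Set;

class ResourceWordSetSketch {
    // Hypothetical helper: resolves the resource relative to aClass.
    // Class.getResourceAsStream returns null when the resource is missing,
    // so this sketch turns that case into an IOException (an assumption).
    static Set<String> loadWords(Class<?> aClass, String resource) throws IOException {
        InputStream in = aClass.getResourceAsStream(resource);
        if (in == null) {
            throw new IOException("resource not found: " + resource);
        }
        return readWords(in);
    }

    // Adds every line as one entry, with leading and trailing whitespace removed.
    // UTF-8 is assumed here; the documentation does not state an encoding.
    static Set<String> readWords(InputStream in) throws IOException {
        Set<String> words = new HashSet<>();
        try (BufferedReader reader =
                 new BufferedReader(new InputStreamReader(in, StandardCharsets.UTF_8))) {
            String line;
            while ((line = reader.readLine()) != null) {
                words.add(line.trim());
            }
        }
        return words;
    }

    public static void main(String[] args) throws IOException {
        // An in-memory stream stands in for a real classpath resource.
        byte[] data = "the\n  and \nof\n".getBytes(StandardCharsets.UTF_8);
        System.out.println(readWords(new ByteArrayInputStream(data)).contains("and"));
    }
}
```

A resource name without a leading slash is resolved relative to the package of the given class, which is why the class argument matters.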
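public static HashSet<String> getWordSet(File wordfile) throws IOException
Loads a text file and adds every line as an entry to a HashSet (omitting leading and trailing whitespace).
Parameters:
wordfile - File containing the wordlist
Throws:
IOException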
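public static HashSet<String> getWordSet(File wordfile, String comment) throws IOException
Loads a text file and adds every non-comment line as an entry to a HashSet (omitting leading and trailing whitespace).
Parameters:
wordfile - File containing the wordlist
comment - the comment string to ignore
Throws:
IOException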
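public static HashSet<String> getWordSet(Reader reader) throws IOException
Reads lines from a Reader and adds every line as an entry to a HashSet (omitting leading and trailing whitespace).
Parameters:
reader - Reader containing the wordlist
Throws:
IOException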
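public static HashSet<String> getWordSet(Reader reader, String comment) throws IOException
Reads lines from a Reader and adds every non-comment line as an entry to a HashSet (omitting leading and trailing whitespace).
Parameters:
reader - Reader containing the wordlist
comment - the string representing a comment
Throws:
IOException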
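The comment handling described for the Reader overloads might look roughly like the sketch below. It assumes that a comment line is any line beginning with the comment string and that all other lines are trimmed and added. The names CommentWordSetSketch and readWordSet are hypothetical and not part of the Lucene API.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.util.HashSet;

class CommentWordSetSketch {
    // Hypothetical stand-in for the documented behavior: skip lines that
    // start with the comment string, trim and keep every other line.
    static HashSet<String> readWordSet(Reader reader, String comment) throws IOException {
        HashSet<String> result = new HashSet<>();
        BufferedReader br = new BufferedReader(reader);
        String line;
        while ((line = br.readLine()) != null) {
            if (!line.startsWith(comment)) {
                result.add(line.trim());
            }
        }
        return result;
    }

    public static void main(String[] args) throws IOException {
        Reader in = new StringReader("# English stopwords\nthe\n  and  \nof\n");
        // Iteration order of a HashSet is unspecified, so only membership is checked.
        HashSet<String> words = readWordSet(in, "#");
        System.out.println(words.contains("and") && !words.contains("# English stopwords"));
    }
}
```

Because the words are stored exactly as written apart from trimming, a list meant for an Analyzer that lower-cases tokens should itself be written in lower case.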
public static Set<String> getSnowballWordSet(Class<?> aClass, String stopwordResource) throws IOException
Loads a text file in Snowball format associated with a given class (see Class.getResourceAsStream(String)) and adds all words as entries to a Set. The words need to be in lower case if you use an Analyzer that applies LowerCaseFilter (like StandardAnalyzer).
Parameters:
aClass - a class that is associated with the given stopwordResource
stopwordResource - name of the resource file associated with the given class
Returns:
a Set with the file's words
Throws:
IOException
See Also:
getSnowballWordSet(Reader)
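public static Set<String> getSnowballWordSet(Reader reader) throws IOException
Reads stopwords from a stopword list in Snowball format.
The snowball format is the following:
- Lines may contain multiple words separated by whitespace.
- The comment character is the vertical line (|).
- Lines may contain trailing comments.
Parameters:
reader - Reader containing a Snowball stopword list
Throws:
IOException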
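As a rough illustration of parsing a Snowball stopword list, the sketch below assumes the usual Snowball conventions: a line may hold several whitespace-separated words, and a vertical line (|) starts a comment that runs to the end of the line. The names SnowballWordSetSketch and readSnowball are hypothetical, and this is not the Lucene implementation.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.util.HashSet;
import java.util.Set;

class SnowballWordSetSketch {
    // Hypothetical parser for the assumed Snowball conventions described above.
    static Set<String> readSnowball(Reader reader) throws IOException {
        Set<String> result = new HashSet<>();
        BufferedReader br = new BufferedReader(reader);
        String line;
        while ((line = br.readLine()) != null) {
            // Drop everything from the comment character to the end of the line.
            int bar = line.indexOf('|');
            if (bar >= 0) {
                line = line.substring(0, bar);
            }
            // Split the remainder on whitespace; skip the empty tokens that
            // split() produces for blank lines or leading whitespace.
            for (String word : line.split("\\s+")) {
                if (!word.isEmpty()) {
                    result.add(word);
                }
            }
        }
        return result;
    }

    public static void main(String[] args) throws IOException {
        String list = " | An example stopword list\ni   | first person\nme my\n\nmyself | trailing comment\n";
        System.out.println(readSnowball(new StringReader(list)).size());
    }
}
```

Unlike the one-word-per-line format, a single line here can contribute several entries, so the parser splits each line instead of trimming it.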
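public static HashMap<String,String> getStemDict(File wordstemfile) throws IOException
Reads a stem dictionary. Each line contains:
word\tstem
(i.e. two tab-separated words)
Parameters:
wordstemfile - File containing the stem dictionary
Throws:
IOException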
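The word\tstem line format can be parsed along the lines of the sketch below. It is an illustration only: StemDictSketch and readStemDict are hypothetical names, the sketch takes a Reader instead of a File so that it can run on in-memory text, and skipping lines without a tab is an assumption of the sketch.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.util.HashMap;

class StemDictSketch {
    // Hypothetical parser: maps each word to its stem, one tab-separated pair per line.
    static HashMap<String, String> readStemDict(Reader reader) throws IOException {
        HashMap<String, String> result = new HashMap<>();
        BufferedReader br = new BufferedReader(reader);
        String line;
        while ((line = br.readLine()) != null) {
            // Limit 2 keeps any further tabs inside the stem part.
            String[] parts = line.split("\t", 2);
            if (parts.length == 2) {
                result.put(parts[0], parts[1]);
            }
            // Lines without a tab are ignored (an assumption of this sketch).
        }
        return result;
    }

    public static void main(String[] args) throws IOException {
        HashMap<String, String> dict = readStemDict(new StringReader("running\trun\nmice\tmouse\n"));
        System.out.println(dict.get("mice"));
    }
}
```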