WordlistLoader (Lucene 3.0.3 API)

org.apache.lucene.analysis
Class WordlistLoader

java.lang.Object
  org.apache.lucene.analysis.WordlistLoader

public class WordlistLoader
extends Object

Loader for text files that represent a list of stopwords.

Constructor Summary
`WordlistLoader()`

Method Summary
`static HashMap<String,String>`	`getStemDict(File wordstemfile)` Reads a stem dictionary.
`static HashSet<String>`	`getWordSet(File wordfile)` Loads a text file and adds every line as an entry to a HashSet (omitting leading and trailing whitespace).
`static HashSet<String>`	`getWordSet(File wordfile, String comment)` Loads a text file and adds every non-comment line as an entry to a HashSet (omitting leading and trailing whitespace).
`static HashSet<String>`	`getWordSet(Reader reader)` Reads lines from a Reader and adds every line as an entry to a HashSet (omitting leading and trailing whitespace).
`static HashSet<String>`	`getWordSet(Reader reader, String comment)` Reads lines from a Reader and adds every non-comment line as an entry to a HashSet (omitting leading and trailing whitespace).

Methods inherited from class java.lang.Object
`clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait`

Constructor Detail

WordlistLoader

public WordlistLoader()

Method Detail

getWordSet

public static HashSet<String> getWordSet(File wordfile)
                                  throws IOException

Loads a text file and adds every line as an entry to a HashSet (omitting leading and trailing whitespace). Every line of the file should contain only one word. The words need to be in lowercase if you make use of an Analyzer which uses LowerCaseFilter (like StandardAnalyzer).

Parameters: wordfile - File containing the wordlist
Returns: A HashSet with the file's words
Throws: IOException
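
The behavior described above can be approximated with plain JDK classes. The following is a minimal sketch, not Lucene's source code: the class `WordFileSketch` and the method `loadWords` are hypothetical names, and the handling of blank lines is an assumption of the sketch rather than documented behavior.

```java
import java.io.BufferedReader;
import java.io.File;
import java.io.FileReader;
import java.io.FileWriter;
import java.io.IOException;
import java.util.HashSet;

// Hypothetical stand-in for the documented behavior; not Lucene's implementation.
class WordFileSketch {

    // Reads the file line by line and stores each trimmed line as one entry.
    // Assumption: a blank line ends up as an empty-string entry in this sketch.
    static HashSet<String> loadWords(File wordfile) throws IOException {
        HashSet<String> result = new HashSet<>();
        try (BufferedReader in = new BufferedReader(new FileReader(wordfile))) {
            String line;
            while ((line = in.readLine()) != null) {
                result.add(line.trim());
            }
        }
        return result;
    }

    public static void main(String[] args) throws IOException {
        // Write a small word list (one word per line) to a temporary file.
        File tmp = File.createTempFile("stopwords", ".txt");
        tmp.deleteOnExit();
        try (FileWriter out = new FileWriter(tmp)) {
            out.write("  the \nand\nof\n");
        }

        HashSet<String> words = loadWords(tmp);
        System.out.println(words.contains("the")); // prints "true": whitespace was trimmed
        System.out.println(words.size());          // prints "3"
    }
}
```

With the real class, the equivalent call would be `WordlistLoader.getWordSet(tmp)`, which also declares `IOException`.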

getWordSet

public static HashSet<String> getWordSet(File wordfile,
                                         String comment)
                                  throws IOException

Loads a text file and adds every non-comment line as an entry to a HashSet (omitting leading and trailing whitespace). Every line of the file should contain only one word. The words need to be in lowercase if you make use of an Analyzer which uses LowerCaseFilter (like StandardAnalyzer).

Parameters: wordfile - File containing the wordlist; comment - The comment string to ignore
Returns: A HashSet with the file's words
Throws: IOException

getWordSet

public static HashSet<String> getWordSet(Reader reader)
                                  throws IOException

Reads lines from a Reader and adds every line as an entry to a HashSet (omitting leading and trailing whitespace). Every line of the Reader should contain only one word. The words need to be in lowercase if you make use of an Analyzer which uses LowerCaseFilter (like StandardAnalyzer).

Parameters: reader - Reader containing the wordlist
Returns: A HashSet with the reader's words
Throws: IOException

getWordSet

public static HashSet<String> getWordSet(Reader reader,
                                         String comment)
                                  throws IOException

Reads lines from a Reader and adds every non-comment line as an entry to a HashSet (omitting leading and trailing whitespace). Every line of the Reader should contain only one word. The words need to be in lowercase if you make use of an Analyzer which uses LowerCaseFilter (like StandardAnalyzer).

Parameters: reader - Reader containing the wordlist; comment - The string representing a comment.
Returns: A HashSet with the reader's words
Throws: IOException
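
To illustrate comment filtering, here is a minimal plain-JDK sketch. It assumes that a line counts as a comment when it starts with the comment string; the page above does not spell out the exact matching rule, so treat that as an assumption. `CommentedWordSetSketch` and `loadWords` are hypothetical names, not part of Lucene.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.util.HashSet;

// Hypothetical stand-in for the documented behavior; not Lucene's implementation.
class CommentedWordSetSketch {

    // Adds every line that does not start with the comment string (an assumed
    // matching rule), trimmed of leading and trailing whitespace.
    // The caller remains responsible for closing the Reader.
    static HashSet<String> loadWords(Reader reader, String comment) throws IOException {
        HashSet<String> result = new HashSet<>();
        BufferedReader in = new BufferedReader(reader);
        String line;
        while ((line = in.readLine()) != null) {
            if (!line.startsWith(comment)) {
                result.add(line.trim());
            }
        }
        return result;
    }

    public static void main(String[] args) throws IOException {
        Reader reader = new StringReader("# English stopwords\nthe\n  and  \n# end\n");
        HashSet<String> words = loadWords(reader, "#");
        System.out.println(words.size());          // prints "2"
        System.out.println(words.contains("and")); // prints "true"
    }
}
```

With the real class, the corresponding call would be `WordlistLoader.getWordSet(reader, "#")`.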

getStemDict

public static HashMap<String,String> getStemDict(File wordstemfile)
                                          throws IOException

Reads a stem dictionary. Each line contains:

word\tstem

(i.e. two tab-separated words)

Returns: stem dictionary that overrules the stemming algorithm
Throws: IOException
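
The `word\tstem` line format can be parsed as in the following minimal sketch. It is not Lucene's implementation: `StemDictSketch` and `loadStemDict` are hypothetical names, the sketch reads from a `Reader` instead of a `File` so that it stays self-contained, and skipping lines without a tab is an assumption.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.util.HashMap;

// Hypothetical stand-in for the documented file format; not Lucene's implementation.
class StemDictSketch {

    // Maps the text before the first tab (the word) to the text after it (the stem).
    // Assumption: lines that contain no tab are skipped.
    static HashMap<String, String> loadStemDict(Reader reader) throws IOException {
        HashMap<String, String> result = new HashMap<>();
        BufferedReader in = new BufferedReader(reader);
        String line;
        while ((line = in.readLine()) != null) {
            String[] parts = line.split("\t", 2);
            if (parts.length == 2) {
                result.put(parts[0], parts[1]);
            }
        }
        return result;
    }

    public static void main(String[] args) throws IOException {
        Reader reader = new StringReader("mice\tmouse\ngeese\tgoose\n");
        HashMap<String, String> stems = loadStemDict(reader);
        System.out.println(stems.get("mice")); // prints "mouse"
        System.out.println(stems.size());      // prints "2"
    }
}
```

With the real class, the dictionary is loaded from a file via `WordlistLoader.getStemDict(wordstemfile)`.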
