java.lang.Object
  org.apache.lucene.search.spell.SpellChecker
public class SpellChecker
Spell checker class (the main class), initially inspired by the David Spencer code.
Example Usage:
    SpellChecker spellchecker = new SpellChecker(spellIndexDirectory);
    // To index a field of a user index:
    spellchecker.indexDictionary(new LuceneDictionary(my_lucene_reader, a_field));
    // To index a file containing words:
    spellchecker.indexDictionary(new PlainTextDictionary(new File("myfile.txt")));
    String[] suggestions = spellchecker.suggestSimilar("misspelt", 5);
Field Summary | |
---|---|
static String | F_WORD: Field name for each word in the n-gram index. |
Constructor Summary | |
---|---|
SpellChecker(org.apache.lucene.store.Directory spellIndex) | Use the given directory as a spell checker index with a LevensteinDistance as the default StringDistance. |
SpellChecker(org.apache.lucene.store.Directory spellIndex, StringDistance sd) | Use the given directory as a spell checker index. |
Method Summary | |
---|---|
void | clearIndex(): Removes all terms from the spell check index. |
void | close(): Close the IndexSearcher used by this SpellChecker. |
boolean | exist(String word): Check whether the word exists in the index. |
StringDistance | getStringDistance(): Returns the StringDistance instance used by this SpellChecker instance. |
void | indexDictionary(Dictionary dict): Indexes the data from the given Dictionary. |
void | indexDictionary(Dictionary dict, int mergeFactor, int ramMB): Indexes the data from the given Dictionary. |
void | setAccuracy(float minScore): Sets the accuracy 0 < minScore < 1; default 0.5. |
void | setSpellIndex(org.apache.lucene.store.Directory spellIndexDir): Use a different index as the spell checker index, or re-open the existing index if spellIndexDir is the same value as given in the constructor. |
void | setStringDistance(StringDistance sd): Sets the StringDistance implementation for this SpellChecker instance. |
String[] | suggestSimilar(String word, int numSug): Suggest similar words. |
String[] | suggestSimilar(String word, int numSug, org.apache.lucene.index.IndexReader ir, String field, boolean morePopular): Suggest similar words (optionally restricted to a field of an index). |
Methods inherited from class java.lang.Object |
---|
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait |
Field Detail |
---|
public static final String F_WORD
Field name for each word in the n-gram index.
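To make the idea of an n-gram index more concrete, here is a minimal sketch of how a word can be split into character n-grams. The class name NGramSketch and its method are hypothetical and exist only for illustration; this is not the code used by SpellChecker, and the real index layout may differ.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical helper, not part of Lucene: splits a word into character n-grams. */
final class NGramSketch {

    /** Returns every contiguous substring of length n, in order of appearance. */
    static List<String> ngrams(String word, int n) {
        if (n <= 0) {
            throw new IllegalArgumentException("n must be positive");
        }
        List<String> grams = new ArrayList<>();
        for (int i = 0; i + n <= word.length(); i++) {
            grams.add(word.substring(i, i + n));
        }
        return grams;
    }

    public static void main(String[] args) {
        System.out.println(ngrams("lucene", 3)); // prints [luc, uce, cen, ene]
    }
}
```

A misspelled word usually still shares several of these fragments with the intended word, which is what makes an n-gram lookup useful as a first retrieval step.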
Constructor Detail |
---|
public SpellChecker(org.apache.lucene.store.Directory spellIndex, StringDistance sd) throws IOException
Use the given directory as a spell checker index.
Parameters:
spellIndex - the spell index directory
sd - the StringDistance measurement to use
Throws:
IOException - if the SpellChecker cannot open the directory

public SpellChecker(org.apache.lucene.store.Directory spellIndex) throws IOException
Use the given directory as a spell checker index with a LevensteinDistance as the default StringDistance. The directory is created if it doesn't exist yet.
Parameters:
spellIndex - the spell index directory
Throws:
IOException - if the SpellChecker cannot open the directory

Method Detail |
---|
public void setSpellIndex(org.apache.lucene.store.Directory spellIndexDir) throws IOException
Use a different index as the spell checker index, or re-open the existing index if spellIndexDir is the same value as given in the constructor.
Parameters:
spellIndexDir - the spell directory to use
Throws:
org.apache.lucene.store.AlreadyClosedException - if the SpellChecker is already closed
IOException - if the SpellChecker cannot open the directory

public void setStringDistance(StringDistance sd)
Sets the StringDistance implementation for this SpellChecker instance.
Parameters:
sd - the StringDistance implementation for this SpellChecker instance

public StringDistance getStringDistance()
Returns the StringDistance instance used by this SpellChecker instance.
Returns:
the StringDistance instance used by this SpellChecker instance

public void setAccuracy(float minScore)
Sets the accuracy 0 < minScore < 1; default 0.5.
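A minimal sketch of how a minimum score such as minScore can act as a cutoff, assuming the string distance returns a similarity between 0 and 1 where 1 means identical. The normalization used here (one minus the edit distance divided by the longer length), the inclusive comparison, and the names AccuracySketch, similarity, and passes are assumptions made for illustration, not confirmed details of this class.

```java
/** Hypothetical illustration of a minimum-score cutoff; not Lucene's implementation. */
final class AccuracySketch {

    /** Classic dynamic-programming Levenshtein edit distance using two rolling rows. */
    static int editDistance(String a, String b) {
        int[] prev = new int[b.length() + 1];
        int[] curr = new int[b.length() + 1];
        for (int j = 0; j <= b.length(); j++) {
            prev[j] = j; // turning the empty prefix into b[0..j) takes j insertions
        }
        for (int i = 1; i <= a.length(); i++) {
            curr[0] = i; // turning a[0..i) into the empty prefix takes i deletions
            for (int j = 1; j <= b.length(); j++) {
                int cost = a.charAt(i - 1) == b.charAt(j - 1) ? 0 : 1;
                curr[j] = Math.min(Math.min(curr[j - 1] + 1, prev[j] + 1), prev[j - 1] + cost);
            }
            int[] tmp = prev;
            prev = curr;
            curr = tmp;
        }
        return prev[b.length()];
    }

    /** Assumed normalization: 1 means identical, 0 means maximally different. */
    static float similarity(String a, String b) {
        int maxLen = Math.max(a.length(), b.length());
        if (maxLen == 0) {
            return 1f; // two empty strings are identical
        }
        return 1f - (float) editDistance(a, b) / maxLen;
    }

    /** Keeps a candidate only if its score reaches minScore (the inclusive bound is an assumption). */
    static boolean passes(String word, String candidate, float minScore) {
        return similarity(word, candidate) >= minScore;
    }

    public static void main(String[] args) {
        System.out.println(similarity("misspelt", "misspell")); // prints 0.875
        System.out.println(passes("abc", "xyz", 0.5f));         // prints false
    }
}
```

Under this assumption, raising minScore toward 1 keeps only near-identical words, while lowering it toward 0 admits looser matches.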
public String[] suggestSimilar(String word, int numSug) throws IOException
Suggest similar words.
The Lucene similarity used to fetch the most relevant n-grammed terms is not the same as the edit distance strategy used to pick the best matching spell-checked word from the hits that Lucene found, so one usually has to request several suggestions in order to get the true best match.
That is, if numSug == 1, don't count on that suggestion being the best one. You should set this value to at least 5 for a good suggestion.
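The mismatch between the retrieval score and the edit distance can be sketched with a two-stage toy suggester. Everything here is hypothetical: RerankSketch, the use of shared bigram counts as a stand-in for the Lucene similarity, and the tiny dictionary are assumptions chosen to make the effect visible, not a description of how SpellChecker scores hits.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

/** Hypothetical two-stage suggester: n-gram retrieval, then edit-distance re-ranking. */
final class RerankSketch {

    /** Collects the distinct character bigrams of a word. */
    static Set<String> bigrams(String w) {
        Set<String> grams = new HashSet<>();
        for (int i = 0; i + 2 <= w.length(); i++) {
            grams.add(w.substring(i, i + 2));
        }
        return grams;
    }

    /** Counts the bigrams two words share (a stand-in for the index relevance score). */
    static int overlap(String a, String b) {
        Set<String> shared = bigrams(a);
        shared.retainAll(bigrams(b));
        return shared.size();
    }

    /** Levenshtein edit distance using two rolling rows. */
    static int editDistance(String a, String b) {
        int[] prev = new int[b.length() + 1];
        int[] curr = new int[b.length() + 1];
        for (int j = 0; j <= b.length(); j++) {
            prev[j] = j;
        }
        for (int i = 1; i <= a.length(); i++) {
            curr[0] = i;
            for (int j = 1; j <= b.length(); j++) {
                int cost = a.charAt(i - 1) == b.charAt(j - 1) ? 0 : 1;
                curr[j] = Math.min(Math.min(curr[j - 1] + 1, prev[j] + 1), prev[j - 1] + cost);
            }
            int[] tmp = prev;
            prev = curr;
            curr = tmp;
        }
        return prev[b.length()];
    }

    static List<String> suggest(final String word, List<String> dictionary, int numSug) {
        // Stage 1: keep the numSug candidates with the highest bigram overlap.
        List<String> candidates = new ArrayList<>(dictionary);
        candidates.sort(Comparator.comparingInt((String c) -> -overlap(word, c)));
        candidates = new ArrayList<>(candidates.subList(0, Math.min(numSug, candidates.size())));
        // Stage 2: order the survivors by edit distance, closest first.
        candidates.sort(Comparator.comparingInt((String c) -> editDistance(word, c)));
        return candidates;
    }

    public static void main(String[] args) {
        List<String> dictionary = Arrays.asList("abed", "abcdxyzw");
        System.out.println(suggest("abcd", dictionary, 1)); // prints [abcdxyzw]
        System.out.println(suggest("abcd", dictionary, 2)); // prints [abed, abcdxyzw]
    }
}
```

With a single suggestion, the long word wins because it shares the most bigrams, even though "abed" is only one edit away. Asking for more candidates gives the edit-distance stage a chance to surface the closer match.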
Parameters:
word - the word you want a spell check done on
numSug - the number of suggested words
Throws:
IOException - if the underlying index throws an IOException
org.apache.lucene.store.AlreadyClosedException - if the SpellChecker is already closed

public String[] suggestSimilar(String word, int numSug, org.apache.lucene.index.IndexReader ir, String field, boolean morePopular) throws IOException
Suggest similar words (optionally restricted to a field of an index).
The Lucene similarity used to fetch the most relevant n-grammed terms is not the same as the edit distance strategy used to pick the best matching spell-checked word from the hits that Lucene found, so one usually has to request several suggestions in order to get the true best match.
That is, if numSug == 1, don't count on that suggestion being the best one. You should set this value to at least 5 for a good suggestion.
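The restricted mode and the morePopular flag of this method can be pictured as a filter over candidate suggestions. The sketch below is an assumption-laden illustration: PopularitySketch is a hypothetical class, and a plain Map of word frequencies stands in for the IndexReader and field of the user index.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical filter mimicking restricted mode; a Map replaces the user index. */
final class PopularitySketch {

    /**
     * @param fieldFreq frequency of each word in the (assumed) user index field,
     *                  or null for unrestricted mode
     */
    static List<String> filter(String word, List<String> candidates,
                               Map<String, Integer> fieldFreq, boolean morePopular) {
        if (fieldFreq == null) {
            return new ArrayList<>(candidates); // unrestricted: morePopular has no effect
        }
        int wordFreq = fieldFreq.getOrDefault(word, 0);
        List<String> kept = new ArrayList<>();
        for (String candidate : candidates) {
            int freq = fieldFreq.getOrDefault(candidate, 0);
            if (freq == 0) {
                continue; // restricted mode: the suggestion must occur in the field
            }
            if (morePopular && freq < wordFreq) {
                continue; // keep only words at least as frequent as the searched word
            }
            kept.add(candidate);
        }
        return kept;
    }

    public static void main(String[] args) {
        Map<String, Integer> freq = new HashMap<>();
        freq.put("helo", 3);
        freq.put("hello", 5);
        freq.put("hallo", 2);
        List<String> candidates = Arrays.asList("hello", "hallo", "halo");
        System.out.println(filter("helo", candidates, freq, true));  // prints [hello]
        System.out.println(filter("helo", candidates, freq, false)); // prints [hello, hallo]
    }
}
```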
Parameters:
word - the word you want a spell check done on
numSug - the number of suggested words
ir - the IndexReader of the user index (can be null; see the field parameter)
field - the field of the user index: if field is not null, the suggested words are restricted to the words present in this field
morePopular - return only the suggested words that are as frequent or more frequent than the searched word (only in restricted mode, that is, when ir != null and field != null)
Throws:
IOException - if the underlying index throws an IOException
org.apache.lucene.store.AlreadyClosedException - if the SpellChecker is already closed

public void clearIndex() throws IOException
Removes all terms from the spell check index.
Throws:
IOException
org.apache.lucene.store.AlreadyClosedException - if the SpellChecker is already closed

public boolean exist(String word) throws IOException
Check whether the word exists in the index.
Parameters:
word - the word to check
Throws:
IOException
org.apache.lucene.store.AlreadyClosedException - if the SpellChecker is already closed

public void indexDictionary(Dictionary dict, int mergeFactor, int ramMB) throws IOException
Indexes the data from the given Dictionary.
Parameters:
dict - the Dictionary to index
mergeFactor - the mergeFactor to use when indexing
ramMB - the maximum amount of memory, in MB, to use
Throws:
org.apache.lucene.store.AlreadyClosedException - if the SpellChecker is already closed
IOException

public void indexDictionary(Dictionary dict) throws IOException
Indexes the data from the given Dictionary.
Parameters:
dict - the dictionary to index
Throws:
IOException

public void close() throws IOException
Close the IndexSearcher used by this SpellChecker.
Specified by:
close in interface Closeable
Throws:
IOException - if the close operation causes an IOException
org.apache.lucene.store.AlreadyClosedException - if the SpellChecker is already closed
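The closed-state contract that recurs throughout the method details, including for close() itself, can be sketched with a small guard. ClosedGuardSketch and its lookup method are hypothetical, and IllegalStateException is used as a stand-in for AlreadyClosedException so that the example needs no Lucene dependency.

```java
import java.io.Closeable;

/** Hypothetical resource that rejects every call once it has been closed. */
final class ClosedGuardSketch implements Closeable {
    private boolean closed;

    /** Throws if this instance was already closed (stand-in for AlreadyClosedException). */
    private void ensureOpen() {
        if (closed) {
            throw new IllegalStateException("this instance is already closed");
        }
    }

    /** Placeholder for any operation on the resource, such as a lookup. */
    boolean lookup(String word) {
        ensureOpen();
        return word != null && !word.isEmpty();
    }

    /** Following the documented contract, closing an already closed instance also fails. */
    @Override
    public void close() {
        ensureOpen();
        closed = true;
    }
}
```

In practice this means a caller should close a SpellChecker exactly once, typically in a finally block, and should not reuse the instance afterwards.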