java.lang.Object
  org.apache.lucene.search.spell.SpellChecker
public class SpellChecker
Spell Checker class (Main class)
(initially inspired by the David Spencer code).
Example Usage:
SpellChecker spellchecker = new SpellChecker(spellIndexDirectory);
// To index a field of a user index:
spellchecker.indexDictionary(new LuceneDictionary(my_lucene_reader, a_field));
// To index a file containing words:
spellchecker.indexDictionary(new PlainTextDictionary(new File("myfile.txt")));
String[] suggestions = spellchecker.suggestSimilar("misspelt", 5);
| Field Summary | |
|---|---|
| static float | DEFAULT_ACCURACY: The default minimum score to use, if not specified by calling setAccuracy(float). |
| static String | F_WORD: Field name for each word in the ngram index. |
| Constructor Summary | |
|---|---|
| SpellChecker(Directory spellIndex) | Use the given directory as a spell checker index with a LevensteinDistance as the default StringDistance. |
| SpellChecker(Directory spellIndex, StringDistance sd) | Use the given directory as a spell checker index. |
| SpellChecker(Directory spellIndex, StringDistance sd, Comparator<SuggestWord> comparator) | Use the given directory as a spell checker index with the given StringDistance measure and the given Comparator for sorting the results. |
| Method Summary | |
|---|---|
| void | clearIndex(): Removes all terms from the spell check index. |
| void | close(): Close the IndexSearcher used by this SpellChecker |
| boolean | exist(String word): Check whether the word exists in the index. |
| float | getAccuracy(): The accuracy (minimum score) to be used, unless overridden in suggestSimilar(String, int, org.apache.lucene.index.IndexReader, String, boolean, float), to decide whether a suggestion is included or not. |
| Comparator<SuggestWord> | getComparator() |
| StringDistance | getStringDistance(): Returns the StringDistance instance used by this SpellChecker instance. |
| void | indexDictionary(Dictionary dict): Indexes the data from the given Dictionary. |
| void | indexDictionary(Dictionary dict, int mergeFactor, int ramMB): Indexes the data from the given Dictionary. |
| void | indexDictionary(Dictionary dict, int mergeFactor, int ramMB, boolean optimize): Indexes the data from the given Dictionary. |
| void | setAccuracy(float acc): Sets the accuracy 0 < minScore < 1; default DEFAULT_ACCURACY |
| void | setComparator(Comparator<SuggestWord> comparator): Sets the Comparator for the SuggestWordQueue. |
| void | setSpellIndex(Directory spellIndexDir): Use a different index as the spell checker index or re-open the existing index if spellIndex is the same value as given in the constructor. |
| void | setStringDistance(StringDistance sd): Sets the StringDistance implementation for this SpellChecker instance. |
| String[] | suggestSimilar(String word, int numSug): Suggest similar words. |
| String[] | suggestSimilar(String word, int numSug, float accuracy): Suggest similar words. |
| String[] | suggestSimilar(String word, int numSug, IndexReader ir, String field, boolean morePopular): Suggest similar words (optionally restricted to a field of an index). |
| String[] | suggestSimilar(String word, int numSug, IndexReader ir, String field, boolean morePopular, float accuracy): Suggest similar words (optionally restricted to a field of an index). |
| Methods inherited from class java.lang.Object |
|---|
| clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait |
| Field Detail |
|---|
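public static final float DEFAULT_ACCURACY
The default minimum score to use, if not specified by calling setAccuracy(float).
public static final String F_WORD
Field name for each word in the ngram index.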
| Constructor Detail |
|---|
public SpellChecker(Directory spellIndex,
StringDistance sd)
throws IOException
Use the given directory as a spell checker index.
Parameters:
spellIndex - the spell index directory
sd - the StringDistance measurement to use
Throws:
IOException - if SpellChecker cannot open the directory
public SpellChecker(Directory spellIndex)
throws IOException
Use the given directory as a spell checker index with a LevensteinDistance as the default StringDistance. The
directory is created if it doesn't exist yet.
Parameters:
spellIndex - the spell index directory
Throws:
IOException - if SpellChecker cannot open the directory
public SpellChecker(Directory spellIndex,
StringDistance sd,
Comparator<SuggestWord> comparator)
throws IOException
Use the given directory as a spell checker index with the given StringDistance measure
and the given Comparator for sorting the results.
Parameters:
spellIndex - The spelling index
sd - The distance
comparator - The comparator
Throws:
IOException - if there is a problem opening the index
| Method Detail |
|---|
public void setSpellIndex(Directory spellIndexDir)
throws IOException
Use a different index as the spell checker index or re-open the existing index if spellIndex is the same value
as given in the constructor.
Parameters:
spellIndexDir - the spell directory to use
Throws:
AlreadyClosedException - if the SpellChecker is already closed
IOException - if SpellChecker cannot open the directory
public void setComparator(Comparator<SuggestWord> comparator)
Sets the Comparator for the SuggestWordQueue.
Parameters:
comparator - the comparator
public Comparator<SuggestWord> getComparator()
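As an illustration of what a Comparator passed to setComparator might look like, here is a minimal, self-contained sketch. It does not use the Lucene classes: SuggestWordStandIn is a hypothetical stand-in for SuggestWord, and its field names (string, score, freq) are assumptions made for this example. Whether the suggestion queue treats the greatest or the least element as best is not stated on this page, so the sketch only shows the ordering itself.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

/** Hypothetical stand-in for SuggestWord; the field names are assumptions. */
class SuggestWordStandIn {
    final String string; // the suggested word
    final float score;   // similarity score of the suggestion
    final int freq;      // how often the word occurs in the user index

    SuggestWordStandIn(String string, float score, int freq) {
        this.string = string;
        this.score = score;
        this.freq = freq;
    }
}

class ComparatorSketch {
    /** Orders by score, breaking ties by frequency (ascending on both). */
    static final Comparator<SuggestWordStandIn> SCORE_THEN_FREQ =
            Comparator.comparingDouble((SuggestWordStandIn w) -> w.score)
                      .thenComparingInt(w -> w.freq);

    /** Returns the suggested strings, highest-ranked first. */
    static List<String> bestFirst(List<SuggestWordStandIn> words) {
        List<SuggestWordStandIn> sorted = new ArrayList<>(words);
        sorted.sort(SCORE_THEN_FREQ.reversed());
        List<String> result = new ArrayList<>();
        for (SuggestWordStandIn w : sorted) {
            result.add(w.string);
        }
        return result;
    }
}
```

With equal scores, the more frequent word ranks first; swapping the two keys would give a frequency-first ordering instead.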
public void setStringDistance(StringDistance sd)
Sets the StringDistance implementation for this
SpellChecker instance.
Parameters:
sd - the StringDistance implementation for this SpellChecker instance
public StringDistance getStringDistance()
Returns the StringDistance instance used by this
SpellChecker instance.
Returns:
the StringDistance instance used by this SpellChecker instance.
public void setAccuracy(float acc)
Sets the accuracy 0 < minScore < 1; default DEFAULT_ACCURACY
Parameters:
acc - The new accuracy
public float getAccuracy()
The accuracy (minimum score) to be used, unless overridden in suggestSimilar(String, int, org.apache.lucene.index.IndexReader, String, boolean, float), to
decide whether a suggestion is included or not.
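To make the role of the accuracy (minimum score) concrete, the following self-contained sketch filters candidate words by a normalized edit-distance score. It is not the Lucene code. The scoring formula (1 minus the edit distance divided by the length of the longer word) and the rule that a candidate is kept when its score is at least the minimum are assumptions for illustration, and the class and method names are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical sketch of a minimum-score filter; not the Lucene implementation. */
class AccuracyFilterSketch {

    /** Classic Levenshtein edit distance (insert, delete, substitute). */
    static int editDistance(String a, String b) {
        int[] prev = new int[b.length() + 1];
        int[] curr = new int[b.length() + 1];
        for (int j = 0; j <= b.length(); j++) {
            prev[j] = j;
        }
        for (int i = 1; i <= a.length(); i++) {
            curr[0] = i;
            for (int j = 1; j <= b.length(); j++) {
                int cost = a.charAt(i - 1) == b.charAt(j - 1) ? 0 : 1;
                curr[j] = Math.min(Math.min(curr[j - 1] + 1, prev[j] + 1), prev[j - 1] + cost);
            }
            int[] tmp = prev;
            prev = curr;
            curr = tmp;
        }
        return prev[b.length()];
    }

    /** Assumed scoring: 1.0 means identical, values near 0 mean very different. */
    static float score(String a, String b) {
        int longest = Math.max(a.length(), b.length());
        if (longest == 0) {
            return 1.0f;
        }
        return 1.0f - (float) editDistance(a, b) / longest;
    }

    /** Keeps only the candidates whose score reaches minScore. */
    static List<String> filter(String word, List<String> candidates, float minScore) {
        List<String> kept = new ArrayList<>();
        for (String candidate : candidates) {
            if (score(word, candidate) >= minScore) {
                kept.add(candidate);
            }
        }
        return kept;
    }
}
```

Under these assumptions, "cart" scores 0.75 against "cat" and passes a minimum score of 0.5, while "catatonic" scores about 0.33 and is dropped.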
public String[] suggestSimilar(String word,
int numSug)
throws IOException
Suggest similar words.
The Lucene similarity used to fetch the most relevant n-grammed terms is not the same as the edit distance strategy used to pick the best matching spell-checked word from the hits that Lucene found, so one usually has to request several suggestions (a larger numSug) in order to get the true best match.
That is, if numSug == 1, don't count on that suggestion being the best one. Set this value to at least 5 for a good suggestion.
Parameters:
word - the word you want a spell check done on
numSug - the number of suggested words
Throws:
IOException - if the underlying index throws an IOException
AlreadyClosedException - if the SpellChecker is already closed
See Also:
suggestSimilar(String, int, org.apache.lucene.index.IndexReader, String, boolean, float)
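The reason a small numSug can miss the best match can be shown with a simplified, self-contained sketch of the two stages described above: candidates are first fetched by n-gram overlap, then re-ranked by edit distance. This is an illustration under assumptions, not the Lucene implementation: the class name, the use of bigrams, and the rule that exactly `fetch` candidates survive the first stage are all hypothetical.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

/** Hypothetical two-stage suggestion sketch; not the Lucene implementation. */
class TwoStageSuggestSketch {

    /** Distinct character n-grams of a word, e.g. the bigrams of "cat" are ca, at. */
    static Set<String> ngrams(String word, int n) {
        Set<String> grams = new HashSet<>();
        for (int i = 0; i + n <= word.length(); i++) {
            grams.add(word.substring(i, i + n));
        }
        return grams;
    }

    /** Stage 1 signal: how many distinct n-grams the two words share. */
    static int sharedNgrams(String a, String b, int n) {
        Set<String> shared = ngrams(a, n);
        shared.retainAll(ngrams(b, n));
        return shared.size();
    }

    /** Classic Levenshtein edit distance (insert, delete, substitute). */
    static int editDistance(String a, String b) {
        int[] prev = new int[b.length() + 1];
        int[] curr = new int[b.length() + 1];
        for (int j = 0; j <= b.length(); j++) {
            prev[j] = j;
        }
        for (int i = 1; i <= a.length(); i++) {
            curr[0] = i;
            for (int j = 1; j <= b.length(); j++) {
                int cost = a.charAt(i - 1) == b.charAt(j - 1) ? 0 : 1;
                curr[j] = Math.min(Math.min(curr[j - 1] + 1, prev[j] + 1), prev[j - 1] + cost);
            }
            int[] tmp = prev;
            prev = curr;
            curr = tmp;
        }
        return prev[b.length()];
    }

    /** Fetches `fetch` candidates by bigram overlap, then re-ranks them by edit distance. */
    static List<String> suggest(String word, List<String> dictionary, int fetch) {
        List<String> candidates = new ArrayList<>(dictionary);
        // Stage 1: most shared bigrams first.
        candidates.sort(Comparator.comparingInt((String c) -> -sharedNgrams(word, c, 2)));
        List<String> top = new ArrayList<>(candidates.subList(0, Math.min(fetch, candidates.size())));
        // Stage 2: smallest edit distance first (the sort is stable, so ties keep stage 1 order).
        top.sort(Comparator.comparingInt((String c) -> editDistance(word, c)));
        return top;
    }
}
```

For the word "cat" and the dictionary cot, cart, catatonic, fetching a single candidate returns "catatonic", because it shares the most bigrams, while fetching three and re-ranking puts "cart" first, since it is only one edit away.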
public String[] suggestSimilar(String word,
int numSug,
float accuracy)
throws IOException
Suggest similar words.
The Lucene similarity used to fetch the most relevant n-grammed terms is not the same as the edit distance strategy used to pick the best matching spell-checked word from the hits that Lucene found, so one usually has to request several suggestions (a larger numSug) in order to get the true best match.
That is, if numSug == 1, don't count on that suggestion being the best one. Set this value to at least 5 for a good suggestion.
Parameters:
word - the word you want a spell check done on
numSug - the number of suggested words
accuracy - The minimum score a suggestion must have in order to qualify for inclusion in the results
Throws:
IOException - if the underlying index throws an IOException
AlreadyClosedException - if the SpellChecker is already closed
See Also:
suggestSimilar(String, int, org.apache.lucene.index.IndexReader, String, boolean, float)
public String[] suggestSimilar(String word,
int numSug,
IndexReader ir,
String field,
boolean morePopular)
throws IOException
Suggest similar words (optionally restricted to a field of an index).
The Lucene similarity used to fetch the most relevant n-grammed terms is not the same as the edit distance strategy used to pick the best matching spell-checked word from the hits that Lucene found, so one usually has to request several suggestions (a larger numSug) in order to get the true best match.
That is, if numSug == 1, don't count on that suggestion being the best one. Set this value to at least 5 for a good suggestion.
Uses the value returned by getAccuracy() as the accuracy.
Parameters:
word - the word you want a spell check done on
numSug - the number of suggested words
ir - the indexReader of the user index (can be null; see the field param)
field - the field of the user index: if field is not null, the suggested words are restricted to the words present in this field.
morePopular - return only the suggested words that are as frequent or more frequent than the searched word (only in restricted mode, that is, when indexReader != null and field != null)
Throws:
IOException - if the underlying index throws an IOException
AlreadyClosedException - if the SpellChecker is already closed
See Also:
suggestSimilar(String, int, org.apache.lucene.index.IndexReader, String, boolean, float)
public String[] suggestSimilar(String word,
int numSug,
IndexReader ir,
String field,
boolean morePopular,
float accuracy)
throws IOException
Suggest similar words (optionally restricted to a field of an index).
The Lucene similarity used to fetch the most relevant n-grammed terms is not the same as the edit distance strategy used to pick the best matching spell-checked word from the hits that Lucene found, so one usually has to request several suggestions (a larger numSug) in order to get the true best match.
That is, if numSug == 1, don't count on that suggestion being the best one. Set this value to at least 5 for a good suggestion.
Parameters:
word - the word you want a spell check done on
numSug - the number of suggested words
ir - the indexReader of the user index (can be null; see the field param)
field - the field of the user index: if field is not null, the suggested words are restricted to the words present in this field.
morePopular - return only the suggested words that are as frequent or more frequent than the searched word (only in restricted mode, that is, when indexReader != null and field != null)
accuracy - The minimum score a suggestion must have in order to qualify for inclusion in the results
Throws:
IOException - if the underlying index throws an IOException
AlreadyClosedException - if the SpellChecker is already closed
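The restricted mode and the morePopular flag described in the parameters above can be sketched as a simple frequency filter. The example below is self-contained and does not touch Lucene: a Map of word frequencies stands in for the field of the user index, and the class and method names are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

/** Hypothetical sketch of restricted-mode filtering; not the Lucene implementation. */
class MorePopularSketch {

    /**
     * Keeps candidates that occur in the field (frequency of at least 1) and,
     * when morePopular is true, are at least as frequent as the searched word.
     */
    static List<String> filter(String word, List<String> candidates,
                               Map<String, Integer> fieldFrequencies, boolean morePopular) {
        int wordFreq = fieldFrequencies.getOrDefault(word, 0);
        List<String> kept = new ArrayList<>();
        for (String candidate : candidates) {
            int freq = fieldFrequencies.getOrDefault(candidate, 0);
            if (freq < 1) {
                continue; // restricted mode: the word must be present in the field
            }
            if (morePopular && freq < wordFreq) {
                continue; // less frequent than the searched word
            }
            kept.add(candidate);
        }
        return kept;
    }
}
```

With frequencies cat=5, cart=7, cot=2 and the candidates cart, cot, cab, the filter keeps cart and cot when morePopular is false, and only cart when it is true.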
public void clearIndex()
throws IOException
Removes all terms from the spell check index.
Throws:
IOException
AlreadyClosedException - if the SpellChecker is already closed
public boolean exist(String word)
throws IOException
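Check whether the word exists in the index.
Parameters:
word - the word to check
Returns:
true if the word exists in the index
Throws:
IOException
AlreadyClosedException - if the SpellChecker is already closed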
public final void indexDictionary(Dictionary dict,
int mergeFactor,
int ramMB,
boolean optimize)
throws IOException
Indexes the data from the given Dictionary.
Parameters:
dict - Dictionary to index
mergeFactor - mergeFactor to use when indexing
ramMB - the max amount of memory in MB to use
optimize - whether or not the spellcheck index should be optimized
Throws:
AlreadyClosedException - if the SpellChecker is already closed
IOException
public final void indexDictionary(Dictionary dict,
int mergeFactor,
int ramMB)
throws IOException
Indexes the data from the given Dictionary.
Parameters:
dict - the dictionary to index
mergeFactor - mergeFactor to use when indexing
ramMB - the max amount of memory in MB to use
Throws:
IOException
public final void indexDictionary(Dictionary dict)
throws IOException
Indexes the data from the given Dictionary.
Parameters:
dict - the dictionary to index
Throws:
IOException
public void close()
throws IOException
Close the IndexSearcher used by this SpellChecker.
Specified by:
close in interface Closeable
Throws:
IOException - if the close operation causes an IOException
AlreadyClosedException - if the SpellChecker is already closed