SpellChecker (Lucene 3.1.0 API)

org.apache.lucene.search.spell
Class SpellChecker

java.lang.Object
  org.apache.lucene.search.spell.SpellChecker

All Implemented Interfaces:: Closeable

public class SpellChecker
extends Object
implements Closeable

Spell checker class (main class), initially inspired by the David Spencer code.

Example Usage:

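  import java.io.File;
  import org.apache.lucene.search.spell.*;
  import org.apache.lucene.store.FSDirectory;

  FSDirectory spellIndexDirectory = FSDirectory.open(new File("spellindex")); // example path
  SpellChecker spellchecker = new SpellChecker(spellIndexDirectory);
  // To index a field of a user index (my_lucene_reader is an open IndexReader):
  spellchecker.indexDictionary(new LuceneDictionary(my_lucene_reader, a_field));
  // To index a file containing words:
  spellchecker.indexDictionary(new PlainTextDictionary(new File("myfile.txt")));
  String[] suggestions = spellchecker.suggestSimilar("misspelt", 5);
  spellchecker.close();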

Version:: 1.0

Field Summary
`static float`	`DEFAULT_ACCURACY` The default minimum score to use, if not specified by calling `setAccuracy(float)`.
`static String`	`F_WORD` Field name for each word in the ngram index.

Constructor Summary
`SpellChecker(Directory spellIndex)` Use the given directory as a spell checker index with a `LevensteinDistance` as the default `StringDistance`.
`SpellChecker(Directory spellIndex, StringDistance sd)` Use the given directory as a spell checker index.
`SpellChecker(Directory spellIndex, StringDistance sd, Comparator<SuggestWord> comparator)` Use the given directory as a spell checker index with the given `StringDistance` measure and the given `Comparator` for sorting the results.

Method Summary
`void`	`clearIndex()` Removes all terms from the spell check index.
`void`	`close()` Close the IndexSearcher used by this SpellChecker.
`boolean`	`exist(String word)` Check whether the word exists in the index.
`float`	`getAccuracy()` The accuracy (minimum score) to be used, unless overridden in `suggestSimilar(String, int, org.apache.lucene.index.IndexReader, String, boolean, float)`, to decide whether a suggestion is included or not.
`Comparator<SuggestWord>`	`getComparator()` Returns the `Comparator` set for the `SuggestWordQueue`.
`StringDistance`	`getStringDistance()` Returns the `StringDistance` instance used by this `SpellChecker` instance.
`void`	`indexDictionary(Dictionary dict)` Indexes the data from the given `Dictionary`.
`void`	`indexDictionary(Dictionary dict, int mergeFactor, int ramMB)` Indexes the data from the given `Dictionary`.
`void`	`indexDictionary(Dictionary dict, int mergeFactor, int ramMB, boolean optimize)` Indexes the data from the given `Dictionary`.
`void`	`setAccuracy(float acc)` Sets the accuracy (minimum score), 0 < minScore < 1; the default is `DEFAULT_ACCURACY`.
`void`	`setComparator(Comparator<SuggestWord> comparator)` Sets the `Comparator` for the `SuggestWordQueue`.
`void`	`setSpellIndex(Directory spellIndexDir)` Use a different index as the spell checker index, or re-open the existing index if `spellIndexDir` is the same directory as the one given in the constructor.
`void`	`setStringDistance(StringDistance sd)` Sets the `StringDistance` implementation for this `SpellChecker` instance.
`String[]`	`suggestSimilar(String word, int numSug)` Suggest similar words.
`String[]`	`suggestSimilar(String word, int numSug, float accuracy)` Suggest similar words.
`String[]`	`suggestSimilar(String word, int numSug, IndexReader ir, String field, boolean morePopular)` Suggest similar words (optionally restricted to a field of an index).
`String[]`	`suggestSimilar(String word, int numSug, IndexReader ir, String field, boolean morePopular, float accuracy)` Suggest similar words (optionally restricted to a field of an index).

Methods inherited from class java.lang.Object
`clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait`

Field Detail

DEFAULT_ACCURACY

public static final float DEFAULT_ACCURACY

The default minimum score to use, if not specified by calling setAccuracy(float).

See Also:: Constant Field Values

F_WORD

public static final String F_WORD

Field name for each word in the ngram index.

See Also:: Constant Field Values
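The n-gram index pairs each word (stored under F_WORD) with its character n-grams, and the Lucene query that fetches candidates matches against those grams. As a rough illustration only, the hypothetical helper below splits a word into contiguous grams of a single size; which gram sizes Lucene indexes, and the field names it stores them under, are not modeled here.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper: illustrates character n-grams, not Lucene's indexing code.
final class NGrams {
    private NGrams() {}

    /** Returns all contiguous substrings of length n, in order of position. */
    static List<String> of(String word, int n) {
        List<String> grams = new ArrayList<>();
        for (int i = 0; i + n <= word.length(); i++) {
            grams.add(word.substring(i, i + n));
        }
        return grams;
    }

    public static void main(String[] args) {
        System.out.println(of("spell", 3)); // [spe, pel, ell]
    }
}
```

A word shorter than the gram size yields no grams at all, which is one reason an n-gram index typically uses several gram sizes.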

Constructor Detail

SpellChecker

public SpellChecker(Directory spellIndex,
                    StringDistance sd)
             throws IOException

Use the given directory as a spell checker index. The directory is created if it doesn't exist yet.

Parameters:: spellIndex - the spell index directory; sd - the StringDistance measurement to use
Throws:: IOException - if the SpellChecker cannot open the directory

SpellChecker

public SpellChecker(Directory spellIndex)
             throws IOException

Use the given directory as a spell checker index with a LevensteinDistance as the default StringDistance. The directory is created if it doesn't exist yet.

Parameters:: spellIndex - the spell index directory
Throws:: IOException - if the SpellChecker cannot open the directory
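The default StringDistance, LevensteinDistance, scores candidates by edit distance. The sketch below is a standalone normalized Levenshtein similarity, assuming a score of 1 - distance / max(length1, length2) so that identical strings score 1.0. The class name EditSimilarity is hypothetical, and this is not Lucene's source.

```java
// Hypothetical standalone sketch; not Lucene's LevensteinDistance class.
final class EditSimilarity {
    private EditSimilarity() {}

    /** Classic Levenshtein distance computed with two rolling rows. */
    static int distance(String a, String b) {
        int[] prev = new int[b.length() + 1];
        int[] cur = new int[b.length() + 1];
        for (int j = 0; j <= b.length(); j++) prev[j] = j; // j insertions from an empty prefix
        for (int i = 1; i <= a.length(); i++) {
            cur[0] = i; // i deletions down to an empty prefix
            for (int j = 1; j <= b.length(); j++) {
                int cost = a.charAt(i - 1) == b.charAt(j - 1) ? 0 : 1;
                cur[j] = Math.min(Math.min(cur[j - 1] + 1, prev[j] + 1), prev[j - 1] + cost);
            }
            int[] tmp = prev; // the finished row becomes the previous row
            prev = cur;
            cur = tmp;
        }
        return prev[b.length()];
    }

    /** Similarity in [0, 1]; 1.0 means identical (assumed normalization). */
    static float similarity(String a, String b) {
        int maxLen = Math.max(a.length(), b.length());
        if (maxLen == 0) return 1.0f;
        return 1.0f - (float) distance(a, b) / maxLen;
    }

    public static void main(String[] args) {
        System.out.println(distance("misspelt", "misspelled")); // 3
        System.out.println(similarity("misspelt", "misspelled"));
    }
}
```

Normalizing by the longer length keeps scores comparable across word lengths, which is what makes a single accuracy threshold meaningful.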

SpellChecker

public SpellChecker(Directory spellIndex,
                    StringDistance sd,
                    Comparator<SuggestWord> comparator)
             throws IOException

Use the given directory as a spell checker index with the given StringDistance measure and the given Comparator for sorting the results.

Parameters:: spellIndex - The spelling index; sd - The distance; comparator - The comparator
Throws:: IOException - if there is a problem opening the index

Method Detail

setSpellIndex

public void setSpellIndex(Directory spellIndexDir)
                   throws IOException

Use a different index as the spell checker index, or re-open the existing index if spellIndexDir is the same directory as the one given in the constructor.

Parameters:: spellIndexDir - the spell directory to use
Throws:: AlreadyClosedException - if the SpellChecker is already closed; IOException - if the SpellChecker cannot open the directory

setComparator

public void setComparator(Comparator<SuggestWord> comparator)

Sets the Comparator for the SuggestWordQueue.

Parameters:: comparator - the comparator
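setComparator accepts any Comparator<SuggestWord>. As an illustration only, the sketch below defines a comparator over a hypothetical Suggestion class (its fields loosely mirror the score and frequency a suggestion carries; it is not Lucene's SuggestWord) that ranks by score, then by frequency, then alphabetically. It sorts a plain list best-first and does not model how SuggestWordQueue consumes the comparator.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

// Hypothetical stand-in for a suggestion; Lucene's SuggestWord is not used here.
final class Suggestion {
    final String word;
    final float score; // similarity to the misspelled word
    final int freq;    // frequency in the user index

    Suggestion(String word, float score, int freq) {
        this.word = word;
        this.score = score;
        this.freq = freq;
    }
}

final class SuggestionOrder {
    private SuggestionOrder() {}

    /** Best first: higher score, then higher frequency, then alphabetical. */
    static final Comparator<Suggestion> BEST_FIRST =
        Comparator.comparingDouble((Suggestion s) -> s.score).reversed()
            .thenComparing(Comparator.comparingInt((Suggestion s) -> s.freq).reversed())
            .thenComparing(s -> s.word);

    public static void main(String[] args) {
        List<Suggestion> list = new ArrayList<>();
        list.add(new Suggestion("spelt", 0.8f, 3));
        list.add(new Suggestion("spell", 0.8f, 10));
        list.add(new Suggestion("spill", 0.6f, 50));
        list.sort(BEST_FIRST);
        for (Suggestion s : list) System.out.println(s.word); // prints spell, spelt, spill (one per line)
    }
}
```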

getComparator

public Comparator<SuggestWord> getComparator()

Returns the Comparator set for the SuggestWordQueue.

setStringDistance

public void setStringDistance(StringDistance sd)

Sets the StringDistance implementation for this SpellChecker instance.

Parameters:: sd - the StringDistance implementation for this SpellChecker instance

getStringDistance

public StringDistance getStringDistance()

Returns the StringDistance instance used by this SpellChecker instance.

Returns:: the StringDistance instance used by this SpellChecker instance.

setAccuracy

public void setAccuracy(float acc)

Sets the accuracy (minimum score), 0 < minScore < 1; the default is DEFAULT_ACCURACY.

Parameters:: acc - The new accuracy
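The accuracy acts as a cut-off on candidate scores. A hypothetical sketch of such a cut-off over already-scored candidates, assuming a candidate whose score equals the threshold still qualifies; the class name and the Map of scores are illustrative only.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch of an accuracy (minimum score) cut-off; not Lucene code.
final class AccuracyCutoff {
    private AccuracyCutoff() {}

    /** Returns the candidates whose score is at least minScore, in map iteration order. */
    static List<String> atLeast(Map<String, Float> scored, float minScore) {
        List<String> kept = new ArrayList<>();
        for (Map.Entry<String, Float> e : scored.entrySet()) {
            if (e.getValue() >= minScore) kept.add(e.getKey());
        }
        return kept;
    }

    public static void main(String[] args) {
        Map<String, Float> scored = new LinkedHashMap<>();
        scored.put("spell", 0.8f);
        scored.put("spill", 0.6f);
        scored.put("spool", 0.4f);
        System.out.println(atLeast(scored, 0.5f)); // [spell, spill]
    }
}
```

Raising the accuracy trades recall for precision: fewer, closer suggestions survive.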

getAccuracy

public float getAccuracy()

The accuracy (minimum score) to be used, unless overridden in suggestSimilar(String, int, org.apache.lucene.index.IndexReader, String, boolean, float), to decide whether a suggestion is included or not.

Returns:: The current accuracy setting

suggestSimilar

public String[] suggestSimilar(String word,
                               int numSug)
                        throws IOException

Suggest similar words.

Because the Lucene similarity used to fetch the most relevant n-grammed terms is not the same as the edit-distance strategy used to pick the best spell-checked word from the hits Lucene found, you usually have to request several suggestions (numSug greater than 1) to get the true best match.

For example, if numSug == 1, don't count on that suggestion being the best one; set numSug to at least 5 for good suggestions.

Parameters:: word - the word you want a spell check done on; numSug - the number of suggested words
Returns:: String[] the suggested words
Throws:: IOException - if the underlying index throws an IOException; AlreadyClosedException - if the SpellChecker is already closed
See Also:: suggestSimilar(String, int, org.apache.lucene.index.IndexReader, String, boolean, float)
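The two-phase behaviour described above can be sketched outside Lucene. In this hypothetical example, phase 1 ranks dictionary words by shared bigrams (standing in for the n-gram query) and keeps the top k; phase 2 reranks that pool by Levenshtein distance. The dictionary, the bigram scoring, and the separate k parameter are all assumptions; how Lucene sizes its own candidate pool is not modeled.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical two-phase sketch; not Lucene's suggestSimilar implementation.
final class TwoPhaseSuggest {
    private TwoPhaseSuggest() {}

    /** Set of contiguous two-character grams of w. */
    static Set<String> bigrams(String w) {
        Set<String> grams = new HashSet<>();
        for (int i = 0; i + 2 <= w.length(); i++) grams.add(w.substring(i, i + 2));
        return grams;
    }

    /** Number of distinct bigrams shared by a and b (the phase 1 score). */
    static int overlap(String a, String b) {
        Set<String> shared = bigrams(a);
        shared.retainAll(bigrams(b));
        return shared.size();
    }

    /** Levenshtein distance (the phase 2 score), using two rolling rows. */
    static int distance(String a, String b) {
        int[] prev = new int[b.length() + 1];
        int[] cur = new int[b.length() + 1];
        for (int j = 0; j <= b.length(); j++) prev[j] = j;
        for (int i = 1; i <= a.length(); i++) {
            cur[0] = i;
            for (int j = 1; j <= b.length(); j++) {
                int cost = a.charAt(i - 1) == b.charAt(j - 1) ? 0 : 1;
                cur[j] = Math.min(Math.min(cur[j - 1] + 1, prev[j] + 1), prev[j - 1] + cost);
            }
            int[] tmp = prev;
            prev = cur;
            cur = tmp;
        }
        return prev[b.length()];
    }

    /** Phase 1 keeps the k words sharing the most bigrams; phase 2 reranks them by distance. */
    static List<String> suggest(String target, List<String> dict, int k, int numSug) {
        List<String> ranked = new ArrayList<>(dict);
        ranked.sort(Comparator.comparingInt((String w) -> overlap(target, w)).reversed()
                .thenComparing(Comparator.naturalOrder()));
        List<String> pool = new ArrayList<>(ranked.subList(0, Math.min(k, ranked.size())));
        pool.sort(Comparator.comparingInt((String w) -> distance(target, w))); // stable: ties keep phase 1 order
        return pool.subList(0, Math.min(numSug, pool.size()));
    }

    public static void main(String[] args) {
        List<String> dict = Arrays.asList("helots", "hell", "halo");
        System.out.println(suggest("helo", dict, 1, 1)); // [helots]
        System.out.println(suggest("helo", dict, 3, 1)); // [hell]
    }
}
```

With k = 1 the pool holds only "helots" (three shared bigrams), so it is returned even though "hell" is a single edit away from "helo"; widening the pool to k = 3 lets phase 2 find the closer word.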

suggestSimilar

public String[] suggestSimilar(String word,
                               int numSug,
                               float accuracy)
                        throws IOException

Suggest similar words.

If numSug == 1, don't count on that suggestion being the best one; set numSug to at least 5 for good suggestions.

Parameters:: word - the word you want a spell check done on; numSug - the number of suggested words; accuracy - The minimum score a suggestion must have in order to qualify for inclusion in the results
Returns:: String[] the suggested words
Throws:: IOException - if the underlying index throws an IOException; AlreadyClosedException - if the SpellChecker is already closed
See Also:: suggestSimilar(String, int, org.apache.lucene.index.IndexReader, String, boolean, float)

suggestSimilar

public String[] suggestSimilar(String word,
                               int numSug,
                               IndexReader ir,
                               String field,
                               boolean morePopular)
                        throws IOException

Suggest similar words (optionally restricted to a field of an index).

If numSug == 1, don't count on that suggestion being the best one; set numSug to at least 5 for good suggestions.

Uses the current getAccuracy() value (set via setAccuracy(float)) as the accuracy.

Parameters:: word - the word you want a spell check done on; numSug - the number of suggested words; ir - the IndexReader of the user index (can be null; see the field param); field - the field of the user index: if field is not null, the suggested words are restricted to the words present in this field; morePopular - return only the suggested words that are as frequent as or more frequent than the searched word (only in restricted mode, i.e. ir != null and field != null)
Returns:: String[] the suggested words, sorted first by edit distance and then (only in restricted mode) by the popularity of the suggested words in the field of the user index
Throws:: IOException - if the underlying index throws an IOException; AlreadyClosedException - if the SpellChecker is already closed
See Also:: suggestSimilar(String, int, org.apache.lucene.index.IndexReader, String, boolean, float)
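In restricted mode with morePopular set, only suggestions at least as frequent as the searched word are kept. A hypothetical sketch of that filter, where docFreq is a plain Map standing in for term frequencies in the user index field (it is not a Lucene API), and a word missing from the map is assumed to have frequency 0.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch of the morePopular filter; not Lucene code.
final class MorePopularFilter {
    private MorePopularFilter() {}

    /** Keeps candidates at least as frequent as the searched word, preserving order. */
    static List<String> keepMorePopular(String word, List<String> candidates,
                                        Map<String, Integer> docFreq) {
        int wordFreq = docFreq.getOrDefault(word, 0);
        List<String> kept = new ArrayList<>();
        for (String c : candidates) {
            if (docFreq.getOrDefault(c, 0) >= wordFreq) kept.add(c);
        }
        return kept;
    }

    public static void main(String[] args) {
        Map<String, Integer> docFreq = new HashMap<>();
        docFreq.put("teh", 2);
        docFreq.put("the", 500);
        docFreq.put("tea", 1);
        docFreq.put("ten", 2);
        System.out.println(keepMorePopular("teh", Arrays.asList("the", "tea", "ten"), docFreq)); // [the, ten]
    }
}
```

The filter helps when the misspelling itself occurs in the user index: rarer look-alikes are dropped in favour of more common words.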

suggestSimilar

public String[] suggestSimilar(String word,
                               int numSug,
                               IndexReader ir,
                               String field,
                               boolean morePopular,
                               float accuracy)
                        throws IOException

Suggest similar words (optionally restricted to a field of an index).

If numSug == 1, don't count on that suggestion being the best one; set numSug to at least 5 for good suggestions.

Parameters:: word - the word you want a spell check done on; numSug - the number of suggested words; ir - the IndexReader of the user index (can be null; see the field param); field - the field of the user index: if field is not null, the suggested words are restricted to the words present in this field; morePopular - return only the suggested words that are as frequent as or more frequent than the searched word (only in restricted mode, i.e. ir != null and field != null); accuracy - the minimum score a suggestion must have in order to qualify for inclusion in the results
Returns:: String[] the suggested words, sorted first by edit distance and then (only in restricted mode) by the popularity of the suggested words in the field of the user index
Throws:: IOException - if the underlying index throws an IOException; AlreadyClosedException - if the SpellChecker is already closed

clearIndex

public void clearIndex()
                throws IOException

Removes all terms from the spell check index.

Throws:: IOException; AlreadyClosedException - if the SpellChecker is already closed

exist

public boolean exist(String word)
              throws IOException

Check whether the word exists in the index.

Parameters:: word - the word to look up
Returns:: true if the word exists in the index
Throws:: IOException; AlreadyClosedException - if the SpellChecker is already closed

indexDictionary

public final void indexDictionary(Dictionary dict,
                                  int mergeFactor,
                                  int ramMB,
                                  boolean optimize)
                           throws IOException

Indexes the data from the given Dictionary.

Parameters:: dict - Dictionary to index; mergeFactor - mergeFactor to use when indexing; ramMB - the max amount of memory in MB to use; optimize - whether or not the spellcheck index should be optimized
Throws:: AlreadyClosedException - if the SpellChecker is already closed; IOException

indexDictionary

public final void indexDictionary(Dictionary dict,
                                  int mergeFactor,
                                  int ramMB)
                           throws IOException

Indexes the data from the given Dictionary.

Parameters:: dict - the dictionary to index; mergeFactor - mergeFactor to use when indexing; ramMB - the max amount of memory in MB to use
Throws:: IOException

indexDictionary

public final void indexDictionary(Dictionary dict)
                           throws IOException

Indexes the data from the given Dictionary.

Parameters:: dict - the dictionary to index
Throws:: IOException

close

public void close()
           throws IOException

Close the IndexSearcher used by this SpellChecker.

Specified by:: close in interface Closeable

Throws:: IOException - if the close operation causes an IOException; AlreadyClosedException - if the SpellChecker is already closed
