Class SpellChecker

java.lang.Object
org.apache.lucene.search.spell.SpellChecker
All Implemented Interfaces:
Closeable, AutoCloseable

public class SpellChecker extends Object implements Closeable
Spell checker class (main class), initially inspired by the David Spencer code.

Example Usage:

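  // spellIndexDirectory: the Directory that holds (or will hold) the spell index
  SpellChecker spellchecker = new SpellChecker(spellIndexDirectory);
  // To index a field of a user index:
  spellchecker.indexDictionary(new LuceneDictionary(myLuceneReader, aField),
      new IndexWriterConfig(new WhitespaceAnalyzer()), false);
  // To index a file containing one word per line:
  spellchecker.indexDictionary(new PlainTextDictionary(Paths.get("myfile.txt")),
      new IndexWriterConfig(new WhitespaceAnalyzer()), false);
  String[] suggestions = spellchecker.suggestSimilar("misspelt", 5);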
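The spell index is built from n-grammed terms: each dictionary word is broken into short runs of characters, and a misspelled word is matched against correct ones through the n-grams they share. The plain-Java sketch below shows only that decomposition step. NGramSketch and the fixed gram size are illustrative assumptions, not Lucene's internal code, which may pick different gram sizes.

```java
import java.util.ArrayList;
import java.util.List;

// Illustrative sketch only: not Lucene's implementation.
final class NGramSketch {
    /** Returns all contiguous character n-grams of {@code word}, left to right. */
    static List<String> ngrams(String word, int n) {
        List<String> grams = new ArrayList<>();
        for (int i = 0; i + n <= word.length(); i++) {
            grams.add(word.substring(i, i + n));
        }
        return grams; // empty when the word is shorter than n
    }
}
```

For instance, `NGramSketch.ngrams("spell", 3)` returns `[spe, pel, ell]`.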
 
  • Field Details

  • Constructor Details

    • SpellChecker

      public SpellChecker(Directory spellIndex, StringDistance sd) throws IOException
      Use the given directory as a spell checker index. The directory is created if it doesn't exist yet.
      Parameters:
      spellIndex - the spell index directory
      sd - the StringDistance measurement to use
      Throws:
      IOException - if the SpellChecker cannot open the directory
    • SpellChecker

      public SpellChecker(Directory spellIndex) throws IOException
      Use the given directory as a spell checker index with a LevenshteinDistance as the default StringDistance. The directory is created if it doesn't exist yet.
      Parameters:
      spellIndex - the spell index directory
      Throws:
      IOException - if the SpellChecker cannot open the directory
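A StringDistance yields a similarity score rather than a raw edit count: higher means closer, which is why the accuracy setting acts as a minimum score. One common way to turn a Levenshtein edit distance into such a score, assumed in this plain-Java sketch, is 1 - d / max(len(a), len(b)). LevenshteinSketch is hypothetical, and the exact normalization Lucene's LevenshteinDistance applies should be treated as an assumption here.

```java
// Illustrative sketch: Levenshtein edit distance and an assumed [0, 1] normalization.
final class LevenshteinSketch {
    /** Classic two-row dynamic programming; insert, delete and substitute each cost 1. */
    static int editDistance(String a, String b) {
        int[] prev = new int[b.length() + 1];
        int[] curr = new int[b.length() + 1];
        for (int j = 0; j <= b.length(); j++) prev[j] = j; // distance from "" to b[0..j)
        for (int i = 1; i <= a.length(); i++) {
            curr[0] = i; // distance from a[0..i) to ""
            for (int j = 1; j <= b.length(); j++) {
                int cost = a.charAt(i - 1) == b.charAt(j - 1) ? 0 : 1;
                curr[j] = Math.min(Math.min(curr[j - 1] + 1, prev[j] + 1), prev[j - 1] + cost);
            }
            int[] tmp = prev; prev = curr; curr = tmp; // prev now holds row i
        }
        return prev[b.length()];
    }

    /** 1.0 for identical strings, decreasing as more edits are needed. */
    static float similarity(String a, String b) {
        int maxLen = Math.max(a.length(), b.length());
        if (maxLen == 0) return 1.0f; // two empty strings are identical
        return 1.0f - (float) editDistance(a, b) / maxLen;
    }
}
```

With this normalization, "misspelt" against "misspelled" (3 edits, longer word of 10 characters) scores about 0.7.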
    • SpellChecker

      public SpellChecker(Directory spellIndex, StringDistance sd, Comparator<SuggestWord> comparator) throws IOException
      Use the given directory as a spell checker index with the given StringDistance measure and the given Comparator for sorting the results.
      Parameters:
      spellIndex - the spell index directory
      sd - the StringDistance measurement to use
      comparator - the Comparator used to sort the suggestions
      Throws:
      IOException - if there is a problem opening the index
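The comparator decides the final order of suggestions. As an illustration only, the sketch below uses a hypothetical Candidate type (standing in for SuggestWord, whose fields are not documented on this page) and orders best first: higher score, then higher frequency, then alphabetical. Lucene's default ordering may differ in detail.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

// Hypothetical stand-in for a suggestion: word, similarity score and popularity.
final class Candidate {
    final String word;
    final float score; // similarity to the input, higher is better
    final int freq;    // popularity of the word in the user index
    Candidate(String word, float score, int freq) {
        this.word = word; this.score = score; this.freq = freq;
    }
}

final class CandidateOrder {
    /** Best first: higher score, then higher frequency, then alphabetical order. */
    static final Comparator<Candidate> BEST_FIRST =
        Comparator.comparingDouble((Candidate c) -> c.score).reversed()
            .thenComparing(Comparator.comparingInt((Candidate c) -> c.freq).reversed())
            .thenComparing(c -> c.word);

    /** Returns the candidate words sorted with BEST_FIRST. */
    static List<String> rank(List<Candidate> candidates) {
        List<Candidate> sorted = new ArrayList<>(candidates);
        sorted.sort(BEST_FIRST);
        List<String> words = new ArrayList<>();
        for (Candidate c : sorted) words.add(c.word);
        return words;
    }
}
```

Putting the score first keeps closeness as the primary criterion; frequency only decides among equally close words.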
  • Method Details

    • setSpellIndex

      public void setSpellIndex(Directory spellIndexDir) throws IOException
      Use a different index as the spell checker index, or re-open the existing index if spellIndexDir is the same directory as the one given in the constructor.
      Parameters:
      spellIndexDir - the spell directory to use
      Throws:
      AlreadyClosedException - if the SpellChecker is already closed
      IOException - if the SpellChecker cannot open the directory
    • setComparator

      public void setComparator(Comparator<SuggestWord> comparator)
      Sets the Comparator for the SuggestWordQueue.
      Parameters:
      comparator - the comparator
    • getComparator

      public Comparator<SuggestWord> getComparator()
      Gets the comparator in use for ranking suggestions.
    • setStringDistance

      public void setStringDistance(StringDistance sd)
      Sets the StringDistance implementation for this SpellChecker instance.
      Parameters:
      sd - the StringDistance implementation for this SpellChecker instance
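Because the distance measure is a pluggable strategy, any function that maps two strings to a similarity score can stand in for it. The sketch below uses a hypothetical Similarity interface (not Lucene's StringDistance) with one alternative measure, a Dice coefficient over character bigrams, to show how swapping the measure changes the score for the same pair of words.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical strategy interface: 1.0 means identical, 0.0 means nothing in common. */
interface Similarity {
    float score(String a, String b);
}

/** Dice coefficient over character bigrams, counting repeated bigrams as a multiset. */
final class BigramDice implements Similarity {
    private static Map<String, Integer> bigrams(String s) {
        Map<String, Integer> counts = new HashMap<>();
        for (int i = 0; i + 2 <= s.length(); i++) {
            counts.merge(s.substring(i, i + 2), 1, Integer::sum);
        }
        return counts;
    }

    @Override
    public float score(String a, String b) {
        if (a.equals(b)) return 1.0f;
        Map<String, Integer> ga = bigrams(a);
        Map<String, Integer> gb = bigrams(b);
        int total = Math.max(a.length() - 1, 0) + Math.max(b.length() - 1, 0);
        if (total == 0) return 0.0f; // distinct strings with no bigrams at all
        int shared = 0;
        for (Map.Entry<String, Integer> e : ga.entrySet()) {
            shared += Math.min(e.getValue(), gb.getOrDefault(e.getKey(), 0));
        }
        return 2.0f * shared / total;
    }
}
```

Under this measure "spell" against "spelt" scores 0.75, since three of the four bigrams on each side are shared.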
    • getStringDistance

      public StringDistance getStringDistance()
      Returns the StringDistance instance used by this SpellChecker instance.
      Returns:
      the StringDistance instance used by this SpellChecker instance.
    • setAccuracy

      public void setAccuracy(float acc)
      Sets the accuracy (minimum score), where 0 < minScore < 1; the default is DEFAULT_ACCURACY.
      Parameters:
      acc - The new accuracy
    • getAccuracy

      public float getAccuracy()
      Returns the accuracy (minimum score) used to decide whether a suggestion is included, unless it is overridden in suggestSimilar(String, int, IndexReader, String, SuggestMode, float).
      Returns:
      The current accuracy setting
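The accuracy works as a cut-off: candidates scoring below it are left out, and an accuracy passed to suggestSimilar replaces the instance-wide setting for that call only. The class below mimics that rule in plain Java. AccuracyFilter, its 0.5f starting value, the inclusive comparison and the range check in setAccuracy are assumptions of the sketch, not Lucene's code (the real default is DEFAULT_ACCURACY).

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

// Hypothetical sketch of an accuracy cut-off with a per-call override.
final class AccuracyFilter {
    private float accuracy = 0.5f; // assumed starting value for the sketch

    void setAccuracy(float acc) {
        // The sketch enforces the documented range 0 < acc < 1 (also rejects NaN).
        if (!(acc > 0f && acc < 1f)) throw new IllegalArgumentException("accuracy must be in (0, 1)");
        this.accuracy = acc;
    }

    float getAccuracy() { return accuracy; }

    /** Keeps candidates scoring at least the threshold; override == null means "use getAccuracy()". */
    List<String> filter(Map<String, Float> scored, Float override) {
        float threshold = (override != null) ? override : accuracy;
        List<String> kept = new ArrayList<>();
        for (Map.Entry<String, Float> e : scored.entrySet()) {
            if (e.getValue() >= threshold) kept.add(e.getKey());
        }
        return kept; // in the iteration order of the input map
    }
}
```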
    • suggestSimilar

      public String[] suggestSimilar(String word, int numSug) throws IOException
      Suggest similar words.

      Because the Lucene similarity used to fetch the most relevant n-grammed terms differs from the edit-distance strategy used to pick the best-matching word among the hits Lucene returns, you usually have to request several suggestions (numSug) to get the true best match.

      For example, if numSug == 1, do not count on that single suggestion being the best one. Set numSug to at least 5 for good suggestions.

      Parameters:
      word - the word you want a spell check done on
      numSug - the number of suggested words
      Returns:
      String[] the suggested words
      Throws:
      IOException - if the underlying index throws an IOException
      AlreadyClosedException - if the SpellChecker is already closed
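The two-stage process described above can be sketched in plain Java: a cheap first stage shortlists dictionary words by how many character bigrams they share with the input, and a second stage re-ranks only that shortlist by edit distance. TwoStageSketch, the bigram size and the rule that the shortlist holds exactly numSug words are hypothetical simplifications, not Lucene's implementation; they make the effect of a small numSug easy to see.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical two-stage suggester: bigram overlap to shortlist, edit distance to re-rank.
final class TwoStageSketch {
    static Set<String> bigrams(String s) {
        Set<String> grams = new HashSet<>();
        for (int i = 0; i + 2 <= s.length(); i++) grams.add(s.substring(i, i + 2));
        return grams;
    }

    static int sharedBigrams(String a, String b) {
        Set<String> common = bigrams(a);
        common.retainAll(bigrams(b));
        return common.size();
    }

    static int editDistance(String a, String b) {
        int[] prev = new int[b.length() + 1];
        int[] curr = new int[b.length() + 1];
        for (int j = 0; j <= b.length(); j++) prev[j] = j;
        for (int i = 1; i <= a.length(); i++) {
            curr[0] = i;
            for (int j = 1; j <= b.length(); j++) {
                int cost = a.charAt(i - 1) == b.charAt(j - 1) ? 0 : 1;
                curr[j] = Math.min(Math.min(curr[j - 1] + 1, prev[j] + 1), prev[j - 1] + cost);
            }
            int[] tmp = prev; prev = curr; curr = tmp;
        }
        return prev[b.length()];
    }

    static List<String> suggest(String input, List<String> dictionary, int numSug) {
        List<String> shortlist = new ArrayList<>(dictionary);
        // Stage 1: most shared bigrams first, ties broken alphabetically.
        shortlist.sort(Comparator.comparingInt((String w) -> sharedBigrams(input, w)).reversed()
                .thenComparing(Comparator.<String>naturalOrder()));
        shortlist = new ArrayList<>(shortlist.subList(0, Math.min(numSug, shortlist.size())));
        // Stage 2: smallest edit distance first (stable sort keeps stage-1 order on ties).
        shortlist.sort(Comparator.comparingInt((String w) -> editDistance(input, w)));
        return shortlist;
    }
}
```

With the dictionary [receive, precise, deceive], `suggest("recieve", dict, 1)` returns [precise]: "precise" ties with "receive" on shared bigrams and wins the alphabetical tie-break in stage 1. With numSug = 2, stage 2 sees both words and puts "receive" (edit distance 2) ahead of "precise" (edit distance 3).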
    • suggestSimilar

      public String[] suggestSimilar(String word, int numSug, float accuracy) throws IOException
      Suggest similar words.

      Because the Lucene similarity used to fetch the most relevant n-grammed terms differs from the edit-distance strategy used to pick the best-matching word among the hits Lucene returns, you usually have to request several suggestions (numSug) to get the true best match.

      For example, if numSug == 1, do not count on that single suggestion being the best one. Set numSug to at least 5 for good suggestions.

      Parameters:
      word - the word you want a spell check done on
      numSug - the number of suggested words
      accuracy - The minimum score a suggestion must have in order to qualify for inclusion in the results
      Returns:
      String[] the suggested words
      Throws:
      IOException - if the underlying index throws an IOException
      AlreadyClosedException - if the SpellChecker is already closed
    • suggestSimilar

      public String[] suggestSimilar(String word, int numSug, IndexReader ir, String field, SuggestMode suggestMode) throws IOException
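      Suggest similar words, using the current accuracy setting (see getAccuracy()) as the minimum score; otherwise the same as suggestSimilar(String, int, IndexReader, String, SuggestMode, float).
      Throws:
      IOException - if the underlying index throws an IOException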
    • suggestSimilar

      public String[] suggestSimilar(String word, int numSug, IndexReader ir, String field, SuggestMode suggestMode, float accuracy) throws IOException
      Suggest similar words (optionally restricted to a field of an index).

      Because the Lucene similarity used to fetch the most relevant n-grammed terms differs from the edit-distance strategy used to pick the best-matching word among the hits Lucene returns, you usually have to request several suggestions (numSug) to get the true best match.

      For example, if numSug == 1, do not count on that single suggestion being the best one. Set numSug to at least 5 for good suggestions.

      Parameters:
      word - the word you want a spell check done on
      numSug - the number of suggested words
      ir - the IndexReader of the user index (may be null; see the field parameter)
      field - the field of the user index: if field is not null, the suggested words are restricted to the words present in this field.
      suggestMode - the suggestion mode (NOTE: if ir == null or field == null, this is overridden with SuggestMode.SUGGEST_ALWAYS)
      accuracy - The minimum score a suggestion must have in order to qualify for inclusion in the results
      Returns:
      String[] the suggested words, sorted first by edit distance and then, in restricted mode only, by their popularity in the field of the user index
      Throws:
      IOException - if the underlying index throws an IOException
      AlreadyClosedException - if the SpellChecker is already closed
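In restricted mode the field's document frequencies matter twice: the suggest mode decides which candidates are eligible, and popularity orders words that are equally close. This plain-Java sketch models that with a hypothetical Mode enum whose constants mirror the names of Lucene's SuggestMode. The rules encoded here (no suggestions when the input already occurs in the field, "more popular" meaning strictly higher frequency, never suggesting the input itself) are assumptions about the intended semantics, not a copy of Lucene's code.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.Map;

enum Mode { SUGGEST_WHEN_NOT_IN_INDEX, SUGGEST_MORE_POPULAR, SUGGEST_ALWAYS }

final class ModeSketch {
    /**
     * @param scores candidate word -> similarity to the input (higher is better)
     * @param freq   word -> document frequency in the user field (missing means 0)
     */
    static List<String> suggest(String input, Map<String, Float> scores,
                                Map<String, Integer> freq, Mode mode) {
        int inputFreq = freq.getOrDefault(input, 0);
        if (mode == Mode.SUGGEST_WHEN_NOT_IN_INDEX && inputFreq > 0) {
            return new ArrayList<>(); // the word already occurs in the field
        }
        List<String> out = new ArrayList<>();
        for (String candidate : scores.keySet()) {
            if (candidate.equals(input)) continue; // never suggest the input itself
            if (mode == Mode.SUGGEST_MORE_POPULAR && freq.getOrDefault(candidate, 0) <= inputFreq) {
                continue; // keep only words more frequent than the input
            }
            out.add(candidate);
        }
        // First criterion: similarity; second criterion: popularity in the field.
        out.sort(Comparator.comparingDouble((String w) -> scores.get(w)).reversed()
                .thenComparing(Comparator.comparingInt((String w) -> freq.getOrDefault(w, 0)).reversed()));
        return out;
    }
}
```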
    • clearIndex

      public void clearIndex() throws IOException
      Removes all terms from the spell check index.
      Throws:
      IOException - If there is a low-level I/O error.
      AlreadyClosedException - if the SpellChecker is already closed
    • exist

      public boolean exist(String word) throws IOException
      Check whether the word exists in the index.
      Parameters:
      word - word to check
      Returns:
      true if the word exists in the index
      Throws:
      IOException - If there is a low-level I/O error.
      AlreadyClosedException - if the SpellChecker is already closed
    • indexDictionary

      public final void indexDictionary(Dictionary dict, IndexWriterConfig config, boolean fullMerge) throws IOException
      Indexes the data from the given Dictionary.
      Parameters:
      dict - Dictionary to index
      config - IndexWriterConfig to use
      fullMerge - whether or not the spellcheck index should be fully merged
      Throws:
      AlreadyClosedException - if the SpellChecker is already closed
      IOException - If there is a low-level I/O error.
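Indexing a dictionary amounts to storing each word so that it can later be found through its n-grams. As a rough in-memory analogue of what such an index provides, the sketch below maps each character bigram to the words that contain it and skips words it has already seen. BigramIndexSketch, the bigram size and the in-memory maps are illustrative assumptions; the real method writes to the spell index directory using the supplied IndexWriterConfig.

```java
import java.util.Collections;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;
import java.util.TreeSet;

// Hypothetical in-memory stand-in for an n-gram spell index.
final class BigramIndexSketch {
    private final Map<String, Set<String>> postings = new TreeMap<>(); // bigram -> words
    private final Set<String> words = new HashSet<>();

    /** Adds each distinct word once; returns how many new words were indexed. */
    int indexDictionary(Iterable<String> dictionary) {
        int added = 0;
        for (String word : dictionary) {
            if (!words.add(word)) continue; // already indexed: skip duplicates
            added++;
            for (int i = 0; i + 2 <= word.length(); i++) {
                postings.computeIfAbsent(word.substring(i, i + 2), g -> new TreeSet<>()).add(word);
            }
        }
        return added;
    }

    /** Words containing the given bigram (read-only view, empty if none). */
    Set<String> wordsWithBigram(String bigram) {
        return Collections.unmodifiableSet(postings.getOrDefault(bigram, new TreeSet<>()));
    }

    /** Exact-match lookup, analogous in spirit to exist(String). */
    boolean exist(String word) { return words.contains(word); }
}
```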
    • close

      public void close() throws IOException
      Close the IndexSearcher used by this SpellChecker
      Specified by:
      close in interface AutoCloseable
      Specified by:
      close in interface Closeable
      Throws:
      IOException - if the close operation causes an IOException
      AlreadyClosedException - if the SpellChecker is already closed
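Since SpellChecker implements Closeable, a try-with-resources statement is a natural way to make sure close() runs. The self-contained sketch below uses a hypothetical ToySpellIndex in place of SpellChecker to show that pattern and the documented rule that closing an already-closed instance fails. IllegalStateException stands in for AlreadyClosedException so that the example needs no Lucene dependency.

```java
import java.io.Closeable;
import java.io.IOException;

// Hypothetical stand-in for a closeable spell checker; not Lucene code.
final class ToySpellIndex implements Closeable {
    private boolean closed = false;

    private void ensureOpen() {
        if (closed) throw new IllegalStateException("ToySpellIndex already closed");
    }

    boolean exist(String word) {
        ensureOpen();
        return "spell".equals(word); // placeholder lookup for the sketch
    }

    @Override
    public void close() throws IOException {
        ensureOpen(); // a second close() fails, mirroring the documented AlreadyClosedException
        closed = true;
    }

    static boolean lookupOnce(String word) throws IOException {
        // try-with-resources calls close() exactly once, even if exist() throws
        try (ToySpellIndex index = new ToySpellIndex()) {
            return index.exist(word);
        }
    }
}
```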