Class SpellChecker

  • All Implemented Interfaces:
    Closeable, AutoCloseable

    public class SpellChecker
    extends Object
    implements Closeable
    Main spell checker class, initially inspired by David Spencer's code.

    Example Usage:

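      SpellChecker spellchecker = new SpellChecker(spellIndexDirectory);
      // To index a field of a user index (assumes the three-argument
      // indexDictionary(Dictionary, IndexWriterConfig, boolean fullMerge) of recent Lucene releases):
      spellchecker.indexDictionary(new LuceneDictionary(my_lucene_reader, a_field), new IndexWriterConfig(), false);
      // To index a file containing one word per line:
      spellchecker.indexDictionary(new PlainTextDictionary(Paths.get("myfile.txt")), new IndexWriterConfig(), false);
      // Ask for several suggestions; a single one is not guaranteed to be the best match.
      String[] suggestions = spellchecker.suggestSimilar("misspelt", 5);
      // Release the underlying index resources when done.
      spellchecker.close();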
     
    • Constructor Detail

      • SpellChecker

        public SpellChecker(Directory spellIndex,
                            StringDistance sd)
                     throws IOException
        Use the given directory as a spell checker index. The directory is created if it doesn't exist yet.
        Parameters:
        spellIndex - the spell index directory
        sd - the StringDistance measurement to use
        Throws:
        IOException - if the SpellChecker cannot open the directory
      • SpellChecker

        public SpellChecker(Directory spellIndex)
                     throws IOException
        Use the given directory as a spell checker index with a LevenshteinDistance as the default StringDistance. The directory is created if it doesn't exist yet.
        Parameters:
        spellIndex - the spell index directory
        Throws:
        IOException - if the SpellChecker cannot open the directory
    • Method Detail

      • setSpellIndex

        public void setSpellIndex(Directory spellIndexDir)
                           throws IOException
        Use a different index as the spell checker index, or re-open the existing index if spellIndexDir is the same directory that was given to the constructor.
        Parameters:
        spellIndexDir - the spell directory to use
        Throws:
        AlreadyClosedException - if the SpellChecker is already closed
        IOException - if the SpellChecker cannot open the directory
      • setAccuracy

        public void setAccuracy(float acc)
        Sets the accuracy, i.e. the minimum score a suggestion must reach, with 0 < acc < 1; the default is DEFAULT_ACCURACY.
        Parameters:
        acc - The new accuracy
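        The effect of a minimum score can be sketched without Lucene. The example below is an illustration under stated assumptions, not the library's implementation: it assumes a similarity normalized as 1 - editDistance / maxLength, and the names AccuracyFilter, similarity and filter are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

final class AccuracyFilter {

    // Classic dynamic-programming Levenshtein edit distance (two rolling rows).
    static int editDistance(String a, String b) {
        int[] prev = new int[b.length() + 1];
        int[] curr = new int[b.length() + 1];
        for (int j = 0; j <= b.length(); j++) {
            prev[j] = j;
        }
        for (int i = 1; i <= a.length(); i++) {
            curr[0] = i;
            for (int j = 1; j <= b.length(); j++) {
                int cost = a.charAt(i - 1) == b.charAt(j - 1) ? 0 : 1;
                curr[j] = Math.min(Math.min(curr[j - 1] + 1, prev[j] + 1), prev[j - 1] + cost);
            }
            int[] tmp = prev;
            prev = curr;
            curr = tmp;
        }
        return prev[b.length()];
    }

    // Assumed normalization: 1.0 for identical strings, lower as more edits are needed.
    static float similarity(String a, String b) {
        int longest = Math.max(a.length(), b.length());
        if (longest == 0) {
            return 1.0f;
        }
        return 1.0f - (float) editDistance(a, b) / longest;
    }

    // Keep only the candidates whose similarity to the word is at least minScore.
    static List<String> filter(String word, List<String> candidates, float minScore) {
        List<String> kept = new ArrayList<>();
        for (String candidate : candidates) {
            if (similarity(word, candidate) >= minScore) {
                kept.add(candidate);
            }
        }
        return kept;
    }

    public static void main(String[] args) {
        List<String> candidates = List.of("misspell", "mist");
        System.out.println(filter("misspelt", candidates, 0.8f)); // prints [misspell]
        System.out.println(filter("misspelt", candidates, 0.4f)); // prints [misspell, mist]
    }
}
```

        Under this assumed normalization, raising the accuracy trades recall for precision: fewer, closer suggestions are returned.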
      • suggestSimilar

        public String[] suggestSimilar(String word,
                                       int numSug)
                                throws IOException
        Suggest similar words.

        The Lucene similarity used to fetch the most relevant n-grammed terms is not the same as the edit distance strategy used to pick the best matching word from the hits that Lucene found. You therefore usually have to request several suggestions (a larger numSug) in order to get the true best match.

        That is, if numSug == 1, do not count on that single suggestion being the best one. Set numSug to at least 5 for a good suggestion.
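
        The mismatch between the two rankings can be illustrated with a small, self-contained sketch. It does not use Lucene: bigram overlap stands in for the index lookup and plain Levenshtein distance for the re-ranking, and the names TwoStageSuggest, overlap, best and fetch are hypothetical.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

final class TwoStageSuggest {

    // Character bigrams of a word, e.g. "abcd" -> {ab, bc, cd}.
    static Set<String> bigrams(String s) {
        Set<String> grams = new HashSet<>();
        for (int i = 0; i + 2 <= s.length(); i++) {
            grams.add(s.substring(i, i + 2));
        }
        return grams;
    }

    // Stage 1 score (stand-in for the index lookup): number of bigrams shared with the query.
    static int overlap(String query, String candidate) {
        Set<String> shared = bigrams(query);
        shared.retainAll(bigrams(candidate));
        return shared.size();
    }

    // Stage 2 score: classic dynamic-programming Levenshtein edit distance.
    static int editDistance(String a, String b) {
        int[] prev = new int[b.length() + 1];
        int[] curr = new int[b.length() + 1];
        for (int j = 0; j <= b.length(); j++) {
            prev[j] = j;
        }
        for (int i = 1; i <= a.length(); i++) {
            curr[0] = i;
            for (int j = 1; j <= b.length(); j++) {
                int cost = a.charAt(i - 1) == b.charAt(j - 1) ? 0 : 1;
                curr[j] = Math.min(Math.min(curr[j - 1] + 1, prev[j] + 1), prev[j - 1] + cost);
            }
            int[] tmp = prev;
            prev = curr;
            curr = tmp;
        }
        return prev[b.length()];
    }

    // Fetch the top `fetch` candidates by bigram overlap, then return the one
    // with the smallest edit distance to the query (null if there are none).
    static String best(String query, List<String> dictionary, int fetch) {
        List<String> hits = new ArrayList<>(dictionary);
        hits.sort(Comparator.comparingInt((String w) -> overlap(query, w)).reversed());
        List<String> top = hits.subList(0, Math.min(fetch, hits.size()));
        return top.stream()
                .min(Comparator.comparingInt((String w) -> editDistance(query, w)))
                .orElse(null);
    }

    public static void main(String[] args) {
        List<String> dictionary = List.of("abcdefgh", "abxd", "zzzz");
        // "abcdefgh" shares three bigrams with "abcd" but is four edits away;
        // "abxd" shares only one bigram but is a single edit away.
        System.out.println(best("abcd", dictionary, 1)); // prints abcdefgh
        System.out.println(best("abcd", dictionary, 5)); // prints abxd
    }
}
```

        With only one candidate fetched, the closest word by edit distance never reaches the second stage, which is the situation the advice about numSug guards against.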

        Parameters:
        word - the word you want a spell check done on
        numSug - the number of suggested words
        Returns:
        String[] the suggested words
        Throws:
        IOException - if the underlying index throws an IOException
        AlreadyClosedException - if the SpellChecker is already closed
        See Also:
        suggestSimilar(String, int, IndexReader, String, SuggestMode, float)
      • suggestSimilar

        public String[] suggestSimilar(String word,
                                       int numSug,
                                       float accuracy)
                                throws IOException
        Suggest similar words.

        The Lucene similarity used to fetch the most relevant n-grammed terms is not the same as the edit distance strategy used to pick the best matching word from the hits that Lucene found. You therefore usually have to request several suggestions (a larger numSug) in order to get the true best match.

        That is, if numSug == 1, do not count on that single suggestion being the best one. Set numSug to at least 5 for a good suggestion.

        Parameters:
        word - the word you want a spell check done on
        numSug - the number of suggested words
        accuracy - The minimum score a suggestion must have in order to qualify for inclusion in the results
        Returns:
        String[] the suggested words
        Throws:
        IOException - if the underlying index throws an IOException
        AlreadyClosedException - if the SpellChecker is already closed
        See Also:
        suggestSimilar(String, int, IndexReader, String, SuggestMode, float)
      • suggestSimilar

        public String[] suggestSimilar(String word,
                                       int numSug,
                                       IndexReader ir,
                                       String field,
                                       SuggestMode suggestMode,
                                       float accuracy)
                                throws IOException
        Suggest similar words (optionally restricted to a field of an index).

        The Lucene similarity used to fetch the most relevant n-grammed terms is not the same as the edit distance strategy used to pick the best matching word from the hits that Lucene found. You therefore usually have to request several suggestions (a larger numSug) in order to get the true best match.

        That is, if numSug == 1, do not count on that single suggestion being the best one. Set numSug to at least 5 for a good suggestion.

        Parameters:
        word - the word you want a spell check done on
        numSug - the number of suggested words
        ir - the IndexReader of the user index (can be null; see the field parameter)
        field - the field of the user index: if field is not null, the suggested words are restricted to the words present in this field.
        suggestMode - the suggest mode to apply (NOTE: if ir == null or field == null, this is overridden with SuggestMode.SUGGEST_ALWAYS)
        accuracy - The minimum score a suggestion must have in order to qualify for inclusion in the results
        Returns:
        String[] the suggested words, sorted by two criteria: first the edit distance, then (only in restricted mode) the popularity of the suggested word in the field of the user index
        Throws:
        IOException - if the underlying index throws an IOException
        AlreadyClosedException - if the SpellChecker is already closed
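        The two-criteria ordering of the returned suggestions can be sketched as a comparator. This is an illustration only: the Suggestion holder, its fields and the SuggestionOrder class are hypothetical, and the sketch assumes that a smaller edit distance and a higher frequency in the user field both rank earlier.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

final class SuggestionOrder {

    // Hypothetical holder for one candidate suggestion.
    static final class Suggestion {
        final String word;
        final int editDistance; // distance to the misspelled word
        final int frequency;    // popularity in the field of the user index

        Suggestion(String word, int editDistance, int frequency) {
            this.word = word;
            this.editDistance = editDistance;
            this.frequency = frequency;
        }
    }

    // First criterion: smaller edit distance. Second criterion (tie-break): higher frequency.
    static final Comparator<Suggestion> ORDER =
            Comparator.comparingInt((Suggestion s) -> s.editDistance)
                    .thenComparing(Comparator.comparingInt((Suggestion s) -> s.frequency).reversed());

    // Return the words of the given suggestions in ranked order.
    static List<String> sortedWords(List<Suggestion> suggestions) {
        List<Suggestion> copy = new ArrayList<>(suggestions);
        copy.sort(ORDER);
        List<String> words = new ArrayList<>();
        for (Suggestion s : copy) {
            words.add(s.word);
        }
        return words;
    }

    public static void main(String[] args) {
        List<Suggestion> found = List.of(
                new Suggestion("spelt", 3, 100),
                new Suggestion("misspell", 1, 3),
                new Suggestion("misspelts", 1, 7));
        System.out.println(sortedWords(found)); // prints [misspelts, misspell, spelt]
    }
}
```

        The frequency only breaks ties between words at the same edit distance, so a very popular but distant word ("spelt" above) still ranks last.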
      • clearIndex

        public void clearIndex()
                        throws IOException
        Removes all terms from the spell check index.
        Throws:
        IOException - If there is a low-level I/O error.
        AlreadyClosedException - if the SpellChecker is already closed
      • exist

        public boolean exist(String word)
                      throws IOException
        Check whether the word exists in the index.
        Parameters:
        word - word to check
        Returns:
        true if the word exists in the index
        Throws:
        IOException - If there is a low-level I/O error.
        AlreadyClosedException - if the SpellChecker is already closed