Class NGramFragmentChecker

    • Method Detail

      • hasImpossibleFragmentAround

        public boolean hasImpossibleFragmentAround(CharSequence word,
                                                   int start,
                                                   int end)
        Description copied from interface: FragmentChecker
        Check if the given word range intersects any fragment which is impossible in the current language. For example, if the word is "aaax", and there are no "aaa" combinations in words accepted by the spellchecker (but "aax" is valid), then true can be returned for all ranges in 0..3, but not for 3..4.

        The implementation must be monotonic: if some range is considered impossible, larger ranges encompassing it should also produce true.

        Specified by:
        hasImpossibleFragmentAround in interface FragmentChecker
        Parameters:
        word - the whole word being checked for impossible substrings
        start - the start of the range in question, inclusive
        end - the end of the range in question, inclusive, not smaller than start
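        The contract above can be illustrated with a minimal sketch. It is not the Lucene implementation: SetBasedFragmentChecker is a hypothetical class that stores known n-grams in a plain HashSet and treats both range bounds as inclusive, as the parameter descriptions state. A range is reported as impossible when any n-gram overlapping it is unknown, which makes the result monotonic, because a larger range overlaps a superset of n-grams.

```java
import java.util.HashSet;
import java.util.List;
import java.util.Set;

/** Hypothetical stand-in for an n-gram based fragment checker; not the Lucene class. */
class SetBasedFragmentChecker {
  private final int n;
  private final Set<String> knownNGrams = new HashSet<>();

  SetBasedFragmentChecker(int n, List<String> words) {
    this.n = n;
    // Collect every n-gram that occurs in the accepted words.
    for (String w : words) {
      for (int i = 0; i + n <= w.length(); i++) {
        knownNGrams.add(w.substring(i, i + n));
      }
    }
  }

  /** Returns true if any n-gram overlapping [start, end] (inclusive) is unknown. */
  boolean hasImpossibleFragmentAround(CharSequence word, int start, int end) {
    if (word.length() < n) {
      return false; // the word contains no n-grams at all
    }
    // An n-gram starting at i covers i..i+n-1, so it overlaps the range
    // when i + n - 1 >= start and i <= end.
    int first = Math.max(0, start - n + 1);
    int last = Math.min(end, word.length() - n);
    for (int i = first; i <= last; i++) {
      if (!knownNGrams.contains(word.subSequence(i, i + n).toString())) {
        return true;
      }
    }
    return false;
  }
}
```

        With n = 3 and "aax" as the only accepted word, the word "aaax" yields true for the range 0..3 (it overlaps the unknown "aaa") and false for 3..4 (it overlaps only "aax").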
      • fromAllSimpleWords

        public static NGramFragmentChecker fromAllSimpleWords(int n,
                                                              Dictionary dictionary,
                                                              Runnable checkCanceled)
        Iterate the whole dictionary, derive all word forms (using WordFormGenerator), vary the case to get all words accepted by the spellchecker, and create a fragment checker based on their n-grams. Note that this enumerates only words derivable by suffixes and prefixes. If the language has compounds, n-grams that occur only in those compounds can be missed; for such languages, consider using fromWords(int, Collection<? extends CharSequence>) instead.
        Parameters:
        n - the length of n-grams
        dictionary - the dictionary to traverse
        checkCanceled - a callback that is invoked periodically and can interrupt the traversal by throwing an exception
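        A cancellation callback of this kind might look like the sketch below. Everything in it is an assumption made for illustration: CancellationSketch, cancelWhen and traverse are hypothetical names, the loop merely stands in for a long dictionary traversal, and CancellationException is just one unchecked exception that a Runnable is able to throw.

```java
import java.util.concurrent.CancellationException;
import java.util.concurrent.atomic.AtomicBoolean;

/** Hypothetical illustration of a checkCanceled callback; not part of Lucene. */
class CancellationSketch {
  /** Builds a callback that throws once the shared flag has been set. */
  static Runnable cancelWhen(AtomicBoolean cancelled) {
    return () -> {
      if (cancelled.get()) {
        throw new CancellationException("traversal cancelled");
      }
    };
  }

  /** Stand-in for a long traversal that polls the callback on every entry. */
  static int traverse(int entries, Runnable checkCanceled) {
    int processed = 0;
    for (int i = 0; i < entries; i++) {
      checkCanceled.run(); // throws to abort the loop
      processed++;
    }
    return processed;
  }
}
```

        Another thread can set the flag at any time; the traversal then stops at its next poll and the exception propagates to the caller.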
      • fromWords

        public static NGramFragmentChecker fromWords(int n,
                                                     Collection<? extends CharSequence> words)
        Create a fragment checker for n-grams found in the given words. The words can be n-grams themselves or full words of the language. The words are case-sensitive, so be sure to include upper-case and title-case variants if they're accepted by the spellchecker.
        Parameters:
        n - the length of the n-grams to consider
        words - the strings to extract n-grams from
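        Because n-grams are case-sensitive, a word list that contains only lower-case forms yields no n-grams for capitalized input. The sketch below shows one way to expand a list before handing it to fromWords. CaseVariantSketch and its methods are hypothetical helpers, and the sketch assumes the spellchecker accepts the title-case and upper-case variant of every word, which depends on the dictionary.

```java
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Locale;
import java.util.Set;

/** Hypothetical helper for preparing case variants; not part of Lucene. */
class CaseVariantSketch {
  /** Returns each word together with its upper-case and title-case variants. */
  static Set<String> withCaseVariants(List<String> words) {
    Set<String> result = new LinkedHashSet<>();
    for (String w : words) {
      result.add(w);
      if (w.isEmpty()) {
        continue;
      }
      result.add(w.toUpperCase(Locale.ROOT));
      // Title case is approximated by upper-casing the first code point only.
      int firstLen = Character.charCount(w.codePointAt(0));
      result.add(w.substring(0, firstLen).toUpperCase(Locale.ROOT) + w.substring(firstLen));
    }
    return result;
  }

  /** Collects the distinct n-grams of the given words, preserving case. */
  static Set<String> nGrams(int n, Iterable<String> words) {
    Set<String> grams = new LinkedHashSet<>();
    for (String w : words) {
      for (int i = 0; i + n <= w.length(); i++) {
        grams.add(w.substring(i, i + n));
      }
    }
    return grams;
  }
}
```

        For the single word "cat" and n = 2, the plain list produces only "ca" and "at", while the expanded list also produces "CA", "AT" and "Ca".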
      • processNGrams

        public static void processNGrams(int n,
                                         Dictionary dictionary,
                                         Runnable checkCanceled,
                                         NGramFragmentChecker.NGramConsumer consumer)
        Traverse the whole dictionary, generate all word forms of its entries, and process all n-grams in these word forms. No duplicates are removed, so the consumer should be prepared to receive the same n-gram more than once. The traversal order is undefined.
        Parameters:
        n - the length of the n-grams
        dictionary - the dictionary to traverse
        checkCanceled - a callback that is invoked periodically and can interrupt the traversal by throwing an exception
        consumer - the n-gram consumer to be called for each n-gram
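        A consumer that tolerates duplicates and an undefined order can be sketched as follows. The methods of NGramFragmentChecker.NGramConsumer are not shown in this section, so the sketch uses java.util.function.Consumer<String> and a plain list of word forms as stand-ins; NGramTraversalSketch is a hypothetical class, not the Lucene traversal.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Consumer;

/** Hypothetical stand-in for an n-gram traversal; not the Lucene method. */
class NGramTraversalSketch {
  /** Feeds every n-gram of every word form to the consumer, duplicates included. */
  static void processNGrams(
      int n, List<String> wordForms, Runnable checkCanceled, Consumer<String> consumer) {
    for (String w : wordForms) {
      checkCanceled.run(); // may throw to abort the traversal
      for (int i = 0; i + n <= w.length(); i++) {
        consumer.accept(w.substring(i, i + n));
      }
    }
  }

  /** Counts occurrences per n-gram, so repeated n-grams are merged into one entry. */
  static Map<String, Integer> countNGrams(int n, List<String> wordForms) {
    Map<String, Integer> counts = new HashMap<>();
    processNGrams(n, wordForms, () -> {}, gram -> counts.merge(gram, 1, Integer::sum));
    return counts;
  }
}
```

        Accumulating into a map or set keeps the result independent of both repetition and traversal order.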