Class WordBreakSpellChecker

java.lang.Object
org.apache.lucene.search.spell.WordBreakSpellChecker

public class WordBreakSpellChecker extends Object
A spell checker whose sole function is to offer suggestions by combining multiple terms into one word and/or breaking terms into multiple words.
  • Field Details

    • SEPARATOR_TERM

      public static final Term SEPARATOR_TERM
      Term that can be used to prohibit adjacent terms from being combined.
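      The following is a minimal plain-Java sketch of the rule this field supports. It models input terms as strings and uses a hypothetical marker string in place of SEPARATOR_TERM, which is not Lucene's actual value. Two terms count as combinable only when they are adjacent and neither one is the marker. The class and method names are illustrative and are not part of the Lucene API.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical model (not Lucene code): which adjacent terms may be combined. */
class SeparatorSketch {
    // Assumed stand-in for WordBreakSpellChecker.SEPARATOR_TERM.
    static final String SEPARATOR = "<SEP>";

    /** Returns "left+right" for every adjacent pair not split by the separator. */
    static List<String> combinablePairs(List<String> terms) {
        List<String> pairs = new ArrayList<>();
        for (int i = 0; i + 1 < terms.size(); i++) {
            String left = terms.get(i);
            String right = terms.get(i + 1);
            // A separator on either side of the boundary blocks the combination.
            if (!left.equals(SEPARATOR) && !right.equals(SEPARATOR)) {
                pairs.add(left + "+" + right);
            }
        }
        return pairs;
    }

    public static void main(String[] args) {
        // Keep "ice" and "cream" apart, e.g. because one is mandatory and the other prohibited.
        System.out.println(combinablePairs(List.of("ice", SEPARATOR, "cream", "cone")));
    }
}
```

      In the example, the marker sits between "ice" and "cream", so only "cream" and "cone" remain eligible for combination.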
  • Constructor Details

  • Method Details

    • suggestWordBreaks

      public SuggestWord[][] suggestWordBreaks(Term term, int maxSuggestions, IndexReader ir, SuggestMode suggestMode, WordBreakSpellChecker.BreakSuggestionSortMethod sortMethod) throws IOException
      Generate suggestions by breaking the passed-in term into multiple words. The scores returned are equal to the number of word breaks needed, so a lower score is generally preferred over a higher score.
      Parameters:
      suggestMode - default = SuggestMode.SUGGEST_WHEN_NOT_IN_INDEX
      sortMethod - default = WordBreakSpellChecker.BreakSuggestionSortMethod.NUM_CHANGES_THEN_MAX_FREQUENCY
      Returns:
      one or more arrays of words formed by breaking up the original term
      Throws:
      IOException - If there is a low-level I/O error.
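      The sketch below is a simplified, hypothetical model of the behavior described above. It is not Lucene's implementation. A Map from word to document frequency stands in for the IndexReader, and SuggestMode handling is omitted. Each result's score is its number of breaks. Results are ordered by fewer breaks first, then by a higher maximum word frequency, which is how this sketch interprets the name NUM_CHANGES_THEN_MAX_FREQUENCY. The maxChanges, minBreakWordLength, and minFreq parameters mirror this class's maxChanges, minBreakWordLength, and minSuggestionFrequency settings. All class and method names are illustrative.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.Map;

/** Hypothetical model of word-break suggestions (not Lucene's implementation). */
class WordBreakSketch {

    /** One candidate: the words, its score (number of breaks), and its highest word frequency. */
    record Split(List<String> words, int score, int maxFreq) {}

    static List<Split> suggestBreaks(String term, Map<String, Integer> freqs,
                                     int maxChanges, int minBreakWordLength, int minFreq) {
        List<Split> out = new ArrayList<>();
        int minLen = Math.max(1, minBreakWordLength);
        collect(term, freqs, maxChanges, minLen, minFreq, new ArrayList<>(), out);
        // Fewer breaks first; ties go to the split whose most frequent word is more frequent.
        Comparator<Split> byScore = Comparator.comparingInt(Split::score);
        Comparator<Split> byMaxFreqDesc = Comparator.<Split>comparingInt(Split::maxFreq).reversed();
        out.sort(byScore.thenComparing(byMaxFreqDesc));
        return out;
    }

    private static void collect(String rest, Map<String, Integer> freqs, int changesLeft,
                                int minLen, int minFreq, List<String> prefix, List<Split> out) {
        // Try every break point that leaves both pieces at least minLen characters long.
        for (int i = minLen; i <= rest.length() - minLen; i++) {
            String left = rest.substring(0, i);
            if (!isWord(left, freqs, minFreq)) {
                continue; // the left piece must be a known word before breaking further
            }
            List<String> words = new ArrayList<>(prefix);
            words.add(left);
            String right = rest.substring(i);
            if (isWord(right, freqs, minFreq)) {
                List<String> split = new ArrayList<>(words);
                split.add(right);
                int maxFreq = split.stream().mapToInt(freqs::get).max().orElse(0);
                out.add(new Split(List.copyOf(split), split.size() - 1, maxFreq));
            }
            if (changesLeft > 1) {
                // Spend another break on the remainder.
                collect(right, freqs, changesLeft - 1, minLen, minFreq, words, out);
            }
        }
    }

    private static boolean isWord(String w, Map<String, Integer> freqs, int minFreq) {
        Integer f = freqs.get(w);
        return f != null && f >= minFreq;
    }

    public static void main(String[] args) {
        Map<String, Integer> freqs = Map.of("ice", 5, "cream", 8, "i", 1, "ce", 2);
        for (Split s : suggestBreaks("icecream", freqs, 2, 1, 1)) {
            System.out.println(s.words() + " score=" + s.score());
        }
    }
}
```

      With maxChanges = 2, "icecream" yields both [ice, cream] (score 1) and [i, ce, cream] (score 2). The lower-scoring split sorts first.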
    • suggestWordCombinations

      public CombineSuggestion[] suggestWordCombinations(Term[] terms, int maxSuggestions, IndexReader ir, SuggestMode suggestMode) throws IOException
      Generate suggestions by combining two or more adjacent passed-in terms into single words. The returned CombineSuggestion contains both a SuggestWord and an array detailing which passed-in terms were involved in creating this combination. The scores returned are equal to the number of word combinations needed, which is also one less than the length of the array CombineSuggestion.originalTermIndexes. Generally, a suggestion with a lower score is preferred over one with a higher score.

      To prevent two adjacent terms from being combined (for instance, if one is mandatory and the other is prohibited), separate the two terms with SEPARATOR_TERM.

      When suggestMode equals SuggestMode.SUGGEST_WHEN_NOT_IN_INDEX, each suggestion will include at least one term not in the index.

      When suggestMode equals SuggestMode.SUGGEST_MORE_POPULAR, each suggestion will have a frequency equal to or greater than that of the most popular included term.
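      The two rules above can be read as a filter on each candidate combination. The following is a minimal sketch, not Lucene's code. It makes these assumptions: frequencies are plain ints, with 0 meaning "not in the index"; Mode is a hypothetical local enum that mirrors only the two SuggestMode values named here; and the combined word must itself occur in the index.

```java
import java.util.Arrays;

/** Hypothetical filter modeling the documented SuggestMode rules for combinations. */
class CombineModeSketch {
    /** Local mirror of the two SuggestMode values discussed here (assumption). */
    enum Mode { SUGGEST_WHEN_NOT_IN_INDEX, SUGGEST_MORE_POPULAR }

    /**
     * @param originalFreqs index frequencies of the original terms being combined (0 = absent)
     * @param combinedFreq  index frequency of the combined word
     */
    static boolean accept(Mode mode, int[] originalFreqs, int combinedFreq) {
        if (combinedFreq <= 0) {
            return false; // assumption: a suggestion must itself be an indexed term
        }
        switch (mode) {
            case SUGGEST_WHEN_NOT_IN_INDEX:
                // At least one of the combined original terms is missing from the index.
                return Arrays.stream(originalFreqs).anyMatch(f -> f == 0);
            case SUGGEST_MORE_POPULAR:
                // The combined word is at least as frequent as the most popular original term.
                return combinedFreq >= Arrays.stream(originalFreqs).max().orElse(0);
            default:
                throw new IllegalArgumentException("unmodeled mode: " + mode);
        }
    }

    public static void main(String[] args) {
        // e.g. "wi" (12 docs) + "fi" (not indexed) -> "wifi" (40 docs)
        System.out.println(accept(Mode.SUGGEST_WHEN_NOT_IN_INDEX, new int[] {12, 0}, 40));
    }
}
```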

      Returns:
      an array of words generated by combining original terms
      Throws:
      IOException - If there is a low-level I/O error.
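      The sketch below is a simplified, hypothetical model of how combination candidates relate to their scores. It is not Lucene's implementation. A Map of known words stands in for the index, and only contiguous runs of terms are concatenated. SEPARATOR_TERM, SuggestMode, and frequency thresholds are left out. Each result records the indexes of the original terms it used, so its score (the number of combinations) is one less than that array's length, as described above. The maxChanges and maxCombineWordLength parameters mirror this class's settings. Names are illustrative.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.Map;
import java.util.stream.IntStream;

/** Hypothetical model of word-combination suggestions (not Lucene's implementation). */
class CombineSketch {

    record Combination(String word, int[] originalTermIndexes) {
        /** Number of combinations: one less than the number of original terms used. */
        int score() {
            return originalTermIndexes.length - 1;
        }
    }

    static List<Combination> suggestCombinations(List<String> terms, Map<String, Integer> freqs,
                                                 int maxChanges, int maxCombineWordLength) {
        List<Combination> out = new ArrayList<>();
        for (int start = 0; start < terms.size(); start++) {
            StringBuilder word = new StringBuilder(terms.get(start));
            // Each additional term appended costs one change.
            for (int end = start + 1; end < terms.size() && end - start <= maxChanges; end++) {
                word.append(terms.get(end));
                if (word.length() > maxCombineWordLength) {
                    break; // appending more terms only makes the word longer
                }
                if (freqs.containsKey(word.toString())) {
                    int[] used = IntStream.rangeClosed(start, end).toArray();
                    out.add(new Combination(word.toString(), used));
                }
            }
        }
        // Lower scores (fewer combinations) first; List.sort is stable for ties.
        out.sort(Comparator.comparingInt(Combination::score));
        return out;
    }

    public static void main(String[] args) {
        Map<String, Integer> freqs = Map.of("icecream", 9, "icecreamcone", 2, "creamcone", 1);
        for (Combination c : suggestCombinations(List.of("ice", "cream", "cone"), freqs, 2, 20)) {
            System.out.println(c.word() + " score=" + c.score());
        }
    }
}
```

      Lowering maxCombineWordLength to 8 in this model leaves only "icecream". Both "creamcone" and "icecreamcone" exceed that length.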
    • getMinSuggestionFrequency

      public int getMinSuggestionFrequency()
      Returns the minimum frequency a term must have to be part of a suggestion.
      See Also:
      setMinSuggestionFrequency(int)
    • getMaxCombineWordLength

      public int getMaxCombineWordLength()
      Returns the maximum length of a combined suggestion.
      See Also:
      setMaxCombineWordLength(int)
    • getMinBreakWordLength

      public int getMinBreakWordLength()
      Returns the minimum length of a broken word.
      See Also:
      setMinBreakWordLength(int)
    • getMaxChanges

      public int getMaxChanges()
      Returns the maximum number of changes to perform on the input.
      See Also:
      setMaxChanges(int)
    • getMaxEvaluations

      public int getMaxEvaluations()
      Returns the maximum number of word combinations to evaluate.
      See Also:
      setMaxEvaluations(int)
    • setMinSuggestionFrequency

      public void setMinSuggestionFrequency(int minSuggestionFrequency)
      The minimum frequency a term must have to be included as part of a suggestion. Default=1. Not applicable when used with SuggestMode.SUGGEST_MORE_POPULAR.
      See Also:
      getMinSuggestionFrequency()
    • setMaxCombineWordLength

      public void setMaxCombineWordLength(int maxCombineWordLength)
      The maximum length of a suggestion made by combining two or more original terms. Default=20.
      See Also:
      getMaxCombineWordLength()
    • setMinBreakWordLength

      public void setMinBreakWordLength(int minBreakWordLength)
      The minimum length of each word produced by breaking a term. Default=1.
      See Also:
      getMinBreakWordLength()
    • setMaxChanges

      public void setMaxChanges(int maxChanges)
      The maximum number of changes (word breaks or combinations) to make on the original term(s). Default=1.
      See Also:
      getMaxChanges()
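      This sketch shows why maxChanges, together with minBreakWordLength, bounds the work involved. It counts the ways a word of length n can be cut into pieces using at most maxChanges breaks, where every piece is at least minLen characters long. The result is only an upper bound on what a naive exhaustive search would examine, because an implementation can prune candidates whose pieces are not indexed words. The class and method names are hypothetical.

```java
/** Hypothetical helper: counts candidate splittings allowed by maxChanges and a minimum piece length. */
class BreakCountSketch {

    /** Ways to cut a word of length n into exactly `parts` pieces, each at least minLen long. */
    static long waysExact(int n, int parts, int minLen) {
        int min = Math.max(1, minLen);
        // ways[p][len] = ways to split a prefix of length len into p pieces
        long[][] ways = new long[parts + 1][n + 1];
        ways[0][0] = 1;
        for (int p = 1; p <= parts; p++) {
            for (int len = 1; len <= n; len++) {
                for (int last = min; last <= len; last++) {
                    ways[p][len] += ways[p - 1][len - last];
                }
            }
        }
        return ways[parts][n];
    }

    /** Splittings that use between 1 and maxChanges breaks. */
    static long candidates(int n, int maxChanges, int minLen) {
        long total = 0;
        for (int breaks = 1; breaks <= maxChanges; breaks++) {
            total += waysExact(n, breaks + 1, minLen);
        }
        return total;
    }

    public static void main(String[] args) {
        for (int maxChanges = 1; maxChanges <= 4; maxChanges++) {
            System.out.println("maxChanges=" + maxChanges + ": " + candidates(12, maxChanges, 1));
        }
    }
}
```

      With minLen = 1, the count for exactly k breaks is the binomial coefficient C(n-1, k). For a 10-character term, that gives 9 splittings with one break and 36 with two, so raising maxChanges enlarges the search space quickly.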
    • setMaxEvaluations

      public void setMaxEvaluations(int maxEvaluations)
      The maximum number of word combinations to evaluate. Default=1000. A higher value might improve result quality. A lower value might improve performance.
      See Also:
      getMaxEvaluations()
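      The sketch below illustrates the trade-off described above. It assumes that one "evaluation" means checking one candidate split point, though the real unit of work inside WordBreakSpellChecker may differ. The search stops once the hypothetical budget is spent. A small budget therefore returns faster but can miss suggestions that a larger budget would find. The class and method names are illustrative.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;

/** Hypothetical model of an evaluation budget like maxEvaluations (not Lucene's code). */
class EvaluationBudgetSketch {

    /** Single-break suggestions for term, checking at most maxEvaluations split points. */
    static List<String> singleBreaks(String term, Set<String> dict, int maxEvaluations) {
        List<String> found = new ArrayList<>();
        int remaining = maxEvaluations;
        for (int i = 1; i < term.length() && remaining > 0; i++) {
            remaining--; // each candidate split point costs one evaluation
            String left = term.substring(0, i);
            String right = term.substring(i);
            if (dict.contains(left) && dict.contains(right)) {
                found.add(left + " " + right);
            }
        }
        return found;
    }

    public static void main(String[] args) {
        Set<String> dict = Set.of("no", "where", "now", "here");
        System.out.println(singleBreaks("nowhere", dict, 1000));
        System.out.println(singleBreaks("nowhere", dict, 2));
    }
}
```

      In this model, a budget of 2 finds "no where" but stops before reaching "now here".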