org.apache.lucene.search.spell
Class WordBreakSpellChecker

java.lang.Object
  extended by org.apache.lucene.search.spell.WordBreakSpellChecker

public class WordBreakSpellChecker
extends Object

A spell checker whose sole function is to offer suggestions by combining multiple terms into one word and/or breaking terms into multiple words.


Nested Class Summary
static class WordBreakSpellChecker.BreakSuggestionSortMethod
           Determines the order to list word break suggestions
 
Field Summary
static Term SEPARATOR_TERM
          Term that can be used to prohibit adjacent terms from being combined
 
Constructor Summary
WordBreakSpellChecker()
          Creates a new spellchecker with default configuration values
 
Method Summary
 int getMaxChanges()
          Returns the maximum number of changes to perform on the input
 int getMaxCombineWordLength()
          Returns the maximum length of a combined suggestion
 int getMaxEvaluations()
          Returns the maximum number of word combinations to evaluate.
 int getMinBreakWordLength()
          Returns the minimum size of a broken word
 int getMinSuggestionFrequency()
          Returns the minimum frequency a term must have to be part of a suggestion.
 void setMaxChanges(int maxChanges)
           The maximum number of changes (word breaks or combinations) to make on the original term(s).
 void setMaxCombineWordLength(int maxCombineWordLength)
           The maximum length of a suggestion made by combining 1 or more original terms.
 void setMaxEvaluations(int maxEvaluations)
           The maximum number of word combinations to evaluate.
 void setMinBreakWordLength(int minBreakWordLength)
           The minimum length to break words down to.
 void setMinSuggestionFrequency(int minSuggestionFrequency)
           The minimum frequency a term must have to be included as part of a suggestion.
 SuggestWord[][] suggestWordBreaks(Term term, int maxSuggestions, IndexReader ir, SuggestMode suggestMode, WordBreakSpellChecker.BreakSuggestionSortMethod sortMethod)
           Generate suggestions by breaking the passed-in term into multiple words.
 CombineSuggestion[] suggestWordCombinations(Term[] terms, int maxSuggestions, IndexReader ir, SuggestMode suggestMode)
           Generate suggestions by combining one or more of the passed-in terms into single words.
 
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
 

Field Detail

SEPARATOR_TERM

public static final Term SEPARATOR_TERM
Term that can be used to prohibit adjacent terms from being combined

Constructor Detail

WordBreakSpellChecker

public WordBreakSpellChecker()
Creates a new spellchecker with default configuration values

See Also:
setMaxChanges(int), setMaxCombineWordLength(int), setMaxEvaluations(int), setMinBreakWordLength(int), setMinSuggestionFrequency(int)

Method Detail

suggestWordBreaks

public SuggestWord[][] suggestWordBreaks(Term term,
                                         int maxSuggestions,
                                         IndexReader ir,
                                         SuggestMode suggestMode,
                                         WordBreakSpellChecker.BreakSuggestionSortMethod sortMethod)
                                  throws IOException

Generate suggestions by breaking the passed-in term into multiple words. Each returned score equals the number of word breaks needed, so a lower score is generally preferred over a higher one.

Parameters:
suggestMode - default = SuggestMode.SUGGEST_WHEN_NOT_IN_INDEX
sortMethod - default = WordBreakSpellChecker.BreakSuggestionSortMethod.NUM_CHANGES_THEN_MAX_FREQUENCY
Returns:
one or more arrays of words formed by breaking up the original term
Throws:
IOException - If there is a low-level I/O error.
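
The scoring rule described above can be illustrated without an index. The following is a minimal sketch, not Lucene's implementation: the class WordBreakSketch, its method names, and the in-memory frequency map (standing in for the IndexReader) are all hypothetical, and SuggestMode handling is not modelled. It only shows how maxChanges, minBreakWordLength and minSuggestionFrequency could constrain the breaks, and that the score is the number of breaks.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.Map;

// Illustration only: a plain in-memory model, not Lucene's implementation.
final class WordBreakSketch {

    /** Score of a suggestion: the number of breaks, i.e. parts minus one. */
    static int score(List<String> parts) {
        return parts.size() - 1;
    }

    /**
     * Returns every way to split `word` into known words using at most
     * `maxChanges` breaks, ordered by ascending score. Assumes
     * minSuggestionFrequency >= 1 (0 would accept unknown words too).
     */
    static List<List<String>> suggestBreaks(String word, Map<String, Integer> frequencies,
                                            int maxChanges, int minBreakWordLength,
                                            int minSuggestionFrequency) {
        List<List<String>> results = new ArrayList<>();
        int minLen = Math.max(1, minBreakWordLength); // never produce empty parts
        split(word, frequencies, maxChanges, minLen, minSuggestionFrequency,
              new ArrayList<>(), results);
        // Stable sort: fewer breaks (lower score) first.
        results.sort(Comparator.comparingInt(WordBreakSketch::score));
        return results;
    }

    private static boolean isAcceptable(String part, Map<String, Integer> frequencies,
                                        int minLen, int minFreq) {
        return part.length() >= minLen && frequencies.getOrDefault(part, 0) >= minFreq;
    }

    private static void split(String rest, Map<String, Integer> frequencies, int breaksLeft,
                              int minLen, int minFreq, List<String> prefix,
                              List<List<String>> out) {
        // Once at least one break was made, an acceptable remainder completes a suggestion.
        if (!prefix.isEmpty() && isAcceptable(rest, frequencies, minLen, minFreq)) {
            List<String> done = new ArrayList<>(prefix);
            done.add(rest);
            out.add(done);
        }
        if (breaksLeft == 0) {
            return; // change budget exhausted
        }
        // Try every break position that leaves both sides at least minLen long.
        for (int i = minLen; i <= rest.length() - minLen; i++) {
            String head = rest.substring(0, i);
            if (!isAcceptable(head, frequencies, minLen, minFreq)) {
                continue;
            }
            prefix.add(head);
            split(rest.substring(i), frequencies, breaksLeft - 1, minLen, minFreq, prefix, out);
            prefix.remove(prefix.size() - 1);
        }
    }
}
```

In this model, breaking "wordbreak" against a map containing "word", "break", "wo" and "rd" yields only [word, break] when maxChanges is 1. Raising maxChanges to 2 also admits [wo, rd, break], which is listed second because it needs two breaks.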

suggestWordCombinations

public CombineSuggestion[] suggestWordCombinations(Term[] terms,
                                                   int maxSuggestions,
                                                   IndexReader ir,
                                                   SuggestMode suggestMode)
                                            throws IOException

Generate suggestions by combining one or more of the passed-in terms into single words. Each returned CombineSuggestion contains both a SuggestWord and an array detailing which passed-in terms were involved in creating the combination. Each score equals the number of word combinations needed, which is one less than the length of the array CombineSuggestion.originalTermIndexes. Generally, a suggestion with a lower score is preferred over one with a higher score.

To prevent two adjacent terms from being combined (for instance, if one is mandatory and the other is prohibited), separate the two terms with SEPARATOR_TERM.

When suggestMode equals SuggestMode.SUGGEST_WHEN_NOT_IN_INDEX, each suggestion will include at least one term not in the index.

When suggestMode equals SuggestMode.SUGGEST_MORE_POPULAR, each suggestion will have the same or better frequency than the most popular included term.

Returns:
an array of words generated by combining original terms
Throws:
IOException - If there is a low-level I/O error.
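
The relationship between the score, originalTermIndexes and the separator can be sketched in plain Java. This is an assumption-laden model rather than Lucene's code: WordCombineSketch, its nested Combination class, the SEPARATOR string (playing the role of SEPARATOR_TERM) and the in-memory frequency map (standing in for the IndexReader) are hypothetical names, and the SuggestMode rules above are not modelled.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

// Illustration only: a plain in-memory model, not Lucene's implementation.
final class WordCombineSketch {

    /** Hypothetical marker that plays the role of SEPARATOR_TERM in this model. */
    static final String SEPARATOR = "\u0000";

    /** A combined word plus the indexes of the input terms it was built from. */
    static final class Combination {
        final String word;
        final int[] originalTermIndexes;

        Combination(String word, int[] originalTermIndexes) {
            this.word = word;
            this.originalTermIndexes = originalTermIndexes;
        }

        /** Number of joins performed: one less than the number of terms involved. */
        int score() {
            return originalTermIndexes.length - 1;
        }
    }

    /**
     * Joins runs of adjacent terms and keeps the results that are known words.
     * Assumes minSuggestionFrequency >= 1 (0 would accept unknown words too).
     */
    static List<Combination> suggestCombinations(String[] terms, Map<String, Integer> frequencies,
                                                 int maxChanges, int maxCombineWordLength,
                                                 int minSuggestionFrequency) {
        List<Combination> out = new ArrayList<>();
        for (int start = 0; start < terms.length; start++) {
            if (SEPARATOR.equals(terms[start])) {
                continue; // a separator never starts a combination
            }
            StringBuilder combined = new StringBuilder(terms[start]);
            // Each extra term costs one change, so stop after maxChanges joins.
            for (int end = start + 1; end < terms.length && end - start <= maxChanges; end++) {
                if (SEPARATOR.equals(terms[end])) {
                    break; // a separator blocks joining across it
                }
                combined.append(terms[end]);
                if (combined.length() > maxCombineWordLength) {
                    break; // longer runs can only get longer
                }
                String word = combined.toString();
                if (frequencies.getOrDefault(word, 0) >= minSuggestionFrequency) {
                    int[] indexes = new int[end - start + 1];
                    for (int k = 0; k < indexes.length; k++) {
                        indexes[k] = start + k;
                    }
                    out.add(new Combination(word, indexes));
                }
            }
        }
        return out;
    }
}
```

In this model, the terms "spell", "checker", "word", "break" with maxChanges set to 1 can produce "spellchecker" from indexes {0, 1}, and each such suggestion has a score of 1. Placing SEPARATOR between "checker" and "word" rules out "checkerword" while leaving the other pairs untouched.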

getMinSuggestionFrequency

public int getMinSuggestionFrequency()
Returns the minimum frequency a term must have to be part of a suggestion.

See Also:
setMinSuggestionFrequency(int)

getMaxCombineWordLength

public int getMaxCombineWordLength()
Returns the maximum length of a combined suggestion

See Also:
setMaxCombineWordLength(int)

getMinBreakWordLength

public int getMinBreakWordLength()
Returns the minimum size of a broken word

See Also:
setMinBreakWordLength(int)

getMaxChanges

public int getMaxChanges()
Returns the maximum number of changes to perform on the input

See Also:
setMaxChanges(int)

getMaxEvaluations

public int getMaxEvaluations()
Returns the maximum number of word combinations to evaluate.

See Also:
setMaxEvaluations(int)

setMinSuggestionFrequency

public void setMinSuggestionFrequency(int minSuggestionFrequency)

The minimum frequency a term must have to be included as part of a suggestion. Default=1. Not applicable when used with SuggestMode.SUGGEST_MORE_POPULAR.

See Also:
getMinSuggestionFrequency()

setMaxCombineWordLength

public void setMaxCombineWordLength(int maxCombineWordLength)

The maximum length of a suggestion made by combining 1 or more original terms. Default=20

See Also:
getMaxCombineWordLength()

setMinBreakWordLength

public void setMinBreakWordLength(int minBreakWordLength)

The minimum length to break words down to. Default=1

See Also:
getMinBreakWordLength()

setMaxChanges

public void setMaxChanges(int maxChanges)

The maximum number of changes (word breaks or combinations) to make on the original term(s). Default=1

See Also:
getMaxChanges()

setMaxEvaluations

public void setMaxEvaluations(int maxEvaluations)

The maximum number of word combinations to evaluate. Default=1000. A higher value might improve result quality; a lower value might improve performance.

See Also:
getMaxEvaluations()


Copyright © 2000-2014 Apache Software Foundation. All Rights Reserved.