Class HyphenationCompoundWordTokenFilterFactory

java.lang.Object
org.apache.lucene.analysis.AbstractAnalysisFactory
org.apache.lucene.analysis.TokenFilterFactory
org.apache.lucene.analysis.compound.HyphenationCompoundWordTokenFilterFactory
All Implemented Interfaces:
ResourceLoaderAware

public class HyphenationCompoundWordTokenFilterFactory extends TokenFilterFactory implements ResourceLoaderAware
Factory for HyphenationCompoundWordTokenFilter.

This factory accepts the following parameters:

  • hyphenator (mandatory): path to the FOP XML hyphenation pattern file. See http://offo.sourceforge.net/hyphenation/.
  • encoding (optional): encoding of the XML hyphenation file. Defaults to UTF-8.
  • dictionary (optional): file containing a dictionary of words. Defaults to no dictionary.
  • minWordSize (optional): minimum length a word must have to be decomposed. Defaults to 5.
  • minSubwordSize (optional): minimum length of subwords. Defaults to 2.
  • maxSubwordSize (optional): maximum length of subwords. Defaults to 15.
  • onlyLongestMatch (optional): if true, adds only the longest matching subword to the stream. Defaults to false.

 <fieldType name="text_hyphncomp" class="solr.TextField" positionIncrementGap="100">
   <analyzer>
     <tokenizer class="solr.WhitespaceTokenizerFactory"/>
     <filter class="solr.HyphenationCompoundWordTokenFilterFactory" hyphenator="hyphenator.xml" encoding="UTF-8"
         dictionary="dictionary.txt" minWordSize="5" minSubwordSize="2" maxSubwordSize="15" onlyLongestMatch="false"/>
   </analyzer>
 </fieldType>
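
The following is a minimal sketch of how the size parameters and onlyLongestMatch might interact. It is an illustration only, not the Lucene implementation. The class name CompoundSketch, the decompose method, and the explicit boundaries array are all hypothetical. In the real filter the candidate boundaries come from the hyphenation pattern file. The sketch also assumes that onlyLongestMatch is applied separately for each start boundary.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import java.util.Set;

/** Simplified illustration only; not the Lucene implementation. */
final class CompoundSketch {

  /**
   * Returns the subwords that would be added for a single token.
   *
   * @param boundaries hypothetical hyphenation points: ascending offsets into the word,
   *     including 0 and word.length()
   * @param dictionary lower-cased known words, or null to accept every candidate
   */
  static List<String> decompose(String word, int[] boundaries, Set<String> dictionary,
      int minWordSize, int minSubwordSize, int maxSubwordSize, boolean onlyLongestMatch) {
    List<String> subwords = new ArrayList<>();
    if (word.length() < minWordSize) {
      return subwords; // too short: the token is not decomposed
    }
    String lower = word.toLowerCase(Locale.ROOT);
    for (int i = 0; i < boundaries.length; i++) {
      String longest = null;
      for (int j = i + 1; j < boundaries.length; j++) {
        int length = boundaries[j] - boundaries[i];
        if (length > maxSubwordSize) {
          break; // later end points only make the candidate longer
        }
        if (length < minSubwordSize) {
          continue;
        }
        String candidate = lower.substring(boundaries[i], boundaries[j]);
        if (dictionary != null && !dictionary.contains(candidate)) {
          continue; // with a dictionary, only known words are kept
        }
        if (onlyLongestMatch) {
          longest = candidate; // boundaries ascend, so the last match is the longest
        } else {
          subwords.add(candidate);
        }
      }
      if (longest != null) {
        subwords.add(longest);
      }
    }
    return subwords;
  }

  public static void main(String[] args) {
    int[] boundaries = {0, 4, 8, 11, 13}; // assumed split: Fuss|ball|pum|pe
    Set<String> dictionary = Set.of("fuss", "ball", "fussball", "pumpe");
    // prints [fuss, fussball, ball, pumpe]
    System.out.println(decompose("Fussballpumpe", boundaries, dictionary, 5, 2, 15, false));
    // prints [fussball, ball, pumpe]
    System.out.println(decompose("Fussballpumpe", boundaries, dictionary, 5, 2, 15, true));
  }
}
```

In this sketch a token shorter than minWordSize yields no subwords, and passing null for the dictionary accepts every candidate whose length falls within the subword bounds.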
Since:
3.1.0
SPI Name (case-insensitive; for example, a name registered as 'htmlStrip' can also be looked up as 'htmlstrip'):
"hyphenationCompoundWord"