org.apache.lucene.analysis.compound.HyphenationCompoundWordTokenFilterFactory

All Implemented Interfaces: ResourceLoaderAware

public class HyphenationCompoundWordTokenFilterFactory extends TokenFilterFactory implements ResourceLoaderAware

Factory for HyphenationCompoundWordTokenFilter.

This factory accepts the following parameters:

hyphenator (mandatory): path to the FOP XML hyphenation pattern file. See http://offo.sourceforge.net/hyphenation/.
encoding (optional): encoding of the XML hyphenation file. Defaults to UTF-8.
dictionary (optional): dictionary of words. Defaults to no dictionary.
minWordSize (optional): minimum length a word must have to be decomposed. Defaults to 5.
minSubwordSize (optional): minimum length of subwords. Defaults to 2.
maxSubwordSize (optional): maximum length of subwords. Defaults to 15.
onlyLongestMatch (optional): if true, adds only the longest matching subword to the stream. Defaults to false.

 <fieldType name="text_hyphncomp" class="solr.TextField" positionIncrementGap="100">
   <analyzer>
     <tokenizer class="solr.WhitespaceTokenizerFactory"/>
     <filter class="solr.HyphenationCompoundWordTokenFilterFactory" hyphenator="hyphenator.xml" encoding="UTF-8"
         dictionary="dictionary.txt" minWordSize="5" minSubwordSize="2" maxSubwordSize="15" onlyLongestMatch="false"/>
   </analyzer>
 </fieldType>
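To make the interaction of the size and dictionary parameters easier to picture, here is a minimal stand-alone sketch in plain Java. It is a simplified model and not Lucene's implementation: the class name CompoundParamsSketch, the decompose method, and the hand-supplied break positions are all hypothetical, and the real filter derives its break points from the FOP hyphenation patterns. The sketch also assumes that onlyLongestMatch keeps the longest dictionary match per starting break point, which may differ in detail from the actual filter.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import java.util.Set;

/**
 * Simplified model of the factory parameters. NOT Lucene code: break
 * positions are passed in by hand instead of being computed from a
 * hyphenation pattern file.
 */
final class CompoundParamsSketch {

    /**
     * @param breaks ascending candidate split offsets, including 0 and word.length()
     * @param dictionary lower-cased known words, or null to accept every candidate
     */
    static List<String> decompose(String word, int[] breaks, Set<String> dictionary,
                                  int minWordSize, int minSubwordSize,
                                  int maxSubwordSize, boolean onlyLongestMatch) {
        List<String> subwords = new ArrayList<>();
        if (word.length() < minWordSize) {
            return subwords; // word is too short to be decomposed
        }
        for (int i = 0; i < breaks.length; i++) {
            String longest = null;
            for (int j = i + 1; j < breaks.length; j++) {
                int len = breaks[j] - breaks[i];
                if (len > maxSubwordSize) {
                    break; // later candidates from this start are only longer
                }
                if (len < minSubwordSize) {
                    continue; // candidate is too short
                }
                String part = word.substring(breaks[i], breaks[j]);
                if (dictionary != null
                        && !dictionary.contains(part.toLowerCase(Locale.ROOT))) {
                    continue; // not a known word
                }
                if (onlyLongestMatch) {
                    if (longest == null || part.length() > longest.length()) {
                        longest = part;
                    }
                } else {
                    subwords.add(part);
                }
            }
            if (longest != null) {
                subwords.add(longest);
            }
        }
        return subwords;
    }

    public static void main(String[] args) {
        Set<String> dict = Set.of("fuss", "fussball", "ball", "pumpe");
        int[] breaks = {0, 4, 8, 13}; // Fuss|ball|pumpe, chosen by hand

        // Factory defaults: minWordSize=5, minSubwordSize=2, maxSubwordSize=15
        System.out.println(decompose("Fussballpumpe", breaks, dict, 5, 2, 15, false));
        // prints [Fuss, Fussball, ball, pumpe]
        System.out.println(decompose("Fussballpumpe", breaks, dict, 5, 2, 15, true));
        // prints [Fussball, ball, pumpe]
    }
}
```

In this model, lowering maxSubwordSize to 4 would drop both "Fussball" and "pumpe", and a word shorter than minWordSize yields no subwords at all.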

Since:

3.1.0

See Also:

HyphenationCompoundWordTokenFilter

SPI Name (case-insensitive: if the name is 'htmlStrip', 'htmlstrip' can be used when looking up the service).

"hyphenationCompoundWord"

Field Summary

Fields (modifier and type, field, description):
- static final String NAME
  SPI name

Fields inherited from class org.apache.lucene.analysis.AbstractAnalysisFactory
LUCENE_MATCH_VERSION_PARAM, luceneMatchVersion

Constructor Summary

Constructors (constructor, description):
- HyphenationCompoundWordTokenFilterFactory()
  Default ctor for compatibility with SPI
- HyphenationCompoundWordTokenFilterFactory(Map<String,String> args)
  Creates a new HyphenationCompoundWordTokenFilterFactory

Method Summary

Methods (modifier and type, method):
- TokenFilter create(TokenStream input)
- void inform(ResourceLoader loader)

Methods inherited from class org.apache.lucene.analysis.TokenFilterFactory
availableTokenFilters, findSPIName, forName, lookupClass, normalize, reloadTokenFilters

Methods inherited from class org.apache.lucene.analysis.AbstractAnalysisFactory
defaultCtorException, get, get, get, get, get, getBoolean, getChar, getClassArg, getFloat, getInt, getLines, getLuceneMatchVersion, getOriginalArgs, getPattern, getSet, getSnowballWordSet, getWordSet, isExplicitLuceneMatchVersion, require, require, require, requireBoolean, requireChar, requireFloat, requireInt, setExplicitLuceneMatchVersion, splitAt, splitFileNames

Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait

Field Details
- NAME
  
  public static final String NAME
  
  SPI name
  See Also:
  
  Constant Field Values

Constructor Details
- HyphenationCompoundWordTokenFilterFactory
  
  public HyphenationCompoundWordTokenFilterFactory(Map<String,String> args)
  
  Creates a new HyphenationCompoundWordTokenFilterFactory
- HyphenationCompoundWordTokenFilterFactory
  
  public HyphenationCompoundWordTokenFilterFactory()
  
  Default ctor for compatibility with SPI

Method Details
- inform
  
  public void inform(ResourceLoader loader) throws IOException
  
  Specified by:
  
  inform in interface ResourceLoaderAware
  
  Throws:
  
  IOException
- create
  
  public TokenFilter create(TokenStream input)
  
  Specified by:
  
  create in class TokenFilterFactory
