Class HyphenationCompoundWordTokenFilterFactory
- java.lang.Object
  - org.apache.lucene.analysis.AbstractAnalysisFactory
    - org.apache.lucene.analysis.TokenFilterFactory
      - org.apache.lucene.analysis.compound.HyphenationCompoundWordTokenFilterFactory
- All Implemented Interfaces:
ResourceLoaderAware
public class HyphenationCompoundWordTokenFilterFactory extends TokenFilterFactory implements ResourceLoaderAware
Factory for HyphenationCompoundWordTokenFilter.
This factory accepts the following parameters:
- hyphenator (mandatory): path to the FOP XML hyphenation pattern file. See http://offo.sourceforge.net/hyphenation/.
- encoding (optional): encoding of the XML hyphenation file. Defaults to UTF-8.
- dictionary (optional): dictionary of words. Defaults to no dictionary.
- minWordSize (optional): minimum length a word must have to be decomposed. Defaults to 5.
- minSubwordSize (optional): minimum length of subwords. Defaults to 2.
- maxSubwordSize (optional): maximum length of subwords. Defaults to 15.
- onlyLongestMatch (optional): if true, adds only the longest matching subword to the stream. Defaults to false.
<fieldType name="text_hyphncomp" class="solr.TextField" positionIncrementGap="100">
  <analyzer>
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.HyphenationCompoundWordTokenFilterFactory"
            hyphenator="hyphenator.xml"
            encoding="UTF-8"
            dictionary="dictionary.txt"
            minWordSize="5"
            minSubwordSize="2"
            maxSubwordSize="15"
            onlyLongestMatch="false"/>
  </analyzer>
</fieldType>
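To make the size parameters concrete, the following is a minimal Java sketch of how minWordSize, minSubwordSize, maxSubwordSize, and onlyLongestMatch could constrain candidate subwords. It is a simplified illustration and not Lucene's implementation: the class SubwordSketch, the method subwords, and the explicit breaks array are hypothetical names, and the break offsets are supplied by hand here, whereas the real filter derives them from the hyphenation patterns. The sketch also assumes that onlyLongestMatch keeps the longest match per start offset.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;

// Hypothetical illustration only; this is not part of Lucene.
final class SubwordSketch {

    /**
     * Enumerates candidate subwords of {@code word}.
     *
     * @param breaks     ascending split offsets, including 0 and word.length()
     *                   (assumption: stands in for hyphenation points)
     * @param dictionary accepted subwords, or null to accept every candidate
     */
    static List<String> subwords(String word, int[] breaks, Set<String> dictionary,
                                 int minWordSize, int minSubwordSize, int maxSubwordSize,
                                 boolean onlyLongestMatch) {
        List<String> out = new ArrayList<>();
        if (word.length() < minWordSize) {
            return out; // words shorter than minWordSize are not decomposed
        }
        for (int i = 0; i < breaks.length; i++) {
            String longest = null;
            for (int j = i + 1; j < breaks.length; j++) {
                int len = breaks[j] - breaks[i];
                if (len > maxSubwordSize) {
                    break; // later offsets only produce longer parts
                }
                if (len < minSubwordSize) {
                    continue; // too short, try the next offset
                }
                String part = word.substring(breaks[i], breaks[j]);
                if (dictionary != null && !dictionary.contains(part)) {
                    continue; // rejected by the dictionary
                }
                if (onlyLongestMatch) {
                    longest = part; // parts grow with j, so the last hit is the longest
                } else {
                    out.add(part);
                }
            }
            if (longest != null) {
                out.add(longest);
            }
        }
        return out;
    }
}
```

Under these assumptions, calling subwords("fussballpumpe", new int[] {0, 4, 8, 13}, Set.of("fuss", "ball", "pumpe", "fussball"), 5, 2, 15, false) yields [fuss, fussball, ball, pumpe]; with onlyLongestMatch set to true it yields [fussball, ball, pumpe].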
- Since:
- 3.1.0
- See Also:
HyphenationCompoundWordTokenFilter
- SPI Name (case-insensitive: if the name is 'htmlStrip', 'htmlstrip' can be used when looking up the service).
- "hyphenationCompoundWord"
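As a rough illustration of what a case-insensitive name lookup means, the sketch below normalizes names before storing and resolving them. The class SpiNameLookupSketch and its methods are hypothetical and are not Lucene's loader; in Lucene the lookup goes through the inherited static methods forName and lookupClass listed under TokenFilterFactory.

```java
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;

// Hypothetical registry used only to illustrate case-insensitive lookup.
final class SpiNameLookupSketch {
    private final Map<String, String> classNamesByKey = new HashMap<>();

    /** Registers a class name under an SPI name, ignoring case. */
    void register(String spiName, String className) {
        classNamesByKey.put(key(spiName), className);
    }

    /** Resolves a name regardless of the case used by the caller. */
    String lookup(String name) {
        String className = classNamesByKey.get(key(name));
        if (className == null) {
            throw new IllegalArgumentException("No factory registered under name: " + name);
        }
        return className;
    }

    // Locale.ROOT keeps the normalization independent of the default locale.
    private static String key(String name) {
        return name.toLowerCase(Locale.ROOT);
    }
}
```

With this sketch, a factory registered as "hyphenationCompoundWord" is also found under "hyphenationcompoundword".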
-
Field Summary
Fields
Modifier and Type | Field | Description
static String | NAME | SPI name
Fields inherited from class org.apache.lucene.analysis.AbstractAnalysisFactory
LUCENE_MATCH_VERSION_PARAM, luceneMatchVersion
-
Constructor Summary
Constructors
Constructor | Description
HyphenationCompoundWordTokenFilterFactory() | Default ctor for compatibility with SPI
HyphenationCompoundWordTokenFilterFactory(Map<String,String> args) | Creates a new HyphenationCompoundWordTokenFilterFactory
-
Method Summary
Modifier and Type | Method | Description
TokenFilter | create(TokenStream input) |
void | inform(ResourceLoader loader) |
-
Methods inherited from class org.apache.lucene.analysis.TokenFilterFactory
availableTokenFilters, findSPIName, forName, lookupClass, normalize, reloadTokenFilters
-
Methods inherited from class org.apache.lucene.analysis.AbstractAnalysisFactory
defaultCtorException, get, get, get, get, get, getBoolean, getChar, getClassArg, getFloat, getInt, getLines, getLuceneMatchVersion, getOriginalArgs, getPattern, getSet, getSnowballWordSet, getWordSet, isExplicitLuceneMatchVersion, require, require, require, requireBoolean, requireChar, requireFloat, requireInt, setExplicitLuceneMatchVersion, splitAt, splitFileNames
-
Field Detail
-
NAME
public static final String NAME
SPI name
- See Also:
- Constant Field Values
-
Constructor Detail
-
HyphenationCompoundWordTokenFilterFactory
public HyphenationCompoundWordTokenFilterFactory(Map<String,String> args)
Creates a new HyphenationCompoundWordTokenFilterFactory
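A minimal sketch of building the args map for this constructor, using the parameter names and defaults documented for this factory. The class HyphenationArgsSketch and the method args are hypothetical helpers, and Lucene itself is deliberately not imported so the block compiles on its own. The map is mutable on the assumption that the factory may consume entries as it reads them.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Objects;

// Hypothetical helper; only the parameter names and defaults come from the factory documentation.
final class HyphenationArgsSketch {

    /** Builds constructor arguments with the documented default values. */
    static Map<String, String> args(String hyphenatorPath) {
        Objects.requireNonNull(hyphenatorPath, "hyphenator is mandatory");
        Map<String, String> args = new HashMap<>(); // mutable (assumption: entries may be consumed)
        args.put("hyphenator", hyphenatorPath);     // mandatory
        args.put("encoding", "UTF-8");              // optional, documented default
        args.put("minWordSize", "5");               // optional, documented default
        args.put("minSubwordSize", "2");            // optional, documented default
        args.put("maxSubwordSize", "15");           // optional, documented default
        args.put("onlyLongestMatch", "false");      // optional, documented default
        // "dictionary" is left out: the documented default is no dictionary.
        return args;
    }
}
```

With Lucene on the classpath, the resulting map would be passed as new HyphenationCompoundWordTokenFilterFactory(HyphenationArgsSketch.args("hyphenator.xml")).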
-
HyphenationCompoundWordTokenFilterFactory
public HyphenationCompoundWordTokenFilterFactory()
Default ctor for compatibility with SPI
-
Method Detail
-
inform
public void inform(ResourceLoader loader) throws IOException
- Specified by:
inform
in interface ResourceLoaderAware
- Throws:
IOException
-
create
public TokenFilter create(TokenStream input)
- Specified by:
create
in class TokenFilterFactory