Class CapitalizationFilterFactory
- java.lang.Object
  - org.apache.lucene.analysis.AbstractAnalysisFactory
    - org.apache.lucene.analysis.TokenFilterFactory
      - org.apache.lucene.analysis.miscellaneous.CapitalizationFilterFactory
public class CapitalizationFilterFactory extends TokenFilterFactory
Factory for CapitalizationFilter.
The factory takes parameters:
- "onlyFirstWord" - if true, capitalize only the first word of a token; if false, capitalize every word.
- "keep" - a keep word list: the words to leave unchanged, separated by whitespace.
- "keepIgnoreCase" - true or false. If true, the keep list is matched case-insensitively.
- "forceFirstLetter" - force the first letter to be capitalized even if the word is in the keep list.
- "okPrefix" - do not change a word's capitalization if it begins with an entry in this list. For example, if "McK" is on the okPrefix list, the word "McKinley" is not changed to "Mckinley".
- "minWordLength" - the minimum length a word must have for capitalization to be applied. If minWordLength is 3, "and" becomes "And" but "or" stays "or".
- "maxWordCount" - if the token contains more than maxWordCount words, the capitalization is assumed to be correct and the token is left unchanged.
<fieldType name="text_cptlztn" class="solr.TextField" positionIncrementGap="100">
  <analyzer>
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.CapitalizationFilterFactory" onlyFirstWord="true" keep="java solr lucene"
            keepIgnoreCase="false" okPrefix="McK McD McA"/>
  </analyzer>
</fieldType>
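The per-word rules described by "keep", "minWordLength" and "okPrefix" can be illustrated with a small standalone sketch. It uses only the JDK and is not the Lucene implementation: the class name CapitalizationRulesSketch and the method capitalizeWord are hypothetical, the order in which the checks run is an assumption, and "forceFirstLetter", "keepIgnoreCase", "onlyFirstWord" and "maxWordCount" are deliberately left out.

```java
import java.util.List;
import java.util.Locale;
import java.util.Set;

// Hypothetical, simplified illustration of the documented per-word rules.
// This is NOT the Lucene CapitalizationFilter; it only mirrors the parameter descriptions.
public class CapitalizationRulesSketch {

    // Returns the word with its first letter upper-cased and the rest lower-cased,
    // unless one of the documented exceptions applies.
    static String capitalizeWord(String word, Set<String> keep,
                                 List<String> okPrefix, int minWordLength) {
        if (word.isEmpty() || keep.contains(word)) {
            return word; // words on the keep list are left as they are
        }
        if (word.length() < minWordLength) {
            return word; // too short for capitalization to be applied
        }
        for (String prefix : okPrefix) {
            if (word.startsWith(prefix)) {
                return word; // word begins with an okPrefix entry: do not change it
            }
        }
        return Character.toUpperCase(word.charAt(0))
                + word.substring(1).toLowerCase(Locale.ROOT);
    }

    public static void main(String[] args) {
        Set<String> keep = Set.of("java", "solr", "lucene");
        List<String> okPrefix = List.of("McK", "McD", "McA");
        for (String w : List.of("and", "or", "McKinley", "lucene")) {
            System.out.println(w + " -> " + capitalizeWord(w, keep, okPrefix, 3));
        }
    }
}
```

With minWordLength set to 3, this sketch turns "and" into "And" and leaves "or" alone, matching the example in the parameter list. Without "McK" in okPrefix, "McKinley" would become "Mckinley".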
- Since:
- solr 1.3
- SPI Name (case-insensitive: if the name is 'htmlStrip', 'htmlstrip' can be used when looking up the service).
- "capitalization"
Field Summary
Fields
Modifier and Type    Field                 Description
static String        FORCE_FIRST_LETTER
static String        KEEP
static String        KEEP_IGNORE_CASE
static String        MAX_TOKEN_LENGTH
static String        MAX_WORD_COUNT
static String        MIN_WORD_LENGTH
static String        NAME                  SPI name
static String        OK_PREFIX
static String        ONLY_FIRST_WORD
-
Fields inherited from class org.apache.lucene.analysis.AbstractAnalysisFactory
LUCENE_MATCH_VERSION_PARAM, luceneMatchVersion
Constructor Summary
Constructors
Constructor                                             Description
CapitalizationFilterFactory()                           Default ctor for compatibility with SPI
CapitalizationFilterFactory(Map<String,String> args)    Creates a new CapitalizationFilterFactory
Method Summary
Modifier and Type       Method                       Description
CapitalizationFilter    create(TokenStream input)
Methods inherited from class org.apache.lucene.analysis.TokenFilterFactory
availableTokenFilters, findSPIName, forName, lookupClass, normalize, reloadTokenFilters
-
Methods inherited from class org.apache.lucene.analysis.AbstractAnalysisFactory
defaultCtorException, get, get, get, get, get, getBoolean, getChar, getClassArg, getFloat, getInt, getLines, getLuceneMatchVersion, getOriginalArgs, getPattern, getSet, getSnowballWordSet, getWordSet, isExplicitLuceneMatchVersion, require, require, require, requireBoolean, requireChar, requireFloat, requireInt, setExplicitLuceneMatchVersion, splitAt, splitFileNames
Field Detail
-
NAME
public static final String NAME
SPI name
- See Also:
- Constant Field Values
-
KEEP
public static final String KEEP
- See Also:
- Constant Field Values
-
KEEP_IGNORE_CASE
public static final String KEEP_IGNORE_CASE
- See Also:
- Constant Field Values
-
OK_PREFIX
public static final String OK_PREFIX
- See Also:
- Constant Field Values
-
MIN_WORD_LENGTH
public static final String MIN_WORD_LENGTH
- See Also:
- Constant Field Values
-
MAX_WORD_COUNT
public static final String MAX_WORD_COUNT
- See Also:
- Constant Field Values
-
MAX_TOKEN_LENGTH
public static final String MAX_TOKEN_LENGTH
- See Also:
- Constant Field Values
-
ONLY_FIRST_WORD
public static final String ONLY_FIRST_WORD
- See Also:
- Constant Field Values
-
FORCE_FIRST_LETTER
public static final String FORCE_FIRST_LETTER
- See Also:
- Constant Field Values
-
Method Detail
-
create
public CapitalizationFilter create(TokenStream input)
- Specified by:
- create in class TokenFilterFactory
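A minimal sketch of how the Map constructor and create(TokenStream) might be used together outside Solr. It assumes a Lucene release in which TokenFilterFactory lives in org.apache.lucene.analysis (as on this page) and that the analysis-common module is on the classpath. The class name CapitalizationFactoryExample and the helper analyze are hypothetical. The map passed to the constructor is a mutable HashMap, on the assumption that the factory consumes entries as it reads them.

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

import org.apache.lucene.analysis.TokenStream;
import org.apache.lucene.analysis.core.WhitespaceTokenizer;
import org.apache.lucene.analysis.miscellaneous.CapitalizationFilterFactory;
import org.apache.lucene.analysis.tokenattributes.CharTermAttribute;

// Hypothetical usage example; requires Lucene (analysis-common) on the classpath.
public class CapitalizationFactoryExample {

    // Tokenizes the text on whitespace, applies the capitalization filter,
    // and returns the resulting terms in order.
    static List<String> analyze(String text) {
        // Use the factory's own constants as parameter names.
        Map<String, String> params = new HashMap<>();
        params.put(CapitalizationFilterFactory.ONLY_FIRST_WORD, "true");
        params.put(CapitalizationFilterFactory.KEEP, "java solr lucene");
        params.put(CapitalizationFilterFactory.KEEP_IGNORE_CASE, "false");
        params.put(CapitalizationFilterFactory.OK_PREFIX, "McK McD McA");
        CapitalizationFilterFactory factory = new CapitalizationFilterFactory(params);

        WhitespaceTokenizer tokenizer = new WhitespaceTokenizer();
        tokenizer.setReader(new StringReader(text));

        List<String> terms = new ArrayList<>();
        // Closing the filter also closes the wrapped tokenizer.
        try (TokenStream stream = factory.create(tokenizer)) {
            CharTermAttribute term = stream.addAttribute(CharTermAttribute.class);
            stream.reset();
            while (stream.incrementToken()) {
                terms.add(term.toString());
            }
            stream.end();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return terms;
    }

    public static void main(String[] args) {
        analyze("hello McKinley lucene").forEach(System.out::println);
    }
}
```

The parameter values mirror the Solr fieldType example in the class description, so the two configurations should behave alike.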