org.apache.lucene.analysis.miscellaneous
Class CapitalizationFilter
java.lang.Object
org.apache.lucene.util.AttributeSource
org.apache.lucene.analysis.TokenStream
org.apache.lucene.analysis.TokenFilter
org.apache.lucene.analysis.miscellaneous.CapitalizationFilter
- All Implemented Interfaces:
- Closeable
public final class CapitalizationFilter
- extends TokenFilter
A filter that applies normal capitalization rules to tokens: it makes the first letter
upper case and the rest lower case.
This filter is particularly useful for building nice-looking facet parameters. It
is not appropriate if you intend to use a prefix query.
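The basic rule, and the reason it interferes with prefix matching, can be illustrated with a minimal plain-Java sketch. This is not the Lucene implementation; the class name CapitalizeSketch and the method capitalize are hypothetical and exist only for this illustration, which assumes that prefix matching against indexed terms is case-sensitive.

```java
import java.util.Locale;

// Plain-Java illustration only; this is NOT the Lucene implementation.
// CapitalizeSketch and capitalize are hypothetical names.
final class CapitalizeSketch {

    // Upper-case the first character and lower-case the rest.
    static String capitalize(String token) {
        if (token.isEmpty()) {
            return token;
        }
        return token.substring(0, 1).toUpperCase(Locale.ROOT)
                + token.substring(1).toLowerCase(Locale.ROOT);
    }

    public static void main(String[] args) {
        String indexed = capitalize("kITTEN");
        System.out.println(indexed);                    // Kitten
        // A prefix typed in lower case no longer matches the capitalized term.
        System.out.println(indexed.startsWith("kit"));  // false
        System.out.println(indexed.startsWith("Kit"));  // true
    }
}
```

The capitalized form reads well as a facet label, but a user typing a lower-case prefix would not match it under the case-sensitivity assumption above.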
Methods inherited from class org.apache.lucene.util.AttributeSource
addAttribute, addAttributeImpl, captureState, clearAttributes, cloneAttributes, copyTo, equals, getAttribute, getAttributeClassesIterator, getAttributeFactory, getAttributeImplsIterator, hasAttribute, hasAttributes, hashCode, reflectAsString, reflectWith, restoreState
DEFAULT_MAX_WORD_COUNT
public static final int DEFAULT_MAX_WORD_COUNT
- See Also:
- Constant Field Values
DEFAULT_MAX_TOKEN_LENGTH
public static final int DEFAULT_MAX_TOKEN_LENGTH
- See Also:
- Constant Field Values
CapitalizationFilter
public CapitalizationFilter(TokenStream in)
- Creates a CapitalizationFilter with the default parameters.
Calls CapitalizationFilter(in, true, null, true, null, 0, DEFAULT_MAX_WORD_COUNT, DEFAULT_MAX_TOKEN_LENGTH)
CapitalizationFilter
public CapitalizationFilter(TokenStream in,
boolean onlyFirstWord,
CharArraySet keep,
boolean forceFirstLetter,
Collection<char[]> okPrefix,
int minWordLength,
int maxWordCount,
int maxTokenLength)
- Creates a CapitalizationFilter with the specified parameters.
- Parameters:
in - the input TokenStream
onlyFirstWord - if true, capitalize only the first word of the token; if false, capitalize every word
keep - a list of words to keep as they are (subject to forceFirstLetter)
forceFirstLetter - force the first letter to be capitalized even if it is in the keep list
okPrefix - do not change word capitalization if a word begins with something in this list
minWordLength - how long the word needs to be to get capitalization applied. If minWordLength is 3, "and" becomes "And" but "or" stays "or"
maxWordCount - if the token contains more than maxWordCount words, the capitalization is assumed to be correct
maxTokenLength - ???
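How the documented parameters might interact can be sketched in plain Java. This is one reading of the parameter descriptions above, not the Lucene implementation. The class CapitalizationRulesSketch and its methods are hypothetical, and the sketch makes several assumptions: words are separated by single spaces, keep and okPrefix are ordinary case-sensitive collections of strings, later words are lower-cased when onlyFirstWord is true, and the order of the checks is a guess. maxTokenLength is left out because its behavior is not documented here.

```java
import java.util.List;
import java.util.Locale;
import java.util.Set;

// Plain-Java illustration of the documented parameters; NOT the Lucene implementation.
// All names here are hypothetical.
final class CapitalizationRulesSketch {

    static String capitalizeToken(String token, boolean onlyFirstWord, Set<String> keep,
                                  boolean forceFirstLetter, List<String> okPrefix,
                                  int minWordLength, int maxWordCount) {
        String[] words = token.split(" ");  // assumption: single-space word separators
        if (words.length > maxWordCount) {
            return token;                   // too many words: capitalization assumed correct
        }
        for (int i = 0; i < words.length; i++) {
            words[i] = processWord(words[i], i, onlyFirstWord, keep,
                                   forceFirstLetter, okPrefix, minWordLength);
        }
        return String.join(" ", words);
    }

    private static String processWord(String word, int index, boolean onlyFirstWord,
                                      Set<String> keep, boolean forceFirstLetter,
                                      List<String> okPrefix, int minWordLength) {
        if (word.isEmpty()) {
            return word;
        }
        if (onlyFirstWord && index > 0) {
            return word.toLowerCase(Locale.ROOT);  // assumption: later words are lower-cased
        }
        if (keep.contains(word)) {
            // Keep words stay as they are, except for a forced first letter of the token.
            return (index == 0 && forceFirstLetter) ? upperFirst(word) : word;
        }
        if (word.length() < minWordLength) {
            return word;                           // too short to capitalize
        }
        for (String prefix : okPrefix) {
            if (word.startsWith(prefix)) {
                return word;                       // protected prefix, e.g. "McK"
            }
        }
        return upperFirst(word.toLowerCase(Locale.ROOT));
    }

    // Upper-case only the first character, leaving the rest untouched.
    private static String upperFirst(String word) {
        return word.substring(0, 1).toUpperCase(Locale.ROOT) + word.substring(1);
    }
}
```

Under these assumptions, calling capitalizeToken with minWordLength set to 3 turns "and or" into "And or", matching the example in the minWordLength description.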
incrementToken
public boolean incrementToken()
throws IOException
- Specified by:
incrementToken
in class TokenStream
- Throws:
IOException
Copyright © 2000-2013 Apache Software Foundation. All Rights Reserved.