org.apache.lucene.analysis.icu
Class ICUNormalizer2Filter
java.lang.Object
org.apache.lucene.util.AttributeSource
org.apache.lucene.analysis.TokenStream
org.apache.lucene.analysis.TokenFilter
org.apache.lucene.analysis.icu.ICUNormalizer2Filter
- All Implemented Interfaces:
- Closeable
- Direct Known Subclasses:
- ICUFoldingFilter
public class ICUNormalizer2Filter
- extends TokenFilter
Normalizes token text using ICU's Normalizer2.
With this filter, you can normalize text in the following ways:
- NFKC Normalization, Case Folding, and removing Ignorables (the default)
- Using a standard Normalization mode (NFC, NFD, NFKC, NFKD)
- Based on rules from a custom normalization mapping.
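As a rough illustration of how the four standard modes differ, the sketch below uses the JDK's java.text.Normalizer as a stand-in. It is not the ICU Normalizer2 that this filter uses, and the class name NormalizationModesSketch is hypothetical. The canonical forms (NFC, NFD) only compose or decompose characters, while the compatibility forms (NFKC, NFKD) also replace compatibility characters such as ligatures.

```java
import java.text.Normalizer;

// Hypothetical helper class; uses the JDK normalizer, not ICU's Normalizer2.
final class NormalizationModesSketch {

    /** Applies one of the standard Unicode normalization forms to the text. */
    static String normalize(String text, Normalizer.Form form) {
        return Normalizer.normalize(text, form);
    }

    public static void main(String[] args) {
        String eAcute = "\u00E9";   // LATIN SMALL LETTER E WITH ACUTE (precomposed)
        String ligature = "\uFB01"; // LATIN SMALL LIGATURE FI

        // NFD splits the precomposed letter into base letter + combining accent.
        System.out.println(normalize(eAcute, Normalizer.Form.NFD).length()); // prints 2

        // NFC recombines the decomposed sequence into one code point.
        System.out.println(normalize("e\u0301", Normalizer.Form.NFC).equals(eAcute)); // prints true

        // Canonical forms leave the ligature alone; compatibility forms expand it.
        System.out.println(normalize(ligature, Normalizer.Form.NFC).equals(ligature)); // prints true
        System.out.println(normalize(ligature, Normalizer.Form.NFKC)); // prints fi
        System.out.println(normalize(ligature, Normalizer.Form.NFKD)); // prints fi
    }
}
```

To apply one of these modes in the filter itself, pass the corresponding Normalizer2 instance to the two-argument constructor.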
If you use the defaults, this filter is a simple way to standardize Unicode text
in a language-independent way for search:
- The case folding it performs can serve as a replacement for
LowerCaseFilter. For example, it handles cases such as the Greek sigma, so that
"Μάϊος" and "ΜΆΪΟΣ" match correctly.
- The normalization standardizes different forms of the same
character in Unicode. For example, CJK full-width numbers are standardized
to their ASCII forms.
- Ignorables such as Zero-Width Joiner and Variation Selectors are removed.
These are typically modifier characters that affect display.
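The three default effects listed above can be approximated with the JDK alone, which may help when reasoning about what the filter does to a token. This is only a sketch under stated assumptions: it uses java.text.Normalizer rather than ICU, it substitutes lowercasing for true Unicode case folding (the two are not identical), and it strips only a small hand-picked set of ignorables rather than the full set handled by NFKC_Casefold. The class and method names are hypothetical.

```java
import java.text.Normalizer;
import java.util.Locale;

// Hypothetical approximation of the filter's default behavior; not the ICU implementation.
final class DefaultNormalizationSketch {

    static String approximate(String text) {
        // Step 1: compatibility normalization, e.g. full-width digits become ASCII digits.
        String nfkc = Normalizer.normalize(text, Normalizer.Form.NFKC);

        // Step 2: lowercasing as a stand-in for case folding (assumption: close enough here).
        String lower = nfkc.toLowerCase(Locale.ROOT);

        // Step 3: drop a few ignorables: ZWNJ (U+200C), ZWJ (U+200D),
        // and variation selectors U+FE00..U+FE0F. The real set is larger.
        return lower.replaceAll("[\\u200C\\u200D\\uFE00-\\uFE0F]", "");
    }

    public static void main(String[] args) {
        // Full-width digits one, two, three.
        System.out.println(approximate("\uFF11\uFF12\uFF13")); // prints 123

        // A zero-width joiner between two letters is removed.
        System.out.println(approximate("A\u200DB")); // prints ab
    }
}
```

For indexing and search, the filter itself should be preferred over this approximation, since it applies the complete NFKC_Casefold mapping.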
- See Also:
Normalizer2, FilteredNormalizer2
Constructor Summary
ICUNormalizer2Filter(TokenStream input)
- Create a new ICUNormalizer2Filter that combines NFKC normalization, Case
Folding, and removal of Default Ignorables (NFKC_Casefold)
ICUNormalizer2Filter(TokenStream input, com.ibm.icu.text.Normalizer2 normalizer)
- Create a new ICUNormalizer2Filter with the specified Normalizer2
Methods inherited from class org.apache.lucene.util.AttributeSource
addAttribute, addAttributeImpl, captureState, clearAttributes, cloneAttributes, copyTo, equals, getAttribute, getAttributeClassesIterator, getAttributeFactory, getAttributeImplsIterator, hasAttribute, hasAttributes, hashCode, reflectAsString, reflectWith, restoreState
ICUNormalizer2Filter
public ICUNormalizer2Filter(TokenStream input)
- Create a new ICUNormalizer2Filter that combines NFKC normalization, Case
Folding, and removal of Default Ignorables (NFKC_Casefold)
ICUNormalizer2Filter
public ICUNormalizer2Filter(TokenStream input,
com.ibm.icu.text.Normalizer2 normalizer)
- Create a new ICUNormalizer2Filter with the specified Normalizer2
- Parameters:
input - stream
normalizer - normalizer to use
incrementToken
public final boolean incrementToken()
throws IOException
- Specified by:
incrementToken in class TokenStream
- Throws:
IOException
Copyright © 2000-2013 Apache Software Foundation. All Rights Reserved.