Package org.apache.lucene.analysis.icu
Class ICUFoldingFilter
- java.lang.Object
  - org.apache.lucene.util.AttributeSource
    - org.apache.lucene.analysis.TokenStream
      - org.apache.lucene.analysis.TokenFilter
        - org.apache.lucene.analysis.icu.ICUNormalizer2Filter
          - org.apache.lucene.analysis.icu.ICUFoldingFilter
All Implemented Interfaces: Closeable, AutoCloseable
public final class ICUFoldingFilter extends ICUNormalizer2Filter
A TokenFilter that applies search term folding to Unicode text, using foldings from UTR#30 Character Foldings. This filter applies the following foldings from the report:
- Accent removal
- Case folding
- Canonical duplicates folding
- Dashes folding
- Diacritic removal (including stroke, hook, descender)
- Greek letterforms folding
- Han Radical folding
- Hebrew Alternates folding
- Jamo folding
- Letterforms folding
- Math symbol folding
- Multigraph Expansions: All
- Native digit folding
- No-break folding
- Overline folding
- Positional forms folding
- Small forms folding
- Space folding
- Spacing Accents folding
- Subscript folding
- Superscript folding
- Suzhou Numeral folding
- Symbol folding
- Underline folding
- Vertical forms folding
- Width folding
Additionally, Default Ignorables are removed, and text is normalized to NFKC. All foldings, case folding, and normalization mappings are applied recursively to ensure a fully folded and normalized result.
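To make the effect of a few of these foldings concrete, the sketch below approximates accent removal, case folding, width folding and NFKC normalization using only JDK classes. It is an illustration under stated assumptions, not the filter's implementation: ICUFoldingFilter relies on ICU folding data derived from UTR#30, which covers many more mappings than this. The class name `FoldingSketch` and the helper `approximateFold` are hypothetical.

```java
import java.text.Normalizer;
import java.util.Locale;

public class FoldingSketch {

    /** Hypothetical helper: approximates a small subset of the listed foldings with the JDK only. */
    static String approximateFold(String text) {
        // Compatibility decomposition splits accented letters into base letter plus
        // combining marks, and maps width variants and ligatures to plain equivalents.
        String decomposed = Normalizer.normalize(text, Normalizer.Form.NFKD);
        // Dropping combining marks (general category M) is a rough form of accent removal.
        String withoutMarks = decomposed.replaceAll("\\p{M}+", "");
        // Lowercasing stands in for full Unicode case folding in this sketch.
        String lowered = withoutMarks.toLowerCase(Locale.ROOT);
        // Recompose to NFKC, the normalization form the filter produces.
        return Normalizer.normalize(lowered, Normalizer.Form.NFKC);
    }

    public static void main(String[] args) {
        System.out.println(approximateFold("R\u00E9sum\u00E9"));         // accented Latin letters
        System.out.println(approximateFold("\uFF26\uFF4F\uFF4C\uFF44")); // fullwidth "Fold"
        System.out.println(approximateFold("\uFB01sh"));                 // "fi" ligature + "sh"
    }
}
```

Foldings that are not expressible through decomposition (for example dash, digit, or symbol foldings) are outside what this approximation can show.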
A normalizer with additional settings such as a filter that lists characters not to be normalized can be passed in the constructor.
Nested Class Summary
Nested classes/interfaces inherited from class org.apache.lucene.util.AttributeSource
AttributeSource.State
Field Summary
| Modifier and Type | Field | Description |
| --- | --- | --- |
| static com.ibm.icu.text.Normalizer2 | NORMALIZER | A normalizer for search term folding to Unicode text, applying foldings from UTR#30 Character Foldings. |
Fields inherited from class org.apache.lucene.analysis.TokenFilter
input
Fields inherited from class org.apache.lucene.analysis.TokenStream
DEFAULT_TOKEN_ATTRIBUTE_FACTORY
Constructor Summary
| Constructor | Description |
| --- | --- |
| ICUFoldingFilter(TokenStream input) | Create a new ICUFoldingFilter on the specified input |
| ICUFoldingFilter(TokenStream input, com.ibm.icu.text.Normalizer2 normalizer) | Create a new ICUFoldingFilter on the specified input with the specified normalizer |
Method Summary
Methods inherited from class org.apache.lucene.analysis.icu.ICUNormalizer2Filter
incrementToken
Methods inherited from class org.apache.lucene.analysis.TokenFilter
close, end, reset
Methods inherited from class org.apache.lucene.util.AttributeSource
addAttribute, addAttributeImpl, captureState, clearAttributes, cloneAttributes, copyTo, endAttributes, equals, getAttribute, getAttributeClassesIterator, getAttributeFactory, getAttributeImplsIterator, hasAttribute, hasAttributes, hashCode, reflectAsString, reflectWith, removeAllAttributes, restoreState, toString
Constructor Detail
ICUFoldingFilter
public ICUFoldingFilter(TokenStream input)
Create a new ICUFoldingFilter on the specified input
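A minimal usage sketch for this constructor follows. It assumes the Lucene analysis-common module (for `WhitespaceTokenizer`) and ICU4J are on the classpath alongside this package; the class `FoldingExample` and its `fold` helper are hypothetical names introduced only for illustration.

```java
import java.io.IOException;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;

import org.apache.lucene.analysis.TokenStream;
import org.apache.lucene.analysis.Tokenizer;
import org.apache.lucene.analysis.core.WhitespaceTokenizer;
import org.apache.lucene.analysis.icu.ICUFoldingFilter;
import org.apache.lucene.analysis.tokenattributes.CharTermAttribute;

public class FoldingExample {

    /** Hypothetical helper: splits text on whitespace and folds each token. */
    static List<String> fold(String text) throws IOException {
        Tokenizer tokenizer = new WhitespaceTokenizer();
        tokenizer.setReader(new StringReader(text));

        List<String> terms = new ArrayList<>();
        // Wrap the tokenizer with the filter using the default folding normalizer.
        try (TokenStream stream = new ICUFoldingFilter(tokenizer)) {
            CharTermAttribute term = stream.addAttribute(CharTermAttribute.class);
            stream.reset();                     // required before the first incrementToken()
            while (stream.incrementToken()) {
                terms.add(term.toString());     // copy the folded term text
            }
            stream.end();                       // signal end of stream before close()
        }
        return terms;
    }

    public static void main(String[] args) throws IOException {
        // Accented and mixed-case input; the folded terms are printed as a list.
        System.out.println(fold("R\u00E9sum\u00E9 Ru\u00DF"));
    }
}
```

In an application the filter would more typically be added inside an Analyzer's component chain; the standalone loop here only keeps the sketch short.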
ICUFoldingFilter
public ICUFoldingFilter(TokenStream input, com.ibm.icu.text.Normalizer2 normalizer)
Create a new ICUFoldingFilter on the specified input with the specified normalizer
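One way to supply a normalizer that leaves some characters unfolded is to wrap the `NORMALIZER` field in ICU4J's `FilteredNormalizer2`, restricted by a `UnicodeSet`. The sketch below assumes ICU4J and the Lucene analysis-common module (for `WhitespaceTokenizer`) are on the classpath; the class `FilteredFoldingExample`, the helper name, and the choice of Swedish letters as the protected characters are illustrative assumptions.

```java
import java.io.IOException;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;

import com.ibm.icu.text.FilteredNormalizer2;
import com.ibm.icu.text.Normalizer2;
import com.ibm.icu.text.UnicodeSet;
import org.apache.lucene.analysis.TokenStream;
import org.apache.lucene.analysis.Tokenizer;
import org.apache.lucene.analysis.core.WhitespaceTokenizer;
import org.apache.lucene.analysis.icu.ICUFoldingFilter;
import org.apache.lucene.analysis.tokenattributes.CharTermAttribute;

public class FilteredFoldingExample {

    /** Hypothetical helper: folds tokens but leaves the letters å ä ö Å Ä Ö untouched. */
    static List<String> foldOutsideSwedishLetters(String text) throws IOException {
        // The set lists the characters that MAY be normalized; everything
        // outside it (here: å ä ö Å Ä Ö) is passed through unchanged.
        UnicodeSet foldable =
            new UnicodeSet("[^\u00E5\u00E4\u00F6\u00C5\u00C4\u00D6]").freeze();
        Normalizer2 normalizer =
            new FilteredNormalizer2(ICUFoldingFilter.NORMALIZER, foldable);

        Tokenizer tokenizer = new WhitespaceTokenizer();
        tokenizer.setReader(new StringReader(text));

        List<String> terms = new ArrayList<>();
        try (TokenStream stream = new ICUFoldingFilter(tokenizer, normalizer)) {
            CharTermAttribute term = stream.addAttribute(CharTermAttribute.class);
            stream.reset();
            while (stream.incrementToken()) {
                terms.add(term.toString());
            }
            stream.end();
        }
        return terms;
    }

    public static void main(String[] args) throws IOException {
        // "Café" contains no protected letters; "Ålesund" starts with one.
        System.out.println(foldOutsideSwedishLetters("Caf\u00E9 \u00C5lesund"));
    }
}
```

Freezing the set makes it immutable and safe to share, which matters if the same normalizer is reused across analyzer instances.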