Package org.apache.lucene.analysis.icu
Class ICUFoldingFilter
java.lang.Object
org.apache.lucene.util.AttributeSource
org.apache.lucene.analysis.TokenStream
org.apache.lucene.analysis.TokenFilter
org.apache.lucene.analysis.icu.ICUNormalizer2Filter
org.apache.lucene.analysis.icu.ICUFoldingFilter
- All Implemented Interfaces:
Closeable, AutoCloseable, Unwrappable<TokenStream>
A TokenFilter that applies search term folding to Unicode text, using the foldings defined in UTR#30 Character Foldings.
This filter applies the following foldings from the report to Unicode text:
- Accent removal
- Case folding
- Canonical duplicates folding
- Dashes folding
- Diacritic removal (including stroke, hook, descender)
- Greek letterforms folding
- Han Radical folding
- Hebrew Alternates folding
- Jamo folding
- Letterforms folding
- Math symbol folding
- Multigraph Expansions: All
- Native digit folding
- No-break folding
- Overline folding
- Positional forms folding
- Small forms folding
- Space folding
- Spacing Accents folding
- Subscript folding
- Superscript folding
- Suzhou Numeral folding
- Symbol folding
- Underline folding
- Vertical forms folding
- Width folding
Additionally, Default Ignorables are removed, and text is normalized to NFKC. All foldings, case folding, and normalization mappings are applied recursively to ensure a fully folded and normalized result.
A normalizer with additional settings, such as a filter that lists characters not to be normalized, can be passed to the constructor.
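The following is a minimal sketch of the default folding in use, not a definitive recipe. It assumes the Lucene ICU analysis module, the Lucene common analysis module (for WhitespaceTokenizer) and ICU4J are on the classpath. The class name FoldingExample and the helper fold are hypothetical names introduced only for this illustration.

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.List;

import org.apache.lucene.analysis.TokenStream;
import org.apache.lucene.analysis.Tokenizer;
import org.apache.lucene.analysis.core.WhitespaceTokenizer;
import org.apache.lucene.analysis.icu.ICUFoldingFilter;
import org.apache.lucene.analysis.tokenattributes.CharTermAttribute;

// Hypothetical helper class, used only for this illustration.
final class FoldingExample {

  // Splits the text on whitespace and returns each token after folding.
  static List<String> fold(String text) {
    List<String> terms = new ArrayList<>();
    Tokenizer tokenizer = new WhitespaceTokenizer();
    tokenizer.setReader(new StringReader(text));
    // Wrap the tokenizer with the filter, using the default NORMALIZER.
    try (TokenStream stream = new ICUFoldingFilter(tokenizer)) {
      CharTermAttribute term = stream.addAttribute(CharTermAttribute.class);
      stream.reset();                    // required before the first incrementToken()
      while (stream.incrementToken()) {
        terms.add(term.toString());      // the term text has already been folded
      }
      stream.end();
    } catch (IOException e) {
      throw new UncheckedIOException(e);
    }
    return terms;
  }

  public static void main(String[] args) {
    // Unicode escapes keep the source file ASCII-only: "Résumé CAFÉ".
    System.out.println(fold("R\u00E9sum\u00E9 CAF\u00C9"));
  }
}
```

With accent removal and case folding applied, a token such as "Résumé" should come out as "resume". The same filter is normally applied at both index time and query time so that folded terms match.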
-
Nested Class Summary
Nested classes/interfaces inherited from class org.apache.lucene.util.AttributeSource
AttributeSource.State
-
Field Summary
Modifier and Type: static final com.ibm.icu.text.Normalizer2
Field: NORMALIZER
Description: A normalizer for search term folding to Unicode text, applying foldings from UTR#30 Character Foldings.
Fields inherited from class org.apache.lucene.analysis.TokenFilter
input
Fields inherited from class org.apache.lucene.analysis.TokenStream
DEFAULT_TOKEN_ATTRIBUTE_FACTORY
-
Constructor Summary
Constructor: ICUFoldingFilter(TokenStream input)
Description: Create a new ICUFoldingFilter on the specified input.
Constructor: ICUFoldingFilter(TokenStream input, com.ibm.icu.text.Normalizer2 normalizer)
Description: Create a new ICUFoldingFilter on the specified input with the specified normalizer.
-
Method Summary
Methods inherited from class org.apache.lucene.analysis.icu.ICUNormalizer2Filter
incrementToken
Methods inherited from class org.apache.lucene.analysis.TokenFilter
close, end, reset, unwrap
Methods inherited from class org.apache.lucene.util.AttributeSource
addAttribute, addAttributeImpl, captureState, clearAttributes, cloneAttributes, copyTo, endAttributes, equals, getAttribute, getAttributeClassesIterator, getAttributeFactory, getAttributeImplsIterator, hasAttribute, hasAttributes, hashCode, reflectAsString, reflectWith, removeAllAttributes, restoreState, toString
-
Field Details
-
NORMALIZER
public static final com.ibm.icu.text.Normalizer2 NORMALIZER
A normalizer for search term folding to Unicode text, applying foldings from UTR#30 Character Foldings.
-
Constructor Details
-
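ICUFoldingFilter
public ICUFoldingFilter(TokenStream input)
Create a new ICUFoldingFilter on the specified input.
-
ICUFoldingFilter
public ICUFoldingFilter(TokenStream input, com.ibm.icu.text.Normalizer2 normalizer)
Create a new ICUFoldingFilter on the specified input with the specified normalizer.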
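As an illustration of the two-argument constructor, the sketch below wraps the default NORMALIZER in ICU4J's FilteredNormalizer2 so that a chosen set of characters is left unfolded. The Swedish letters in the set are an assumption chosen only for the example, and the class and method names (FilteredFoldingExample, swedishAwareNormalizer, wrap) are hypothetical. It assumes the Lucene ICU analysis module and ICU4J are on the classpath.

```java
import com.ibm.icu.text.FilteredNormalizer2;
import com.ibm.icu.text.Normalizer2;
import com.ibm.icu.text.UnicodeSet;

import org.apache.lucene.analysis.TokenStream;
import org.apache.lucene.analysis.icu.ICUFoldingFilter;

// Hypothetical helper class, used only for this illustration.
final class FilteredFoldingExample {

  // Builds a normalizer that folds everything except å, ä, ö, Å, Ä, Ö.
  // The UnicodeSet lists the characters that ARE normalized, so the
  // protected letters are expressed as a negated set.
  static Normalizer2 swedishAwareNormalizer() {
    UnicodeSet foldable =
        new UnicodeSet("[^\u00E5\u00E4\u00F6\u00C5\u00C4\u00D6]");
    foldable.freeze(); // make the set immutable and safe to share
    return new FilteredNormalizer2(ICUFoldingFilter.NORMALIZER, foldable);
  }

  // Applies the filtered normalizer to an existing token stream.
  static TokenStream wrap(TokenStream input) {
    return new ICUFoldingFilter(input, swedishAwareNormalizer());
  }
}
```

Characters outside the set bypass every folding, including case folding, so an uppercase Å stays uppercase. A separate lowercasing step may be needed if that matters for the index.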
-