org.apache.lucene.analysis.de
Class GermanNormalizationFilter
java.lang.Object
org.apache.lucene.util.AttributeSource
org.apache.lucene.analysis.TokenStream
org.apache.lucene.analysis.TokenFilter
org.apache.lucene.analysis.de.GermanNormalizationFilter
- All Implemented Interfaces:
- Closeable
public final class GermanNormalizationFilter
- extends TokenFilter
Normalizes German characters according to the heuristics of the German2 snowball algorithm.
It allows for the fact that ä, ö and ü are sometimes written as ae, oe and ue.
- 'ß' is replaced by 'ss'.
- 'ä', 'ö', 'ü' are replaced by 'a', 'o', 'u', respectively.
- 'ae' and 'oe' are replaced by 'a' and 'o', respectively.
- 'ue' is replaced by 'u' when it does not follow a vowel or q.
This is useful if you want this normalization without using
the German2 stemmer, or without any stemming at all.
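The four rules listed above can be illustrated with a small standalone sketch. It is not the Lucene implementation: the class name GermanNormalizationSketch and its helper methods are hypothetical, it works on plain strings rather than on a TokenStream, and it assumes the term has already been lowercased and that "vowel" covers a, e, i, o, u, y and the umlauts. Non-ASCII characters are written as Unicode escapes so the source compiles under any file encoding.

```java
// Standalone sketch of the documented rules; NOT the Lucene implementation.
// Assumption: the input term is already lowercased.
class GermanNormalizationSketch {

    // Assumed vowel set (a, e, i, o, u, y and the three umlauts) plus q.
    // "ue" is left alone when the u follows one of these characters.
    private static boolean isVowelOrQ(char c) {
        return "aeiouyq\u00E4\u00F6\u00FC".indexOf(c) >= 0;
    }

    static String normalize(String term) {
        StringBuilder out = new StringBuilder(term.length() + 1);
        for (int i = 0; i < term.length(); i++) {
            char c = term.charAt(i);
            boolean nextIsE = i + 1 < term.length() && term.charAt(i + 1) == 'e';
            switch (c) {
                case '\u00DF':              // sharp s becomes "ss"
                    out.append("ss");
                    break;
                case '\u00E4':              // a-umlaut becomes 'a'
                    out.append('a');
                    break;
                case '\u00F6':              // o-umlaut becomes 'o'
                    out.append('o');
                    break;
                case '\u00FC':              // u-umlaut becomes 'u'
                    out.append('u');
                    break;
                case 'a':
                case 'o':
                    out.append(c);
                    if (nextIsE) {
                        i++;                // "ae" -> "a", "oe" -> "o": skip the 'e'
                    }
                    break;
                case 'u':
                    out.append(c);
                    boolean afterVowelOrQ = i > 0 && isVowelOrQ(term.charAt(i - 1));
                    if (nextIsE && !afterVowelOrQ) {
                        i++;                // "ue" -> "u" only when not after a vowel or q
                    }
                    break;
                default:
                    out.append(c);
            }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.println(normalize("mueller")); // prints "muller"
        System.out.println(normalize("quelle"));  // prints "quelle" ('u' follows 'q')
        System.out.println(normalize("bauen"));   // prints "bauen" ('u' follows a vowel)
    }
}
```

With this sketch, "müller" and "mueller" both normalize to "muller", so the two spellings match the same term, while "quelle" and "bauen" are left unchanged because their 'u' follows a q or a vowel.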
Methods inherited from class org.apache.lucene.util.AttributeSource
addAttribute, addAttributeImpl, captureState, clearAttributes, cloneAttributes, copyTo, equals, getAttribute, getAttributeClassesIterator, getAttributeFactory, getAttributeImplsIterator, hasAttribute, hasAttributes, hashCode, reflectAsString, reflectWith, restoreState, toString
GermanNormalizationFilter
public GermanNormalizationFilter(TokenStream input)
incrementToken
public boolean incrementToken()
throws IOException
- Specified by:
incrementToken
in class TokenStream
- Throws:
IOException
Copyright © 2000-2014 Apache Software Foundation. All Rights Reserved.