Class GermanNormalizationFilter

  • All Implemented Interfaces:
    Closeable, AutoCloseable

    public final class GermanNormalizationFilter
    extends TokenFilter
    Normalizes German characters according to the heuristics of the German2 snowball algorithm. It allows for the fact that ä, ö and ü are sometimes written as ae, oe and ue.
    • 'ß' is replaced by 'ss'
    • 'ä', 'ö', 'ü' are replaced by 'a', 'o', 'u', respectively.
    • 'ae' and 'oe' are replaced by 'a' and 'o', respectively.
    • 'ue' is replaced by 'u' when it does not follow a vowel or 'q'.
    This is useful if you want this normalization without using the German2 stemmer, or perhaps no stemming at all.
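
    The rules above can be illustrated with a minimal standalone sketch. It is not the filter's actual implementation: the class name GermanNormalizationSketch, the helper blocksUe, and the choice to work on a plain String rather than a token stream are all assumptions made for illustration. Which characters count as a "vowel" for the 'ue' rule is likewise an assumption (a, e, i, o, u, y and the umlauts). Non-ASCII characters are written as Unicode escapes so the source compiles regardless of file encoding.

```java
/** Illustrative sketch of the documented rules; hypothetical, not the Lucene class. */
final class GermanNormalizationSketch {

    // Assumption: these characters prevent 'ue' from being folded to 'u'
    // when they immediately precede the 'u' (vowels, umlauts, and 'q').
    private static boolean blocksUe(char c) {
        return "aeiouy\u00e4\u00f6\u00fcq".indexOf(c) >= 0;
    }

    /** Applies the four documented replacements to an already lowercased term. */
    static String normalize(String term) {
        StringBuilder out = new StringBuilder(term.length() + 2);
        for (int i = 0; i < term.length(); i++) {
            char c = term.charAt(i);
            boolean nextIsE = i + 1 < term.length() && term.charAt(i + 1) == 'e';
            switch (c) {
                case '\u00df': // sharp s -> "ss"
                    out.append("ss");
                    break;
                case '\u00e4': // a-umlaut -> 'a'
                    out.append('a');
                    break;
                case '\u00f6': // o-umlaut -> 'o'
                    out.append('o');
                    break;
                case '\u00fc': // u-umlaut -> 'u'
                    out.append('u');
                    break;
                case 'a':
                case 'o':
                    // "ae" -> 'a', "oe" -> 'o': keep the vowel, skip the following 'e'
                    out.append(c);
                    if (nextIsE) {
                        i++;
                    }
                    break;
                case 'u': {
                    // "ue" -> 'u' only when the 'u' does not follow a vowel or 'q'
                    out.append('u');
                    boolean afterVowelOrQ = i > 0 && blocksUe(term.charAt(i - 1));
                    if (nextIsE && !afterVowelOrQ) {
                        i++;
                    }
                    break;
                }
                default:
                    out.append(c);
            }
        }
        return out.toString();
    }
}
```

    With this sketch, both spellings of a name such as "Müller" and "Mueller" (lowercased) normalize to the same string, while the "ue" in "quelle" and "treue" is left alone because it follows 'q' or a vowel. The sketch checks only the immediately preceding character of the input, so its behavior on unusual vowel sequences may differ from the real filter.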