Class OpenNLPLemmatizerFilter

All Implemented Interfaces:
Closeable, AutoCloseable, Unwrappable<TokenStream>

public class OpenNLPLemmatizerFilter extends TokenFilter
Runs OpenNLP dictionary-based and/or MaxEnt lemmatizers.

Both a dictionary-based lemmatizer and a MaxEnt lemmatizer are supported, via the "dictionary" and "lemmatizerModel" params, respectively. If both are configured, the dictionary-based lemmatizer is tried first, and then the MaxEnt lemmatizer is consulted for out-of-vocabulary tokens.
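
The lookup order described above can be illustrated with a minimal, self-contained sketch. It is not the Lucene or OpenNLP implementation: the class name `LemmaFallbackSketch`, the map-backed dictionary keyed by word and part-of-speech tag, and the `BiFunction` standing in for the MaxEnt model are all hypothetical. Returning the original token when neither source produces a lemma is likewise an assumption made for the example.

```java
import java.util.Map;
import java.util.function.BiFunction;

/** Illustrative only: a stand-in for the dictionary-first, model-second lookup order. */
final class LemmaFallbackSketch {

    /**
     * Looks up the token in the dictionary first and consults the model only
     * when the dictionary has no entry (an out-of-vocabulary token).
     *
     * @param dictionary hypothetical map keyed by word + TAB + part-of-speech; may be null
     * @param model      hypothetical stand-in for a MaxEnt lemmatizer; may be null
     */
    public static String lemmatize(String token,
                                   String posTag,
                                   Map<String, String> dictionary,
                                   BiFunction<String, String, String> model) {
        // Step 1: dictionary lookup, if a dictionary is configured.
        if (dictionary != null) {
            String lemma = dictionary.get(token + "\t" + posTag);
            if (lemma != null) {
                return lemma;
            }
        }
        // Step 2: fall back to the model for out-of-vocabulary tokens.
        if (model != null) {
            String lemma = model.apply(token, posTag);
            if (lemma != null) {
                return lemma;
            }
        }
        // Assumption for this sketch: keep the original token when nothing matches.
        return token;
    }
}
```

With only one of the two sources configured, the same method degrades to a single lookup, which mirrors the "and/or" behavior described above.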

The dictionary file must be encoded as UTF-8, with one entry per line, in the form word[tab]lemma[tab]part-of-speech, where [tab] denotes a single tab character.
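
As a rough illustration of this file layout, the sketch below parses entries into a map. It assumes the field order exactly as stated above, and it is not the parser OpenNLP itself uses: the class `LemmaDictionarySketch`, its method names, and the choice to reject malformed lines are hypothetical.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Illustrative only: reads entries of the form word TAB lemma TAB part-of-speech. */
final class LemmaDictionarySketch {

    /** Reads the file as UTF-8, one entry per line, and parses it. */
    public static Map<String, String> load(Path file) throws IOException {
        return parse(Files.readAllLines(file, StandardCharsets.UTF_8));
    }

    /** Maps word + TAB + part-of-speech to the lemma for each non-empty line. */
    public static Map<String, String> parse(List<String> lines) {
        Map<String, String> entries = new HashMap<>();
        for (String line : lines) {
            if (line.isEmpty()) {
                continue; // assumption: blank lines are ignored
            }
            String[] fields = line.split("\t");
            if (fields.length != 3) {
                // Assumption: malformed lines are rejected rather than skipped.
                throw new IllegalArgumentException("Expected 3 tab-separated fields: " + line);
            }
            String word = fields[0];
            String lemma = fields[1];
            String pos = fields[2];
            entries.put(word + "\t" + pos, lemma);
        }
        return entries;
    }
}
```

Keying on both the word and its part-of-speech tag lets one surface form map to different lemmas depending on the tag.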