Class FingerprintFilter

  • All Implemented Interfaces:
    Closeable, AutoCloseable, Unwrappable<TokenStream>

    public class FingerprintFilter
    extends TokenFilter
    A filter that emits a single token formed by concatenating the sorted, de-duplicated set of input tokens. This can be useful for clustering or linking use cases.
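The sort, de-duplicate, and concatenate behavior described above can be illustrated with a minimal plain-Java sketch. It does not use Lucene and is not the filter's actual implementation; the class name `FingerprintSketch`, the helper `fingerprint`, and the choice of a space as separator are assumptions made for illustration only.

```java
import java.util.List;
import java.util.TreeSet;

public class FingerprintSketch {

    // Hypothetical helper: models the described behavior on a list of strings.
    public static String fingerprint(List<String> tokens, char separator) {
        // A TreeSet drops duplicates and iterates in natural String order
        // (assumed here to be an acceptable stand-in for the filter's sort order).
        TreeSet<String> unique = new TreeSet<>(tokens);
        // Join the sorted, unique tokens into one output string.
        return String.join(String.valueOf(separator), unique);
    }

    public static void main(String[] args) {
        List<String> tokens = List.of("the", "quick", "the", "brown", "fox");
        System.out.println(fingerprint(tokens, ' ')); // prints "brown fox quick the"
    }
}
```

Because the output depends only on the set of distinct tokens, inputs that differ in word order or repetition map to the same fingerprint, which is what makes it usable as a clustering key.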
    • Field Detail

      • DEFAULT_MAX_OUTPUT_TOKEN_SIZE

        public static final int DEFAULT_MAX_OUTPUT_TOKEN_SIZE
        See Also:
        Constant Field Values
    • Constructor Detail

      • FingerprintFilter

        public FingerprintFilter(TokenStream input)
        Creates a new FingerprintFilter with default settings.
      • FingerprintFilter

        public FingerprintFilter(TokenStream input,
                                 int maxOutputTokenSize,
                                 char separator)
        Creates a new FingerprintFilter with control over all settings.
        Parameters:
        input - the source of tokens to be summarized into a single token
        maxOutputTokenSize - the maximum length of the summarized output token. If this length is exceeded, no output token is emitted.
        separator - the character used to separate tokens combined into the single output token
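The interaction of `maxOutputTokenSize` and `separator` can be sketched in plain Java as follows. This is a model of the documented parameters, not Lucene code. The class `BoundedFingerprintSketch` and its method are hypothetical, and two details are assumptions: that the length limit counts separator characters, and that an empty input produces no output token.

```java
import java.util.List;
import java.util.Optional;
import java.util.TreeSet;

public class BoundedFingerprintSketch {

    // Hypothetical helper: returns the fingerprint, or empty when no token would be emitted.
    public static Optional<String> fingerprint(List<String> tokens,
                                               int maxOutputTokenSize,
                                               char separator) {
        // Assumption: with no input tokens there is nothing to summarize.
        if (tokens.isEmpty()) {
            return Optional.empty();
        }
        // Sort and de-duplicate, then join with the chosen separator.
        String joined = String.join(String.valueOf(separator), new TreeSet<>(tokens));
        // Assumption: the limit applies to the full joined length, separators included.
        if (joined.length() > maxOutputTokenSize) {
            return Optional.empty(); // too long: emit nothing rather than a truncated token
        }
        return Optional.of(joined);
    }

    public static void main(String[] args) {
        List<String> tokens = List.of("the", "quick", "the", "brown", "fox");
        // Within the limit: prints "brown_fox_quick_the"
        System.out.println(fingerprint(tokens, 1024, '_').orElse("<no token>"));
        // Joined length is 19, which exceeds 8: prints "<no token>"
        System.out.println(fingerprint(tokens, 8, '_').orElse("<no token>"));
    }
}
```

Emitting nothing instead of truncating means an oversized input never yields a partial fingerprint that could collide with a shorter, unrelated one.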