Class FingerprintFilter

All Implemented Interfaces:
Closeable, AutoCloseable, Unwrappable<TokenStream>

public class FingerprintFilter extends TokenFilter
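Emits a single token that is the concatenation of the sorted, de-duplicated set of input tokens. This is useful for clustering and record-linking use cases, because inputs that contain the same tokens in a different order, or with repeats, yield the same output token.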
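To illustrate why a sorted, de-duplicated concatenation helps with clustering, here is a minimal plain-Java sketch. It does not use Lucene and is not the filter's implementation: the class name FingerprintClusteringSketch and its methods are hypothetical, and the whitespace splitting and lowercasing stand in for whatever tokenizer and filters would run upstream of FingerprintFilter in a real analysis chain.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.TreeSet;

// Hypothetical illustration only; not part of Lucene.
final class FingerprintClusteringSketch {

    // Builds a fingerprint key: sorted, de-duplicated tokens joined by a space.
    // Assumption: whitespace splitting and lowercasing mimic an upstream analysis chain.
    static String fingerprint(String text) {
        TreeSet<String> unique = new TreeSet<>(); // sorts and removes duplicates
        for (String token : text.trim().toLowerCase(Locale.ROOT).split("\\s+")) {
            if (!token.isEmpty()) {
                unique.add(token);
            }
        }
        return String.join(" ", unique);
    }

    // Groups records that share the same fingerprint key.
    static Map<String, List<String>> cluster(List<String> records) {
        Map<String, List<String>> groups = new LinkedHashMap<>();
        for (String record : records) {
            groups.computeIfAbsent(fingerprint(record), k -> new ArrayList<>()).add(record);
        }
        return groups;
    }

    public static void main(String[] args) {
        System.out.println(cluster(List.of("Smith John", "john smith john", "Jane Doe")));
    }
}
```

In this sketch, "Smith John" and "john smith john" land in the same group because both reduce to the key "john smith".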
  • Field Details

    • DEFAULT_MAX_OUTPUT_TOKEN_SIZE

      public static final int DEFAULT_MAX_OUTPUT_TOKEN_SIZE
    • DEFAULT_SEPARATOR

      public static final char DEFAULT_SEPARATOR
  • Constructor Details

    • FingerprintFilter

      public FingerprintFilter(TokenStream input)
      Creates a new FingerprintFilter with default settings (DEFAULT_MAX_OUTPUT_TOKEN_SIZE and DEFAULT_SEPARATOR).
    • FingerprintFilter

      public FingerprintFilter(TokenStream input, int maxOutputTokenSize, char separator)
      Creates a new FingerprintFilter with explicit control over all settings.
      Parameters:
      input - the source of tokens to be summarized into a single token
      maxOutputTokenSize - the maximum length of the summarized output token. If the combined token would exceed this length, no output token is emitted.
      separator - the character used to separate tokens combined into the single output token
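The effect of maxOutputTokenSize and separator can be sketched in plain Java as follows. This is an illustration under stated assumptions, not Lucene's code: the class FingerprintParamsSketch is hypothetical, length is counted in Java chars, the limit is treated as inclusive, and empty input is assumed to produce no token. The real filter's boundary handling may differ.

```java
import java.util.Collection;
import java.util.List;
import java.util.Optional;
import java.util.TreeSet;

// Hypothetical illustration only; not part of Lucene.
final class FingerprintParamsSketch {

    // Returns the joined fingerprint, or Optional.empty() when nothing should be emitted.
    static Optional<String> fingerprint(Collection<String> tokens,
                                        int maxOutputTokenSize,
                                        char separator) {
        TreeSet<String> unique = new TreeSet<>(tokens); // sorted, de-duplicated
        String joined = String.join(String.valueOf(separator), unique);
        // Assumption: empty input emits nothing, and a fingerprint longer than
        // maxOutputTokenSize is dropped entirely rather than truncated.
        if (unique.isEmpty() || joined.length() > maxOutputTokenSize) {
            return Optional.empty();
        }
        return Optional.of(joined);
    }

    public static void main(String[] args) {
        System.out.println(fingerprint(List.of("b", "a", "b"), 10, '_'));
        System.out.println(fingerprint(List.of("alpha", "beta"), 5, ' '));
    }
}
```

The point to note is that an oversized fingerprint is dropped rather than shortened, so callers that need a key for every input should choose maxOutputTokenSize with their longest expected inputs in mind.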
  • Method Details