Interface HTMLParser

All Known Implementing Classes:
DemoHTMLParser

public interface HTMLParser
HTML Parsing Interface for test purposes
  • Method Details

    • parse

      DocData parse(DocData docData, String name, Date date, Reader reader, TrecContentSource trecSrc) throws IOException
      Parse the input Reader and return DocData. The provided name and date are used for the result unless they are null, in which case an attempt is made to set them from the parsed data.
      Parameters:
      docData - the DocData instance to reuse for the result.
      name - name of the result doc data.
      date - date of the result doc data. If null, an attempt is made to set it from the parsed data.
      reader - reader supplying the HTML text to parse.
      trecSrc - the TrecContentSource used to parse dates.
      Returns:
      Parsed doc data.
      Throws:
      IOException - If there is a low-level I/O error.
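The contract above has three parts: reuse the caller's docData, prefer the supplied date over one found in the document, and let I/O errors propagate. The following is a minimal, self-contained sketch of that contract, not of DemoHTMLParser or any Lucene class. HtmlParseSketch, SimpleDocData, and the dateParser function are hypothetical stand-ins for the interface, DocData, and TrecContentSource. The `<meta name="date" content="...">` convention is likewise an assumption. The regex-based extraction is a deliberate simplification; a real implementation would use a proper HTML parser.

```java
import java.io.IOException;
import java.io.Reader;
import java.util.Date;
import java.util.function.Function;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical sketch of the parse() contract; not a Lucene API. */
class HtmlParseSketch {

  /** Reusable result holder (hypothetical stand-in for DocData). */
  static final class SimpleDocData {
    String name;
    String title;
    String body;
    Date date;

    void clear() {
      name = null;
      title = null;
      body = null;
      date = null;
    }
  }

  private static final Pattern TITLE =
      Pattern.compile("<title>(.*?)</title>", Pattern.CASE_INSENSITIVE | Pattern.DOTALL);
  // Assumed convention: <meta name="date" content="..."> carries the document date.
  private static final Pattern META_DATE =
      Pattern.compile("<meta\\s+name=\"date\"\\s+content=\"([^\"]*)\"", Pattern.CASE_INSENSITIVE);
  private static final Pattern TAG = Pattern.compile("<[^>]*>");

  /** dateParser is a hypothetical stand-in for the date parsing done via trecSrc. */
  static SimpleDocData parse(SimpleDocData docData, String name, Date date, Reader reader,
      Function<String, Date> dateParser) throws IOException {
    // Read the whole document; any IOException propagates to the caller.
    StringBuilder sb = new StringBuilder();
    char[] buf = new char[4096];
    int n;
    while ((n = reader.read(buf)) != -1) {
      sb.append(buf, 0, n);
    }
    String html = sb.toString();

    Matcher t = TITLE.matcher(html);
    String title = t.find() ? t.group(1).trim() : null;

    // Fall back to a date found in the document only when none was provided.
    if (date == null) {
      Matcher m = META_DATE.matcher(html);
      if (m.find()) {
        date = dateParser.apply(m.group(1));
      }
    }

    // Reuse the caller's instance instead of allocating a new one.
    docData.clear();
    docData.name = name;
    docData.title = title;
    // Crude tag stripping: replace tags with spaces, then collapse whitespace.
    docData.body = TAG.matcher(html).replaceAll(" ").replaceAll("\\s+", " ").trim();
    docData.date = date;
    return docData;
  }
}
```

Reusing one holder across calls mirrors the intent of the docData parameter: a benchmark that parses many documents avoids allocating a fresh result object per document.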