org.apache.lucene.benchmark.byTask.feeds
Interface HTMLParser

All Known Implementing Classes:
DemoHTMLParser

public interface HTMLParser

HTML parsing interface for test purposes.


Method Summary
 DocData parse(DocData docData, String name, Date date, Reader reader, DateFormat dateFormat)
          Parse the input Reader and return DocData.
 DocData parse(DocData docData, String name, Date date, StringBuffer inputText, DateFormat dateFormat)
          Parse the inputText and return DocData.
 

Method Detail

parse

DocData parse(DocData docData,
              String name,
              Date date,
              Reader reader,
              DateFormat dateFormat)
              throws IOException,
                     InterruptedException
Parse the input Reader and return DocData. If a name or date is provided, it is used for the result; otherwise an attempt is made to set it from the parsed data.

Parameters:
name - name of the result doc data. If null, an attempt is made to set it from the parsed data.
date - date of the result doc data. If null, an attempt is made to set it from the parsed data.
reader - reader of the HTML text to parse.
dateFormat - date formatter to use for extracting the date.
Returns:
Parsed doc data.
Throws:
IOException
InterruptedException
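
The fallback rule described above (a provided name or date wins, otherwise the value is taken from the parsed data) can be illustrated with a minimal, self-contained sketch. Everything in it is an assumption made for illustration: SimpleDocData and RegexHtmlParser are hypothetical stand-ins, not the real DocData or DemoHTMLParser classes, and taking the name from the <title> element and the date from a <meta name="date"> tag is only one possible reading of "parsed data".

```java
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.text.DateFormat;
import java.text.ParseException;
import java.text.SimpleDateFormat;
import java.util.Date;
import java.util.Locale;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class ParseFallbackSketch {

    /** Hypothetical stand-in for DocData; the real class has its own API. */
    static class SimpleDocData {
        String name;
        Date date;
        String body;
    }

    /** Hypothetical parser using the same parameter order as the documented method. */
    static class RegexHtmlParser {
        private static final Pattern TITLE =
            Pattern.compile("<title>(.*?)</title>", Pattern.CASE_INSENSITIVE | Pattern.DOTALL);
        private static final Pattern META_DATE =
            Pattern.compile("<meta\\s+name=\"date\"\\s+content=\"([^\"]*)\"", Pattern.CASE_INSENSITIVE);

        SimpleDocData parse(SimpleDocData docData, String name, Date date,
                            Reader reader, DateFormat dateFormat) throws IOException {
            // Read the whole input into memory (acceptable for a sketch).
            StringBuilder sb = new StringBuilder();
            char[] buf = new char[1024];
            for (int n = reader.read(buf); n != -1; n = reader.read(buf)) {
                sb.append(buf, 0, n);
            }
            String html = sb.toString();

            // A provided name wins; otherwise fall back to the <title> text, if any.
            Matcher title = TITLE.matcher(html);
            docData.name = (name != null) ? name : (title.find() ? title.group(1).trim() : null);

            // A provided date wins; otherwise try to extract one using dateFormat.
            docData.date = date;
            Matcher meta = META_DATE.matcher(html);
            if (date == null && meta.find()) {
                try {
                    docData.date = dateFormat.parse(meta.group(1).trim());
                } catch (ParseException e) {
                    // Unparseable date: leave it null rather than guess.
                }
            }

            // Crude tag stripping, for illustration only.
            docData.body = html.replaceAll("(?s)<[^>]*>", " ").replaceAll("\\s+", " ").trim();
            return docData;
        }
    }

    /** Convenience wrapper for trying the sketch; wraps the checked exception. */
    static SimpleDocData demo(String html, String name, Date date) {
        try {
            return new RegexHtmlParser().parse(new SimpleDocData(), name, date,
                new StringReader(html), new SimpleDateFormat("yyyy-MM-dd", Locale.ROOT));
        } catch (IOException e) {
            throw new RuntimeException(e);
        }
    }
}
```

Calling demo(html, null, null) exercises both fallbacks, while demo(html, "given", new Date(0L)) shows the provided values being kept unchanged.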

parse

DocData parse(DocData docData,
              String name,
              Date date,
              StringBuffer inputText,
              DateFormat dateFormat)
              throws IOException,
                     InterruptedException
Parse the inputText and return DocData.

Parameters:
inputText - the HTML text to parse.
Throws:
IOException
InterruptedException
See Also:
parse(DocData, String, Date, Reader, DateFormat)
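
One plausible way for an implementation to relate the two overloads is to wrap the StringBuffer in a StringReader and delegate to the Reader-based method. The sketch below shows only that delegation pattern. It is an assumption about how an implementing class might be written, not a statement about any particular implementation, and readAll is a hypothetical stand-in for the real parsing work.

```java
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.io.UncheckedIOException;

class StringBufferInputSketch {

    /** Hypothetical Reader-based routine; a real parser would do its work here. */
    static String readAll(Reader reader) throws IOException {
        StringBuilder sb = new StringBuilder();
        char[] buf = new char[1024];
        for (int n = reader.read(buf); n != -1; n = reader.read(buf)) {
            sb.append(buf, 0, n);
        }
        return sb.toString();
    }

    /** StringBuffer variant: wrap the text in a StringReader and delegate. */
    static String readAll(StringBuffer inputText) {
        try {
            return readAll(new StringReader(inputText.toString()));
        } catch (IOException e) {
            // StringReader does not fail on an open reader, but the signature requires handling.
            throw new UncheckedIOException(e);
        }
    }
}
```

Delegating this way keeps the parsing logic in one place, so both entry points behave identically for the same text.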


Copyright © 2000-2010 Apache Software Foundation. All Rights Reserved.