org.apache.lucene.benchmark.byTask.feeds
Class DemoHTMLParser

java.lang.Object
  extended by org.apache.lucene.benchmark.byTask.feeds.DemoHTMLParser
All Implemented Interfaces:
HTMLParser

public class DemoHTMLParser
extends Object
implements HTMLParser

HTML parser based on Lucene's demo HTML parser.


Constructor Summary
DemoHTMLParser()
           
 
Method Summary
 DocData parse(DocData docData, String name, Date date, Reader reader, DateFormat dateFormat)
          Parse the input Reader and return DocData.
 DocData parse(DocData docData, String name, Date date, StringBuffer inputText, DateFormat dateFormat)
          Parse the inputText and return DocData.
 
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
 

Constructor Detail

DemoHTMLParser

public DemoHTMLParser()
Method Detail

parse

public DocData parse(DocData docData,
                     String name,
                     Date date,
                     Reader reader,
                     DateFormat dateFormat)
              throws IOException,
                     InterruptedException
Description copied from interface: HTMLParser
Parse the input Reader and return DocData. If a name or date is provided, it is used for the result; otherwise an attempt is made to set it from the parsed data.

Specified by:
parse in interface HTMLParser
Parameters:
name - name of the resulting doc data. If null, an attempt is made to set it from the parsed data.
date - date of the resulting doc data. If null, an attempt is made to set it from the parsed data.
reader - reader of the HTML text to parse.
dateFormat - date formatter to use for extracting the date.
Returns:
Parsed doc data.
Throws:
IOException
InterruptedException
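
The fallback rule described above (use the supplied name or date, otherwise try the parsed data) can be sketched with plain JDK types. This is only an illustration of the documented contract, not DemoHTMLParser's actual code: the class FallbackSketch, its methods, and the parsedTitle and parsedDateText arguments are hypothetical names standing in for whatever the parser extracts from the HTML.

```java
import java.text.DateFormat;
import java.text.ParseException;
import java.util.Date;

// Hypothetical helper illustrating the documented name/date fallback.
// It does not use or reproduce any Lucene class.
class FallbackSketch {

    // A caller-supplied name wins; otherwise fall back to the parsed title.
    static String resolveName(String provided, String parsedTitle) {
        return provided != null ? provided : parsedTitle;
    }

    // A caller-supplied date wins; otherwise try to parse the date text
    // found in the document with the given formatter.
    static Date resolveDate(Date provided, String parsedDateText, DateFormat dateFormat) {
        if (provided != null) {
            return provided;
        }
        if (parsedDateText == null || dateFormat == null) {
            return null; // nothing to fall back on
        }
        try {
            return dateFormat.parse(parsedDateText);
        } catch (ParseException e) {
            return null; // the attempt failed; leave the date unset
        }
    }
}
```

Returning null when the date text cannot be parsed is an assumption of this sketch; the documentation only says that an attempt is made.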

parse

public DocData parse(DocData docData,
                     String name,
                     Date date,
                     StringBuffer inputText,
                     DateFormat dateFormat)
              throws IOException,
                     InterruptedException
Description copied from interface: HTMLParser
Parse the inputText and return DocData.

Specified by:
parse in interface HTMLParser
Parameters:
inputText - the HTML text to parse.
Throws:
IOException
InterruptedException
See Also:
HTMLParser.parse(DocData, String, Date, Reader, DateFormat)
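
The two overloads differ only in how the HTML text is supplied. As a minimal sketch, text held in a StringBuffer can be exposed as a Reader with the JDK's StringReader, which is one way a caller could use either form interchangeably. Whether DemoHTMLParser delegates between its overloads this way is not stated here and is an assumption; BufferReaderSketch and its methods are hypothetical names.

```java
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.io.UncheckedIOException;

// Hypothetical helper: bridges StringBuffer input to Reader input.
class BufferReaderSketch {

    // Wrap a snapshot of the buffer's current contents in a Reader.
    static Reader asReader(StringBuffer inputText) {
        return new StringReader(inputText.toString());
    }

    // Drain a Reader into a String, to show the round trip is lossless.
    static String readAll(Reader reader) {
        StringBuilder out = new StringBuilder();
        char[] buf = new char[1024];
        try {
            int n;
            while ((n = reader.read(buf)) != -1) {
                out.append(buf, 0, n);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return out.toString();
    }
}
```

Note that toString() copies the buffer, so later changes to the StringBuffer are not visible through the Reader.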


Copyright © 2000-2010 Apache Software Foundation. All Rights Reserved.