org.apache.lucene.benchmark.byTask.feeds
Class DemoHTMLParser

java.lang.Object
  extended by org.apache.lucene.benchmark.byTask.feeds.DemoHTMLParser
All Implemented Interfaces:
HTMLParser

public class DemoHTMLParser
extends Object
implements HTMLParser

Simple HTML parser, based on NekoHTML, that extracts the title, meta tags, and body text of a document.
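The following sketch is a rough illustration of the kind of extraction described above. It pulls the title, `name`/`content` meta tags, and body text out of an HTML string. To run without extra dependencies it swaps NekoHTML for the JDK's own Swing HTML parser (`javax.swing.text.html.parser.ParserDelegator`). The class `SimpleHtmlExtractor` and its `Extracted` holder are hypothetical names and do not reflect how `DemoHTMLParser` is implemented internally.

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.LinkedHashMap;
import java.util.Map;
import javax.swing.text.MutableAttributeSet;
import javax.swing.text.html.HTML;
import javax.swing.text.html.HTMLEditorKit;
import javax.swing.text.html.parser.ParserDelegator;

/** Hypothetical sketch: title, meta tags and body text via the JDK Swing parser (not NekoHTML). */
class SimpleHtmlExtractor {

    /** Holder for the three kinds of data extracted from the page. */
    static final class Extracted {
        final StringBuilder title = new StringBuilder();
        final StringBuilder body = new StringBuilder();
        final Map<String, String> metaTags = new LinkedHashMap<>();
    }

    static Extracted extract(String html) {
        final Extracted out = new Extracted();
        HTMLEditorKit.ParserCallback callback = new HTMLEditorKit.ParserCallback() {
            private boolean inTitle = false;
            private boolean inBody = false;

            @Override
            public void handleStartTag(HTML.Tag tag, MutableAttributeSet attrs, int pos) {
                if (tag == HTML.Tag.TITLE) inTitle = true;
                else if (tag == HTML.Tag.BODY) inBody = true;
            }

            @Override
            public void handleEndTag(HTML.Tag tag, int pos) {
                if (tag == HTML.Tag.TITLE) inTitle = false;
                else if (tag == HTML.Tag.BODY) inBody = false;
            }

            @Override
            public void handleSimpleTag(HTML.Tag tag, MutableAttributeSet attrs, int pos) {
                // <meta> is an empty element, so the Swing parser reports it here
                // rather than through handleStartTag.
                if (tag == HTML.Tag.META) {
                    Object name = attrs.getAttribute(HTML.Attribute.NAME);
                    Object content = attrs.getAttribute(HTML.Attribute.CONTENT);
                    if (name != null && content != null) {
                        out.metaTags.put(name.toString().toLowerCase(), content.toString());
                    }
                }
            }

            @Override
            public void handleText(char[] data, int pos) {
                if (inTitle) {
                    out.title.append(data);
                } else if (inBody) {
                    // Separate text chunks from different elements with a single space.
                    if (out.body.length() > 0) out.body.append(' ');
                    out.body.append(data);
                }
            }
        };
        try {
            // true = ignore any charset declared in <meta http-equiv> tags
            new ParserDelegator().parse(new StringReader(html), callback, true);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return out;
    }
}
```

The Swing parser only understands HTML 3.2. That limitation is one reason a more tolerant parser such as NekoHTML is a sensible choice for real-world documents.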


Nested Class Summary
static class DemoHTMLParser.Parser
          The actual parser used to read HTML documents.
Constructor Summary
DemoHTMLParser()
Method Summary
 DocData parse(DocData docData, String name, Date date, InputSource source, TrecContentSource trecSrc)
 DocData parse(DocData docData, String name, Date date, Reader reader, TrecContentSource trecSrc)
          Parse the input Reader and return DocData.
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait

Constructor Detail

DemoHTMLParser

public DemoHTMLParser()
Method Detail

parse

public DocData parse(DocData docData,
                     String name,
                     Date date,
                     Reader reader,
                     TrecContentSource trecSrc)
              throws IOException
Description copied from interface: HTMLParser
Parse the input Reader and return DocData. The provided name and date are used for the result unless they are null, in which case an attempt is made to set them from the parsed data.

Specified by:
parse in interface HTMLParser
Parameters:
docData - the DocData instance to reuse for the result.
name - name of the result doc data.
date - date of the result doc data. If null, an attempt is made to set it from the parsed data.
reader - reader of html text to parse.
trecSrc - the TrecContentSource used to parse dates.
Returns:
Parsed doc data.
Throws:
IOException - If there is a low-level I/O error.
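The null-fallback contract described for this method can be sketched in a few lines. This is a hypothetical model, not Lucene code. `Result` stands in for the reused `DocData`, and `parseDate` with its assumed pattern list stands in for the date parsing delegated to `TrecContentSource`, whose actual formats may differ.

```java
import java.text.ParseException;
import java.text.SimpleDateFormat;
import java.util.Date;
import java.util.Locale;
import java.util.TimeZone;

/** Hypothetical model of the rule: caller-supplied values win, parsed values fill the nulls. */
class FallbackSketch {

    /** Minimal mutable result that is cleared and reused across calls (assumed shape, not Lucene's DocData). */
    static final class Result {
        String name;
        Date date;

        void clear() {
            name = null;
            date = null;
        }
    }

    // Assumed date patterns for illustration only.
    private static final String[] PATTERNS = {
        "EEE, dd MMM yyyy HH:mm:ss zzz",
        "yyyy-MM-dd",
    };

    /** Tries each pattern in turn (strict, UTC); returns null if none matches. */
    static Date parseDate(String text) {
        if (text == null) return null;
        for (String pattern : PATTERNS) {
            SimpleDateFormat fmt = new SimpleDateFormat(pattern, Locale.US);
            fmt.setTimeZone(TimeZone.getTimeZone("UTC"));
            fmt.setLenient(false);
            try {
                return fmt.parse(text.trim());
            } catch (ParseException ignored) {
                // Fall through to the next pattern.
            }
        }
        return null;
    }

    /** Fills the reused result: non-null arguments are kept, null ones fall back to parsed data. */
    static Result fill(Result reuse, String name, Date date, String parsedName, String parsedDateText) {
        reuse.clear();
        reuse.name = (name != null) ? name : parsedName;
        reuse.date = (date != null) ? date : parseDate(parsedDateText);
        return reuse;
    }
}
```

Reusing one result object across many documents avoids an allocation per document, which matters when a benchmark feeds a large collection through the parser.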

parse

public DocData parse(DocData docData,
                     String name,
                     Date date,
                     InputSource source,
                     TrecContentSource trecSrc)
              throws IOException,
                     SAXException
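Throws:
IOException - If there is a low-level I/O error.
SAXException - If the underlying SAX parser reports an error while parsing the input.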
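The two overloads differ only in how the input is supplied. The sketch below assumes that the `Reader` variant simply wraps its argument in an `org.xml.sax.InputSource` and delegates. It uses the JDK's built-in SAX parser instead of NekoHTML. Unlike `DemoHTMLParser`, it therefore only accepts well-formed XHTML and reports anything else as a `SAXException`. `InputSourceSketch` is a hypothetical name.

```java
import java.io.IOException;
import java.io.Reader;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;

/** Hypothetical sketch of a Reader overload delegating to an InputSource overload. */
class InputSourceSketch {

    /** Reader variant: wrap the Reader in an InputSource and delegate. */
    static String title(Reader reader) throws IOException, SAXException {
        return title(new InputSource(reader));
    }

    /** InputSource variant: run a SAX parse and collect the text inside <title>. */
    static String title(InputSource source) throws IOException, SAXException {
        final StringBuilder title = new StringBuilder();
        DefaultHandler handler = new DefaultHandler() {
            private boolean inTitle = false;

            @Override
            public void startElement(String uri, String localName, String qName, Attributes atts) {
                if ("title".equalsIgnoreCase(qName)) inTitle = true;
            }

            @Override
            public void endElement(String uri, String localName, String qName) {
                if ("title".equalsIgnoreCase(qName)) inTitle = false;
            }

            @Override
            public void characters(char[] ch, int start, int length) {
                if (inTitle) title.append(ch, start, length);
            }
        };
        try {
            // DefaultHandler.fatalError rethrows, so malformed input surfaces as a SAXException.
            SAXParserFactory.newInstance().newSAXParser().parse(source, handler);
        } catch (ParserConfigurationException e) {
            throw new IllegalStateException(e);
        }
        return title.toString();
    }
}
```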


Copyright © 2000-2013 Apache Software Foundation. All Rights Reserved.