java.lang.Object
- org.apache.lucene.benchmark.byTask.feeds.DemoHTMLParser

All Implemented Interfaces:

HTMLParser
```
public class DemoHTMLParser
extends Object
implements HTMLParser
```
Simple HTML parser, based on NekoHTML, that extracts the title, meta tags, and body text.

Nested Class Summary

Nested Classes
Modifier and Type	Class	Description
`static class`	`DemoHTMLParser.Parser`	The actual parser to read HTML documents.

Constructor Summary

Constructors
Constructor Description

DemoHTMLParser()

Method Summary

All Methods Instance Methods Concrete Methods
Modifier and Type	Method	Description
`DocData`	`parse(DocData docData, String name, Date date, Reader reader, TrecContentSource trecSrc)`	Parse the input Reader and return DocData.
`DocData`	`parse(DocData docData, String name, Date date, InputSource source, TrecContentSource trecSrc)`

Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait

Constructor Detail
- DemoHTMLParser
```
public DemoHTMLParser()
```

Method Detail

parse
```
public DocData parse(DocData docData,
                     String name,
                     Date date,
                     Reader reader,
                     TrecContentSource trecSrc)
              throws IOException
```
Description copied from interface: HTMLParser

Parse the input Reader and return DocData. The provided name, title, and date are used for the result unless they are null, in which case an attempt is made to set them from the parsed data.

Specified by:

parse in interface HTMLParser

Parameters:

docData - the DocData instance reused to hold the result.

name - name of the result doc data.

date - date of the result doc data. If null, attempt to set by parsed data.

reader - reader of the HTML text to parse.

trecSrc - the TrecContentSource used to parse dates.

Returns:

Parsed doc data.

Throws:

IOException - If there is a low-level I/O error.
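
The null-fallback rule in the description above can be illustrated without Lucene on the classpath. The snippet below is only a sketch of that contract, not the parser's actual code: `FallbackSketch` and `preferProvided` are hypothetical names, and the two `Date` values merely stand in for a caller-supplied date and one recovered from the parsed document.

```java
import java.util.Date;

final class FallbackSketch {
    // Hypothetical helper: keep the caller-supplied value when it is non-null,
    // otherwise fall back to whatever was recovered while parsing (may be null).
    static <T> T preferProvided(T provided, T parsed) {
        return provided != null ? provided : parsed;
    }

    public static void main(String[] args) {
        Date supplied = new Date(0L);
        Date fromDocument = new Date(1_000L); // stands in for a date found in the HTML

        // A supplied date wins over the parsed one.
        System.out.println(preferProvided(supplied, fromDocument).getTime()); // prints 0
        // A null date falls back to the parsed one.
        System.out.println(preferProvided(null, fromDocument).getTime());     // prints 1000
    }
}
```

Under this reading, passing null for date asks the parser to try to fill it in, while a non-null date is kept as given.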

parse

```
public DocData parse(DocData docData,
                     String name,
                     Date date,
                     InputSource source,
                     TrecContentSource trecSrc)
              throws IOException,
                     SAXException
```

Throws:

IOException

SAXException
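
This overload takes an `InputSource` instead of a `Reader`. The following is a minimal sketch of building one from in-memory HTML using only JDK classes (`org.xml.sax.InputSource`, `java.io.StringReader`). `InputSourceSketch` and `fromHtml` are hypothetical names, and how DemoHTMLParser consumes the source internally is not shown or assumed here.

```java
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import org.xml.sax.InputSource;

final class InputSourceSketch {
    // Hypothetical helper: wraps HTML text in a character-stream InputSource.
    static InputSource fromHtml(String html) {
        return new InputSource(new StringReader(html));
    }

    public static void main(String[] args) throws IOException {
        InputSource source = fromHtml("<html><head><title>Demo</title></head></html>");
        // The wrapped Reader is what a SAX-style parser would read from.
        try (Reader in = source.getCharacterStream()) {
            char[] buf = new char[6];
            int n = in.read(buf);
            System.out.println(new String(buf, 0, n)); // prints <html>
        }
    }
}
```

With the Lucene benchmark module available, a `source` built this way could be passed as the fourth argument of this overload.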
