DemoHTMLParser (Lucene 4.5.0 API)

org.apache.lucene.benchmark.byTask.feeds
Class DemoHTMLParser

java.lang.Object
  org.apache.lucene.benchmark.byTask.feeds.DemoHTMLParser

All Implemented Interfaces: HTMLParser

public class DemoHTMLParser
extends Object
implements HTMLParser

Simple HTML parser, based on NekoHTML, that extracts the title, meta tags, and body text.

Nested Class Summary
`static class`	`DemoHTMLParser.Parser` The actual parser to read HTML documents

Constructor Summary
`DemoHTMLParser()`

Method Summary
`DocData`	`parse(DocData docData, String name, Date date, InputSource source, TrecContentSource trecSrc)`
`DocData`	`parse(DocData docData, String name, Date date, Reader reader, TrecContentSource trecSrc)` Parse the input Reader and return DocData.

Methods inherited from class java.lang.Object
`clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait`

Constructor Detail

DemoHTMLParser

public DemoHTMLParser()

Method Detail

parse

public DocData parse(DocData docData,
                     String name,
                     Date date,
                     Reader reader,
                     TrecContentSource trecSrc)
              throws IOException

Description copied from interface: HTMLParser

Parse the input Reader and return DocData. The provided name, title, and date are used for the result unless they are null, in which case an attempt is made to set them from the parsed data.

Specified by: parse in interface HTMLParser

Parameters:
  docData - result reused.
  name - name of the result doc data.
  date - date of the result doc data. If null, attempt to set by parsed data.
  reader - reader of HTML text to parse.
  trecSrc - the TrecContentSource used to parse dates.
Returns: Parsed doc data.
Throws: IOException - If there is a low-level I/O error.
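
The null-fallback rule described for parse can be illustrated in isolation. The following is a minimal sketch, not Lucene code: the class FallbackSketch and the helper providedOrParsed are hypothetical names introduced only for illustration, and the sketch assumes the rule is simply "keep the caller's value when it is non-null, otherwise use what was recovered from the document".

```java
import java.util.Date;

// Hypothetical helper, NOT part of the Lucene API: it only illustrates the
// "use the provided value unless it is null" rule described for parse().
final class FallbackSketch {

    /** Returns the caller-provided value when non-null, otherwise the parsed value. */
    static <T> T providedOrParsed(T provided, T parsed) {
        return provided != null ? provided : parsed;
    }

    public static void main(String[] args) {
        Date callerDate = new Date(0L);      // value passed in by the caller
        Date parsedDate = new Date(1_000L);  // value recovered from the document (assumed)

        // A non-null argument wins over the parsed value.
        System.out.println(providedOrParsed(callerDate, parsedDate).getTime());

        // A null argument falls back to the parsed value.
        System.out.println(providedOrParsed(null, parsedDate).getTime());
    }
}
```

In this sketch, passing null for date plays the role of "attempt to set by parsed data" in the parameter description above.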

parse

public DocData parse(DocData docData,
                     String name,
                     Date date,
                     InputSource source,
                     TrecContentSource trecSrc)
              throws IOException,
                     SAXException

Throws: IOException, SAXException
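
The two overloads differ in input type (Reader versus InputSource) and in declared exceptions (the Reader variant declares only IOException). This page does not state how the two relate. One plausible arrangement, shown here purely as an assumption, is for a Reader-based method to wrap its argument in an org.xml.sax.InputSource and report any SAXException as an IOException. The class OverloadSketch and both of its methods are hypothetical stand-ins that use only JDK classes.

```java
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;

// Hypothetical sketch, NOT the Lucene implementation: shows one way a
// Reader-based overload could delegate to an InputSource-based one.
final class OverloadSketch {

    /** Stand-in for a SAX-based parse step; here it just drains the character stream. */
    static String parse(InputSource source) throws IOException, SAXException {
        Reader reader = source.getCharacterStream();
        if (reader == null) {
            throw new SAXException("no character stream set on the InputSource");
        }
        StringBuilder text = new StringBuilder();
        char[] buffer = new char[1024];
        int read;
        while ((read = reader.read(buffer)) != -1) {
            text.append(buffer, 0, read);
        }
        return text.toString();
    }

    /** Reader-based convenience: wraps the Reader and surfaces SAX failures as IOException. */
    static String parse(Reader reader) throws IOException {
        try {
            return parse(new InputSource(reader));
        } catch (SAXException e) {
            throw new IOException("SAX failure while parsing", e);
        }
    }

    public static void main(String[] args) throws IOException {
        System.out.println(parse(new StringReader("<html><title>t</title></html>")));
    }
}
```

Under this arrangement, callers holding a Reader only have to handle IOException, while callers that construct their own InputSource also see SAXException directly.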
