java.lang.Object
  org.apache.lucene.ant.HtmlDocument
public class HtmlDocument
The HtmlDocument class creates a Lucene Document from an HTML document.
It does this by using the JTidy package. It can take input
from a File or an InputStream.
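
As a rough illustration of the two kinds of input mentioned above, the following sketch prepares a File and an InputStream that hold the same HTML. It uses only the JDK so that it runs on its own. The class name HtmlInputSketch, the helper methods, and the sample markup are hypothetical and not part of this API. The calls into HtmlDocument appear only as comments, because they need Lucene and JTidy on the classpath.

```java
import java.io.ByteArrayInputStream;
import java.io.File;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;

public class HtmlInputSketch {

    // Hypothetical sample markup, used only for this illustration.
    static final String SAMPLE_HTML =
        "<html><head><title>Hello</title></head>"
        + "<body><p>Some body text.</p></body></html>";

    /** Wraps an in-memory HTML string in an InputStream. */
    static InputStream asStream(String html) {
        return new ByteArrayInputStream(html.getBytes(StandardCharsets.UTF_8));
    }

    /** Writes the HTML to a temporary file, for the File-based entry points. */
    static File asTempFile(String html) throws IOException {
        File file = File.createTempFile("sample", ".html");
        file.deleteOnExit();
        Files.write(file.toPath(), html.getBytes(StandardCharsets.UTF_8));
        return file;
    }

    public static void main(String[] args) throws IOException {
        File file = asTempFile(SAMPLE_HTML);
        try (InputStream is = asStream(SAMPLE_HTML)) {
            System.out.println("file bytes:   " + file.length());
            System.out.println("stream bytes: " + is.available());

            // With Lucene and JTidy available, either input could be handed over:
            //   HtmlDocument fromFile = new HtmlDocument(file);
            //   String title = fromFile.getTitle();
            //   String body = fromFile.getBody();
            // or, for a Lucene Document built straight from the stream:
            //   Document doc = HtmlDocument.getDocument(is);
        }
    }
}
```

A stream can be read only once, so each stream-based call would need its own fresh InputStream.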

| Constructor Summary | |
|---|---|
| HtmlDocument(File file) | Constructs an HtmlDocument from a File. |
| HtmlDocument(File file, String tidyConfigFile) | Constructs an HtmlDocument from a File, using the given Tidy config file. |
| HtmlDocument(InputStream is) | Constructs an HtmlDocument from an InputStream. |

| Method Summary | |
|---|---|
| static Document | Document(File file) - Creates a Lucene Document from a File. |
| static Document | Document(File file, String tidyConfigFile) - Creates a Lucene Document from a File. |
| String | getBody() - Gets the bodyText attribute of the HtmlDocument object. |
| static Document | getDocument(InputStream is) - Creates a Lucene Document from an InputStream. |
| String | getTitle() - Gets the title attribute of the HtmlDocument object. |
| static void | main(String[] args) - Runs HtmlDocument on the files specified on the command line. |

| Methods inherited from class java.lang.Object |
|---|
| clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait |

| Constructor Detail |
|---|

public HtmlDocument(File file)
             throws IOException

Constructs an HtmlDocument from a File.

Parameters:
file - the File containing the HTML to parse
Throws:
IOException - if an I/O exception occurs

public HtmlDocument(InputStream is)

Constructs an HtmlDocument from an InputStream.

Parameters:
is - the InputStream containing the HTML

public HtmlDocument(File file,
                    String tidyConfigFile)
             throws IOException

Constructs an HtmlDocument from a File.

Parameters:
file - the File containing the HTML to parse
tidyConfigFile - the String containing the full path to the Tidy config file
Throws:
IOException - if an I/O exception occurs

| Method Detail |
|---|

public static Document Document(File file,
                                String tidyConfigFile)
                         throws IOException

Creates a Lucene Document from a File.

Parameters:
file -
tidyConfigFile - the full path to the Tidy config file
Throws:
IOException

public static Document getDocument(InputStream is)

Creates a Lucene Document from an InputStream.

Parameters:
is -

public static Document Document(File file)
                         throws IOException

Creates a Lucene Document from a File.

Parameters:
file -
Throws:
IOException

public static void main(String[] args)
                 throws Exception

Runs HtmlDocument on the files specified on the command line.

Parameters:
args - Command line arguments
Throws:
Exception - Description of Exception

public String getTitle()

Gets the title attribute of the HtmlDocument object.

public String getBody()

Gets the bodyText attribute of the HtmlDocument object.
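
The tidyConfigFile parameter described above names a Tidy configuration file by its full path. As a hedged illustration only, and assuming the file follows the usual Tidy "option: value" layout, such a file might look like the fragment below. The options shown are common Tidy settings chosen for the example. This documentation does not say which options the class expects or relies on.

```
quiet: yes
show-warnings: no
tidy-mark: no
char-encoding: utf8
```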