Class ExtractReuters
java.lang.Object
    org.apache.lucene.benchmark.utils.ExtractReuters
-
public class ExtractReuters extends Object
Splits the Reuters SGML documents into simple text files, each containing the title, date, dateline, and body of one document.
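To illustrate the kind of splitting described above, here is a minimal, self-contained sketch that pulls the four fields out of one Reuters-style SGML record. It is not the code of ExtractReuters: the class name ReutersFieldSketch, the method extractFields, the regular expressions, and the sample record are all assumptions made for illustration, and it uses only the JDK.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical illustration only; not part of Lucene.
public class ReutersFieldSketch {

    // Assumed tag names, taken from the field list in the class description.
    private static final String[] TAGS = {"TITLE", "DATE", "DATELINE", "BODY"};

    // Returns the trimmed text of each tag, or "" when the tag is absent.
    static Map<String, String> extractFields(String sgml) {
        Map<String, String> fields = new LinkedHashMap<>();
        for (String tag : TAGS) {
            // DOTALL lets the body span several lines; the match is non-greedy.
            Pattern p = Pattern.compile("<" + tag + ">(.*?)</" + tag + ">", Pattern.DOTALL);
            Matcher m = p.matcher(sgml);
            fields.put(tag, m.find() ? m.group(1).trim() : "");
        }
        return fields;
    }

    public static void main(String[] args) {
        // Made-up sample record, shaped like a Reuters SGML entry.
        String doc = "<REUTERS><DATE>26-FEB-1987 15:01:01.79</DATE>"
                + "<TEXT><TITLE>SAMPLE TITLE</TITLE>"
                + "<DATELINE>    NEW YORK, Feb 26 - </DATELINE>"
                + "<BODY>First line.\nSecond line.</BODY></TEXT></REUTERS>";
        extractFields(doc).forEach((name, value) -> System.out.println(name + ": " + value));
    }
}
```

With the real class, the equivalent work is done by constructing ExtractReuters with the input and output directories and calling extract().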
-
Constructor Summary
Constructor, Description
ExtractReuters(Path reutersDir, Path outputDir)
-
Method Summary
Modifier and Type, Method, Description
void extract()
protected void extractFile(Path sgmFile) - Override if you wish to change what is extracted
static void main(String[] args)
-
Constructor Detail
-
ExtractReuters
public ExtractReuters(Path reutersDir, Path outputDir) throws IOException
- Throws:
IOException
-
Method Detail
-
extract
public void extract() throws IOException
- Throws:
IOException
-
extractFile
protected void extractFile(Path sgmFile)
Override this method to change what is extracted from each SGML file.