Class ExtractReuters
- java.lang.Object
- org.apache.lucene.benchmark.utils.ExtractReuters
public class ExtractReuters extends Object
Splits the Reuters SGML documents into simple text files containing the title, date, dateline, and body.
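To make the description above concrete, here is a minimal, standard-library-only sketch of pulling the four named fields out of one document's SGML. It is not this class's implementation: `ReutersFieldSketch`, `extractFields`, the tag names `TITLE`, `DATE`, `DATELINE`, `BODY`, and the sample input are all assumptions chosen for illustration, and the regex approach is just one plausible way to do it.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical stand-in for illustration only; not part of Lucene. */
final class ReutersFieldSketch {
  // Assumed tag names, mirroring the field list in the class description.
  private static final String[] TAGS = {"TITLE", "DATE", "DATELINE", "BODY"};

  /**
   * Returns the trimmed text of the first occurrence of each assumed tag
   * in the given SGML, keyed by tag name, in TAGS order. Tags that do not
   * occur are left out of the map.
   */
  static Map<String, String> extractFields(String sgml) {
    Map<String, String> fields = new LinkedHashMap<>();
    for (String tag : TAGS) {
      // DOTALL lets "." span line breaks, since a body usually has several lines.
      Pattern p = Pattern.compile("<" + tag + ">(.*?)</" + tag + ">", Pattern.DOTALL);
      Matcher m = p.matcher(sgml);
      if (m.find()) {
        fields.put(tag, m.group(1).trim());
      }
    }
    return fields;
  }

  public static void main(String[] args) {
    // Made-up sample input, not taken from the Reuters collection.
    String doc =
        "<REUTERS><DATE>1-JAN-1990 00:00:00.00</DATE><TEXT>"
            + "<TITLE>SAMPLE TITLE</TITLE>"
            + "<DATELINE>  SAMPLE CITY, Jan 1 - </DATELINE>"
            + "<BODY>First line.\nSecond line.</BODY>"
            + "</TEXT></REUTERS>";
    extractFields(doc).forEach((name, text) -> System.out.println(name + ": " + text));
  }
}
```

A real extractor would also have to split each .sgm file into individual documents, decode SGML character entities, and write one output file per document. The sketch leaves all of that out.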
Constructor Summary
Constructors
- ExtractReuters(Path reutersDir, Path outputDir)
Method Summary
Methods
- void extract()
- protected void extractFile(Path sgmFile): Override if you wish to change what is extracted
- static void main(String[] args)
Constructor Detail
ExtractReuters
public ExtractReuters(Path reutersDir, Path outputDir) throws IOException
- Throws: IOException
Method Detail
extract
public void extract() throws IOException
- Throws: IOException
extractFile
protected void extractFile(Path sgmFile)
Override this method to change what is extracted.