public class TrecContentSource extends ContentSource
Implements a ContentSource over the TREC collection.
Supports the following configuration parameters (on top of ContentSource):
- The TrecDocParser class to use for parsing the TREC documents content (default=TrecGov2Parser).
- The HTMLParser class to use for parsing the HTML parts of the TREC documents content (default=DemoHTMLParser).
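The fallback behaviour of the two parameters above can be sketched with plain java.util.Properties. This is only an illustration, not the class's actual configuration code: the property keys `trec.doc.parser` and `html.parser` are assumptions (the key names are not shown in this section), and `MyHtmlParser` is a hypothetical class name.

```java
import java.util.Properties;

class TrecParserConfigSketch {
  // Assumed property keys; the section above does not show the real names.
  static final String DOC_PARSER_KEY = "trec.doc.parser";
  static final String HTML_PARSER_KEY = "html.parser";

  // Returns the configured TREC document parser, or the documented default.
  static String docParser(Properties props) {
    return props.getProperty(DOC_PARSER_KEY, "TrecGov2Parser");
  }

  // Returns the configured HTML parser, or the documented default.
  static String htmlParser(Properties props) {
    return props.getProperty(HTML_PARSER_KEY, "DemoHTMLParser");
  }

  public static void main(String[] args) {
    Properties props = new Properties();
    props.setProperty(HTML_PARSER_KEY, "MyHtmlParser"); // hypothetical override
    System.out.println(docParser(props));  // prints "TrecGov2Parser"
    System.out.println(htmlParser(props)); // prints "MyHtmlParser"
  }
}
```

Only the parameter that is explicitly set is overridden; the other keeps its default.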
Modifier and Type | Field and Description |
---|---|
static String | DOC |
static String | DOCNO |
static String | NEW_LINE: separator between lines in the buffer |
static String | TERMINATING_DOC |
static String | TERMINATING_DOCNO |
Fields inherited from class ContentItemsSource: encoding, forever, logStep, verbose
Constructor and Description |
---|
TrecContentSource() |
Modifier and Type | Method and Description |
---|---|
void | close(): Called when reading from this content source is no longer required. |
DocData | getNextDocData(DocData docData): Returns the next DocData from the content source. |
Date | parseDate(String dateStr) |
void | resetInputs(): Resets the input for this content source, so that the test would behave as if it was just started, input-wise. |
void | setConfig(Config config): Sets the Config for this content source. |
Methods inherited from class ContentItemsSource: addBytes, addItem, collectFiles, getBytesCount, getConfig, getItemsCount, getTotalBytesCount, getTotalItemsCount, printStatistics, shouldLog
public static final String DOCNO
public static final String TERMINATING_DOCNO
public static final String DOC
public static final String TERMINATING_DOC
public static final String NEW_LINE
public void close() throws IOException
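Description copied from class ContentItemsSource: Called when reading from this content source is no longer required.
Specified by: close in interface Closeable
Specified by: close in class ContentItemsSource
Throws: IOException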
public DocData getNextDocData(DocData docData) throws NoMoreDataException, IOException
Description copied from class ContentSource: Returns the next DocData from the content source. Implementations must account for multi-threading, as multiple threads can call this method simultaneously.
Specified by: getNextDocData in class ContentSource
Throws: NoMoreDataException, IOException
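The multi-threading requirement can be illustrated with a small self-contained sketch. It does not use the Lucene classes: `Source`, `NoMoreDocs` and `drain` are hypothetical stand-ins for a content source, NoMoreDataException and a set of consumer threads. A synchronized method is one simple way to make sure each document is handed out exactly once; it is not necessarily how TrecContentSource does it.

```java
import java.util.Arrays;
import java.util.List;
import java.util.concurrent.atomic.AtomicInteger;

class ThreadSafeSourceSketch {
  /** Stand-in for NoMoreDataException (hypothetical, not the Lucene class). */
  static class NoMoreDocs extends Exception {}

  /** Stand-in content source that hands out each document exactly once. */
  static class Source {
    private final List<String> docs;
    private int next = 0; // shared cursor, guarded by "this"

    Source(List<String> docs) {
      this.docs = docs;
    }

    // synchronized: concurrent callers never read or advance the cursor together
    synchronized String getNext() throws NoMoreDocs {
      if (next >= docs.size()) {
        throw new NoMoreDocs();
      }
      return docs.get(next++);
    }
  }

  /** Drains the source from several threads; returns the number of documents delivered. */
  static int drain(Source source, int threads) throws InterruptedException {
    AtomicInteger delivered = new AtomicInteger();
    Thread[] workers = new Thread[threads];
    for (int i = 0; i < threads; i++) {
      workers[i] = new Thread(() -> {
        try {
          while (true) {
            source.getNext();
            delivered.incrementAndGet();
          }
        } catch (NoMoreDocs e) {
          // source exhausted: this worker stops
        }
      });
      workers[i].start();
    }
    for (Thread t : workers) {
      t.join();
    }
    return delivered.get();
  }

  public static void main(String[] args) throws InterruptedException {
    Source source = new Source(Arrays.asList("doc-1", "doc-2", "doc-3", "doc-4", "doc-5"));
    System.out.println(drain(source, 3)); // prints 5
  }
}
```

Without the synchronization, two threads could read the same cursor value and receive the same document, so the delivered count would no longer match the collection size.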
public void resetInputs() throws IOException
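Description copied from class ContentItemsSource: Resets the input for this content source, so that the test would behave as if it was just started, input-wise.
NOTE: the default implementation resets the number of bytes and items generated since the last reset, so it is important to call super.resetInputs if you override this method.
Overrides: resetInputs in class ContentItemsSource
Throws: IOException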
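The reason for calling super.resetInputs in an override can be shown with a minimal sketch. The classes below are hypothetical stand-ins, not the Lucene types: `BaseSource` plays the role of a base class that counts items since the last reset, and `FileBackedSource` is an illustrative subclass with its own read position.

```java
class ResetInputsSketch {
  /** Stand-in base class: tracks items generated since the last reset. */
  static class BaseSource {
    private int itemsSinceReset = 0;

    void addItem() {
      itemsSinceReset++;
    }

    int getItemsCount() {
      return itemsSinceReset;
    }

    // Default implementation: clears the per-round counter.
    void resetInputs() {
      itemsSinceReset = 0;
    }
  }

  /** Hypothetical subclass that also keeps its own read position. */
  static class FileBackedSource extends BaseSource {
    private int position = 0;

    String next() {
      addItem();
      return "doc-" + (position++);
    }

    @Override
    void resetInputs() {
      super.resetInputs(); // keeps the base-class counter in sync
      position = 0;        // then rewinds the subclass state
    }
  }

  public static void main(String[] args) {
    FileBackedSource source = new FileBackedSource();
    source.next();
    source.next();
    System.out.println(source.getItemsCount()); // prints 2
    source.resetInputs();
    System.out.println(source.getItemsCount()); // prints 0
    System.out.println(source.next());          // prints "doc-0"
  }
}
```

If the override skipped the super call, the position would rewind but the item counter would keep growing across rounds.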
public void setConfig(Config config)
Description copied from class ContentItemsSource: Sets the Config for this content source. If you override this method, you must call super.setConfig.
Overrides: setConfig in class ContentItemsSource