public class TrecContentSource extends ContentSource
Implements a ContentSource over the TREC collection.
Supports the following configuration parameters (on top of
ContentSource):
- The TrecDocParser class to use for parsing the content of TREC documents (default=TrecGov2Parser).
- The HTMLParser class to use for parsing the HTML parts of TREC document content (default=DemoHTMLParser).
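
As a rough illustration of how class-name parameters with defaults like those above might be resolved, the sketch below uses plain java.util.Properties rather than the benchmark's own Config class. The property keys `trec.doc.parser` and `html.parser` are assumptions, since this listing does not show the parameter names, and `com.example.MyHtmlParser` is a hypothetical override. Only the default class names come from the text above.

```java
import java.util.Properties;

public class ParserConfigSketch {
    // Assumed property keys; the listing above does not name them.
    public static final String DOC_PARSER_KEY = "trec.doc.parser";
    public static final String HTML_PARSER_KEY = "html.parser";

    /** Returns the configured class name for key, or the default when unset or blank. */
    public static String resolve(Properties props, String key, String defaultClass) {
        String value = props.getProperty(key);
        return (value == null || value.trim().isEmpty()) ? defaultClass : value.trim();
    }

    public static void main(String[] args) {
        Properties props = new Properties();
        // Hypothetical override of the HTML parser; the doc parser keeps its default.
        props.setProperty(HTML_PARSER_KEY, "com.example.MyHtmlParser");
        System.out.println(resolve(props, DOC_PARSER_KEY, "TrecGov2Parser"));
        System.out.println(resolve(props, HTML_PARSER_KEY, "DemoHTMLParser"));
    }
}
```

Falling back to the default when a key is unset or blank keeps a partially filled configuration usable, which matches the "default=" wording above.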
| Modifier and Type | Field and Description |
|---|---|
| static String | DOC |
| static String | DOCNO |
| static String | NEW_LINE: separator between lines in the buffer |
| static String | TERMINATING_DOC |
| static String | TERMINATING_DOCNO |
Fields inherited from class ContentItemsSource: encoding, forever, logStep, verbose

| Constructor and Description |
|---|
| TrecContentSource() |

| Modifier and Type | Method and Description |
|---|---|
| void | close(): Called when reading from this content source is no longer required. |
| DocData | getNextDocData(DocData docData): Returns the next DocData from the content source. |
| Date | parseDate(String dateStr) |
| void | resetInputs(): Resets the input for this content source, so that the test would behave as if it was just started, input-wise. |
| void | setConfig(Config config): Sets the Config for this content source. |
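
The method summary suggests a typical consumption pattern: pull documents with getNextDocData until the source is exhausted, optionally call resetInputs to start over, and call close when reading is finished. The sketch below imitates that loop with small stand-in types. `SketchSource`, `SketchDoc` and `SketchNoMoreData` are hypothetical and are not the Lucene classes; the read path is synchronized on the assumption that several threads may request documents at once.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;

public class ContentSourceLoopSketch {
    /** Hypothetical stand-in for DocData: just a name and a body. */
    public static class SketchDoc {
        public String name;
        public String body;
    }

    /** Hypothetical stand-in for NoMoreDataException. */
    public static class SketchNoMoreData extends Exception {
        private static final long serialVersionUID = 1L;
    }

    /** Hypothetical in-memory source mirroring the method names listed above. */
    public static class SketchSource implements AutoCloseable {
        private final List<String> bodies;
        private Iterator<String> it;
        private int served;
        private boolean closed;

        public SketchSource(List<String> bodies) {
            this.bodies = new ArrayList<>(bodies);
            this.it = this.bodies.iterator();
        }

        // synchronized: several threads may ask for the next document at once
        public synchronized SketchDoc getNextDocData(SketchDoc reuse) throws SketchNoMoreData {
            if (closed || !it.hasNext()) {
                throw new SketchNoMoreData();
            }
            reuse.name = "doc" + served++;
            reuse.body = it.next();
            return reuse;
        }

        // Start over from the first document, as if the run had just begun.
        public synchronized void resetInputs() {
            it = bodies.iterator();
            served = 0;
        }

        @Override
        public synchronized void close() {
            closed = true;
        }
    }

    /** Drains the source and returns how many documents were read. */
    public static int drain(SketchSource source) {
        int count = 0;
        SketchDoc doc = new SketchDoc(); // reused across calls, like the docData argument
        try {
            while (true) {
                source.getNextDocData(doc);
                count++;
            }
        } catch (SketchNoMoreData end) {
            return count;
        }
    }

    public static void main(String[] args) {
        try (SketchSource source = new SketchSource(Arrays.asList("a", "b", "c"))) {
            System.out.println(drain(source));
            source.resetInputs();
            System.out.println(drain(source));
        }
    }
}
```

Signalling exhaustion with an exception rather than a null return mirrors the NoMoreDataException in the real signature; whether a real source stops or wraps around is a configuration matter that this sketch does not model.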

Methods inherited from class ContentItemsSource: addBytes, addItem, collectFiles, getBytesCount, getConfig, getItemsCount, getTotalBytesCount, getTotalItemsCount, printStatistics, shouldLog

public static final String DOCNO
public static final String TERMINATING_DOCNO
public static final String DOC
public static final String TERMINATING_DOC
public static final String NEW_LINE
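
The constants above name the markers used when scanning TREC input. Their values are not shown in this listing. The sketch below assumes the conventional TREC tags (`<DOC>`, `<DOCNO>` and their closing forms) and a `"\n"` line separator, and shows one way document numbers could be pulled out of a buffer. `TrecTagSketch` and `docNumbers` are hypothetical names.

```java
import java.util.ArrayList;
import java.util.List;

public class TrecTagSketch {
    // Assumed values: conventional TREC markup, not confirmed by the listing above.
    public static final String DOC = "<DOC>";
    public static final String TERMINATING_DOC = "</DOC>";
    public static final String DOCNO = "<DOCNO>";
    public static final String TERMINATING_DOCNO = "</DOCNO>";
    public static final String NEW_LINE = "\n";

    /** Returns the DOCNO of every complete DOC block found in the input. */
    public static List<String> docNumbers(String input) {
        List<String> result = new ArrayList<>();
        int pos = 0;
        while (true) {
            int start = input.indexOf(DOC, pos);
            if (start < 0) {
                break;
            }
            int end = input.indexOf(TERMINATING_DOC, start);
            if (end < 0) {
                break; // incomplete trailing document: stop scanning
            }
            String block = input.substring(start + DOC.length(), end);
            int noStart = block.indexOf(DOCNO);
            int noEnd = block.indexOf(TERMINATING_DOCNO);
            if (noStart >= 0 && noEnd > noStart) {
                result.add(block.substring(noStart + DOCNO.length(), noEnd).trim());
            }
            pos = end + TERMINATING_DOC.length();
        }
        return result;
    }

    public static void main(String[] args) {
        String input = String.join(NEW_LINE,
            DOC, DOCNO + " GX000-00-0000000 " + TERMINATING_DOCNO, "some text", TERMINATING_DOC,
            DOC, DOCNO + "GX000-00-0000001" + TERMINATING_DOCNO, "more text", TERMINATING_DOC);
        System.out.println(docNumbers(input));
    }
}
```

Documents without a closing marker are skipped rather than guessed at, so a truncated buffer yields only the complete documents before the cut.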

public void close() throws IOException
Description copied from class ContentItemsSource: Called when reading from this content source is no longer required.
Specified by: close in interface Closeable
Specified by: close in interface AutoCloseable
Specified by: close in class ContentItemsSource
Throws: IOException

public DocData getNextDocData(DocData docData) throws NoMoreDataException, IOException
Description copied from class ContentSource: Returns the next DocData from the content source. Implementations must account for multi-threading, as multiple threads can call this method simultaneously.
Specified by: getNextDocData in class ContentSource
Throws: NoMoreDataException, IOException

public void resetInputs() throws IOException
Description copied from class ContentItemsSource: Resets the input for this content source, so that the test would behave as if it was just started, input-wise. NOTE: the default implementation resets the number of bytes and items generated since the last reset, so it's important to call super.resetInputs in case you override this method.
Overrides: resetInputs in class ContentItemsSource
Throws: IOException

public void setConfig(Config config)
Description copied from class ContentItemsSource: Sets the Config for this content source. If you override this method, you must call super.setConfig.
Overrides: setConfig in class ContentItemsSource

Copyright © 2000-2016 Apache Software Foundation. All Rights Reserved.