Class TrecContentSource
- java.lang.Object
  - org.apache.lucene.benchmark.byTask.feeds.ContentItemsSource
    - org.apache.lucene.benchmark.byTask.feeds.ContentSource
      - org.apache.lucene.benchmark.byTask.feeds.TrecContentSource
- All Implemented Interfaces: Closeable, AutoCloseable
public class TrecContentSource extends ContentSource
Implements a ContentSource over the TREC collection.
Supports the following configuration parameters (on top of ContentSource):
- work.dir - specifies the working directory. Required if "docs.dir" denotes a relative path (default=work).
- docs.dir - specifies the directory where the TREC files reside. Can be set to a relative path if "work.dir" is also specified (default=trec).
- trec.doc.parser - specifies the TrecDocParser class to use for parsing the TREC documents content (default=TrecGov2Parser).
- html.parser - specifies the HTMLParser class to use for parsing the HTML parts of the TREC documents content (default=DemoHTMLParser).
- content.source.encoding - if not specified, ISO-8859-1 is used.
- content.source.excludeIteration - if true, do not append the iteration number to the document name.
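The interplay between "work.dir", "docs.dir" and the encoding default can be sketched with plain JDK classes. The following is an illustration only: TrecConfigSketch and its methods are hypothetical names, not part of Lucene, and the rule that a relative "docs.dir" is resolved against "work.dir" is an assumption drawn from the parameter descriptions above.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.Properties;

/** Hypothetical helper, not a Lucene class: applies the documented defaults. */
final class TrecConfigSketch {

  /** Returns docs.dir as-is when absolute, otherwise resolved against work.dir. */
  static Path resolveDocsDir(Properties props) {
    Path docsDir = Paths.get(props.getProperty("docs.dir", "trec"));
    if (docsDir.isAbsolute()) {
      return docsDir;
    }
    // Assumption: a relative docs.dir is interpreted relative to work.dir.
    Path workDir = Paths.get(props.getProperty("work.dir", "work"));
    return workDir.resolve(docsDir);
  }

  /** Returns the configured charset, or ISO-8859-1 when none is configured. */
  static Charset resolveEncoding(Properties props) {
    String name = props.getProperty("content.source.encoding");
    return name == null ? StandardCharsets.ISO_8859_1 : Charset.forName(name);
  }

  public static void main(String[] args) {
    Properties props = new Properties();
    props.setProperty("work.dir", "bench"); // illustrative values
    props.setProperty("docs.dir", "gov2");
    System.out.println(resolveDocsDir(props));
    System.out.println(resolveEncoding(props));
  }
}
```

With no properties set, the sketch falls back to the directory "trec" inside "work" and to ISO-8859-1, matching the defaults listed above.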
Field Summary
Fields (Modifier and Type, Field, Description):
- static String DOC
- static String DOCNO
- static String NEW_LINE - separator between lines in the buffer
- static String TERMINATING_DOC
- static String TERMINATING_DOCNO
Fields inherited from class org.apache.lucene.benchmark.byTask.feeds.ContentItemsSource
encoding, forever, logStep, verbose
Constructor Summary
Constructors (Constructor, Description):
- TrecContentSource()
Method Summary
Methods (Modifier and Type, Method, Description):
- void close() - Called when reading from this content source is no longer required.
- DocData getNextDocData(DocData docData) - Returns the next DocData from the content source.
- Date parseDate(String dateStr)
- void resetInputs() - Resets the input for this content source, so that the test would behave as if it was just started, input-wise.
- void setConfig(Config config) - Sets the Config for this content source.
Methods inherited from class org.apache.lucene.benchmark.byTask.feeds.ContentItemsSource
addBytes, addItem, collectFiles, getBytesCount, getConfig, getItemsCount, getTotalBytesCount, getTotalItemsCount, printStatistics, shouldLog
Field Detail
DOCNO
public static final String DOCNO
- See Also: Constant Field Values

TERMINATING_DOCNO
public static final String TERMINATING_DOCNO
- See Also: Constant Field Values

DOC
public static final String DOC
- See Also: Constant Field Values

TERMINATING_DOC
public static final String TERMINATING_DOC
- See Also: Constant Field Values

NEW_LINE
public static final String NEW_LINE
separator between lines in the buffer
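The field names suggest paired opening and closing markers around each document and its document number. As a rough sketch of how such markers might be used, the example below extracts a document number from one record. Everything in it is hypothetical: TrecTagSketch is not a Lucene class, the constant values are assumed from conventional TREC markup (the "Constant Field Values" page holds the real ones), and the sample record is made up.

```java
/** Hypothetical sketch, not Lucene code: reads the document number of one TREC-style record. */
final class TrecTagSketch {
  // Assumed values modeled on conventional TREC markup; not taken from this page.
  static final String DOC = "<DOC>";
  static final String TERMINATING_DOC = "</DOC>";
  static final String DOCNO = "<DOCNO>";
  static final String TERMINATING_DOCNO = "</DOCNO>";

  /** Returns the trimmed text between the DOCNO markers, or null if either marker is missing. */
  static String extractDocNo(String record) {
    int start = record.indexOf(DOCNO);
    if (start < 0) {
      return null;
    }
    start += DOCNO.length();
    int end = record.indexOf(TERMINATING_DOCNO, start);
    if (end < 0) {
      return null;
    }
    return record.substring(start, end).trim();
  }

  public static void main(String[] args) {
    // Made-up sample record for illustration.
    String record = DOC + "\n"
        + DOCNO + " SAMPLE-0001 " + TERMINATING_DOCNO + "\n"
        + "body text\n"
        + TERMINATING_DOC;
    System.out.println(extractDocNo(record)); // prints "SAMPLE-0001"
  }
}
```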
Method Detail
-
close
public void close() throws IOException
Description copied from class: ContentItemsSource
Called when reading from this content source is no longer required.
- Specified by: close in interface AutoCloseable
- Specified by: close in interface Closeable
- Specified by: close in class ContentItemsSource
- Throws: IOException
getNextDocData
public DocData getNextDocData(DocData docData) throws NoMoreDataException, IOException
Description copied from class: ContentSource
Returns the next DocData from the content source. Implementations must account for multi-threading, as multiple threads can call this method simultaneously.
- Specified by: getNextDocData in class ContentSource
- Throws: NoMoreDataException, IOException
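One simple way to meet the multi-threading requirement is to serialize access to the shared read position. The sketch below shows that idea with stand-in types only: SynchronizedSourceSketch and NoMoreItemsException are hypothetical and merely play the roles of a content source and NoMoreDataException. It is not a description of how TrecContentSource itself synchronizes.

```java
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;

/** Hypothetical stand-in for a content source; none of these types are Lucene classes. */
final class SynchronizedSourceSketch {

  /** Stand-in for an "input exhausted" exception. */
  static final class NoMoreItemsException extends Exception {
    private static final long serialVersionUID = 1L;
  }

  private final Iterator<String> items;

  SynchronizedSourceSketch(List<String> items) {
    this.items = items.iterator();
  }

  /** Synchronized so that only one thread at a time advances the shared iterator. */
  synchronized String next() throws NoMoreItemsException {
    if (!items.hasNext()) {
      throw new NoMoreItemsException();
    }
    return items.next();
  }

  public static void main(String[] args) throws InterruptedException {
    SynchronizedSourceSketch source =
        new SynchronizedSourceSketch(Arrays.asList("a", "b", "c", "d"));
    Runnable reader = () -> {
      try {
        while (true) {
          System.out.println(Thread.currentThread().getName() + " read " + source.next());
        }
      } catch (NoMoreItemsException e) {
        // Input exhausted: this reader stops.
      }
    };
    Thread first = new Thread(reader, "reader-1");
    Thread second = new Thread(reader, "reader-2");
    first.start();
    second.start();
    first.join();
    second.join();
  }
}
```

Each item is handed to exactly one reader, although which reader receives which item varies from run to run.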
resetInputs
public void resetInputs() throws IOException
Description copied from class: ContentItemsSource
Resets the input for this content source, so that the test would behave as if it was just started, input-wise.
NOTE: the default implementation resets the number of bytes and items generated since the last reset, so it's important to call super.resetInputs in case you override this method.
- Overrides: resetInputs in class ContentItemsSource
- Throws: IOException
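The note about calling super.resetInputs can be illustrated with a small stand-in hierarchy. CountingSource and FileListSource below are hypothetical classes that only imitate the counter bookkeeping described above; they are not the Lucene implementation.

```java
/** Hypothetical stand-ins that imitate the reset contract; not the Lucene classes. */
final class ResetInputsSketch {

  /** Plays the role of a base class tracking bytes and items since the last reset. */
  static class CountingSource {
    private long bytesSinceReset;
    private int itemsSinceReset;

    void addItem(int bytes) {
      itemsSinceReset++;
      bytesSinceReset += bytes;
    }

    void resetInputs() {
      bytesSinceReset = 0;
      itemsSinceReset = 0;
    }

    long getBytesCount() {
      return bytesSinceReset;
    }

    int getItemsCount() {
      return itemsSinceReset;
    }
  }

  /** Subclass with its own read position that must also be rewound. */
  static class FileListSource extends CountingSource {
    private int nextFile;

    void readOne(int bytes) {
      nextFile++;
      addItem(bytes);
    }

    int position() {
      return nextFile;
    }

    @Override
    void resetInputs() {
      super.resetInputs(); // clears the inherited byte and item counters
      nextFile = 0;        // then rewinds the subclass's own state
    }
  }

  public static void main(String[] args) {
    FileListSource source = new FileListSource();
    source.readOne(100);
    source.readOne(250);
    source.resetInputs();
    // prints "0 items, 0 bytes, position 0"
    System.out.println(source.getItemsCount() + " items, "
        + source.getBytesCount() + " bytes, position " + source.position());
  }
}
```

Dropping the super.resetInputs() call in this sketch would leave the counters at 2 items and 350 bytes after the reset, which is the kind of stale statistic the note warns about.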
setConfig
public void setConfig(Config config)
Description copied from class: ContentItemsSource
Sets the Config for this content source. If you override this method, you must call super.setConfig.
- Overrides: setConfig in class ContentItemsSource