Class WriteLineDocTask
java.lang.Object
org.apache.lucene.benchmark.byTask.tasks.PerfTask
org.apache.lucene.benchmark.byTask.tasks.WriteLineDocTask
- All Implemented Interfaces:
Cloneable
- Direct Known Subclasses:
WriteEnwikiLineDocTask
A task which writes documents, one line per document. Each line is in the following format: title <TAB> date <TAB> body. The output of this task can be consumed by LineDocSource and is intended to save the IO overhead of opening a file per document to be indexed.
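The line format described above can be sketched in plain Java as follows. This is an illustration only, not the task's implementation: the class name LineDocFormatSketch and the method formatLine are hypothetical, and collapsing tabs and line breaks inside a value to a single space is an assumption made so that every document stays on one line.

```java
import java.util.List;
import java.util.regex.Pattern;

/** Illustrative sketch: builds one tab-separated line per document. */
class LineDocFormatSketch {
  // Field separator, matching the <TAB> shown in the format description.
  static final char SEP = '\t';

  // Assumption: tabs and line breaks inside a value are collapsed to one space.
  private static final Pattern NORMALIZER = Pattern.compile("[\t\r\n]+");

  /** Joins the field values (e.g. title, date, body) into a single line. */
  static String formatLine(List<String> values) {
    StringBuilder sb = new StringBuilder();
    for (int i = 0; i < values.size(); i++) {
      if (i > 0) {
        sb.append(SEP);
      }
      String v = values.get(i);
      // A missing value is written as an empty column.
      sb.append(v == null ? "" : NORMALIZER.matcher(v).replaceAll(" "));
    }
    return sb.toString();
  }

  public static void main(String[] args) {
    System.out.println(formatLine(List.of("A title", "2010-01-01", "Body text\nsecond line")));
  }
}
```

Keeping each document on exactly one line is what lets a reader process the file sequentially without opening one file per document.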
The format of the output is set according to the output file extension. Compression is recommended when the output file is expected to be large. See info on file extensions in StreamUtils.Type.
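As a rough illustration of choosing the output format from the file extension, the sketch below uses only JDK classes. It is not StreamUtils: the class name OutputByExtensionSketch, the method open, and the set of recognized extensions (.gz and .gzip here) are assumptions for the example; consult StreamUtils.Type for the extensions the benchmark module actually recognizes.

```java
import java.io.BufferedWriter;
import java.io.IOException;
import java.io.OutputStream;
import java.io.OutputStreamWriter;
import java.io.PrintWriter;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Locale;
import java.util.zip.GZIPOutputStream;

/** Illustrative sketch: picks plain or gzip output from the file name. */
class OutputByExtensionSketch {

  /** Opens a UTF-8 writer; the file is created, or truncated if it exists. */
  static PrintWriter open(Path file) throws IOException {
    OutputStream os = Files.newOutputStream(file);
    String name = file.getFileName().toString().toLowerCase(Locale.ROOT);
    // Assumed extension mapping for this example only.
    if (name.endsWith(".gz") || name.endsWith(".gzip")) {
      os = new GZIPOutputStream(os);
    }
    return new PrintWriter(new BufferedWriter(new OutputStreamWriter(os, StandardCharsets.UTF_8)));
  }
}
```

Closing the returned writer finishes the gzip stream, so a try-with-resources block around the writer is the simplest way to get a valid compressed file.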
Supports the following parameters:
- line.file.out - the name of the file to write the output to. This parameter is mandatory. NOTE: the file is re-created.
- line.fields - which fields should be written in each line (optional, default: DEFAULT_FIELDS).
- sufficient.fields - comma-separated list of field names; a document is skipped only if all of them are missing. For example, to require that at least one of f1,f2 is not empty, specify "f1,f2". To specify that no field is required, i.e. that even empty docs should be emitted, specify "," (optional, default: DEFAULT_SUFFICIENT_FIELDS).
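The sufficient.fields rule can be read as "write the document if at least one listed field is non-empty, or if no field is listed". A minimal sketch of that reading follows; the class SufficientFieldsSketch, its methods, and the use of a Map to stand in for a document are hypothetical and not part of the benchmark API.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

/** Illustrative sketch of the sufficient.fields skip rule. */
class SufficientFieldsSketch {

  /** Parses the property value; "," yields an empty list (no field required). */
  static List<String> parse(String prop) {
    List<String> names = new ArrayList<>();
    for (String s : prop.split(",")) {
      String t = s.trim();
      if (!t.isEmpty()) {
        names.add(t);
      }
    }
    return names;
  }

  /** Returns true if the document should be written, false if it is skipped. */
  static boolean shouldWrite(Map<String, String> doc, List<String> sufficient) {
    if (sufficient.isEmpty()) {
      return true; // "," was given: even empty docs are emitted
    }
    for (String name : sufficient) {
      String v = doc.get(name);
      if (v != null && !v.isEmpty()) {
        return true; // one non-empty sufficient field is enough
      }
    }
    return false; // all sufficient fields are missing or empty
  }
}
```

With "f1,f2", a document carrying only f2 is written, while a document carrying only some other field f3 is skipped.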
-
Method Summary
- void close()
- int doLogic() - Perform the task once (ignoring repetitions specification). Return number of work items done by this task.
- protected String getLogMessage(int recsCount)
- protected PrintWriter lineFileOut(Document doc) - Selects output line file by written doc.
- void setParams(String params) - Set the params (docSize only).
- boolean supportsParams() - Subclasses that support parameters must override this method to return true.
- protected void writeHeader(PrintWriter out) - Write header to the lines file, indicating how to read the file later.

Methods inherited from class org.apache.lucene.benchmark.byTask.tasks.PerfTask
clone, getAlgLineNum, getBackgroundDeltaPriority, getDepth, getName, getParams, getRunData, getRunInBackground, isDisableCounting, runAndMaybeStats, setAlgLineNum, setDepth, setDisableCounting, setName, setRunInBackground, setup, shouldNeverLogAtStart, shouldNotRecordStats, stopNow, tearDown, toString
-
Field Details
-
FIELDS_HEADER_INDICATOR
-
SEP
public static final char SEP
-
DEFAULT_FIELDS
Fields to be written by default.
-
DEFAULT_SUFFICIENT_FIELDS
Default fields, at least one of which must be present for the doc not to be skipped.
-
fname
-
-
Constructor Details
-
WriteLineDocTask
- Throws:
Exception
-
-
Method Details
-
writeHeader
Write header to the lines file, indicating how to read the file later.
-
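To illustrate the kind of header that writeHeader describes, here is a hedged sketch of a header line that names the written fields so a reader can map columns back to fields. The class LineFileHeaderSketch and its methods are hypothetical, and both the indicator value and the layout (indicator followed by SEP-separated field names) are assumptions for the example; check the FIELDS_HEADER_INDICATOR and SEP constants for the actual values.

```java
import java.util.Arrays;
import java.util.List;

/** Illustrative sketch: a first line that names the columns of a lines file. */
class LineFileHeaderSketch {
  // Assumed values for illustration; the real ones are the class constants.
  static final String FIELDS_HEADER_INDICATOR = "FIELDS_HEADER_INDICATOR###";
  static final char SEP = '\t';

  /** Builds the header: the indicator followed by each field name. */
  static String headerLine(List<String> fields) {
    StringBuilder sb = new StringBuilder(FIELDS_HEADER_INDICATOR);
    for (String f : fields) {
      sb.append(SEP).append(f);
    }
    return sb.toString();
  }

  /** Returns the field names if the line is a header, otherwise null. */
  static List<String> parseHeader(String firstLine) {
    String prefix = FIELDS_HEADER_INDICATOR + SEP;
    if (!firstLine.startsWith(prefix)) {
      return null; // an ordinary document line, not a header
    }
    String rest = firstLine.substring(prefix.length());
    return Arrays.asList(rest.split(String.valueOf(SEP)));
  }
}
```

A reader can call parseHeader on the first line and fall back to a default field order when it returns null.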
getLogMessage
- Overrides:
getLogMessage
in class PerfTask
-
doLogic
Description copied from class: PerfTask
Perform the task once (ignoring repetitions specification). Return number of work items done by this task. For indexing that can be number of docs added. For warming that can be number of scanned items, etc.
-
lineFileOut
Selects the output line file according to the document being written. Default: the original output line file.
-
close
-
setParams
Set the params (docSize only).
-
supportsParams
public boolean supportsParams()
Description copied from class: PerfTask
Subclasses that support parameters must override this method to return true.
- Overrides:
supportsParams
in class PerfTask
- Returns:
- true iff this task supports command line params.
-