Class WriteLineDocTask

java.lang.Object
org.apache.lucene.benchmark.byTask.tasks.PerfTask
org.apache.lucene.benchmark.byTask.tasks.WriteLineDocTask
All Implemented Interfaces:
Cloneable
Direct Known Subclasses:
WriteEnwikiLineDocTask

public class WriteLineDocTask extends PerfTask
A task that writes documents, one line per document. Each line has the following format: title <TAB> date <TAB> body. The output of this task can be consumed by LineDocSource and is intended to save the I/O overhead of opening one file per document to be indexed.
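
As a rough illustration of the line layout described above, the following standalone sketch joins three field values with a tab. The names LineFormatSketch, normalize and formatLine are hypothetical and are not part of the benchmark module. Collapsing embedded tabs and line breaks to a single space is an assumption made here so that each document stays on one line.

```java
import java.util.regex.Pattern;

/** Hypothetical helper, not part of Lucene: builds one "title<TAB>date<TAB>body" line. */
class LineFormatSketch {
  // Assumption: embedded tabs and line breaks are collapsed to a single space.
  private static final Pattern BREAKS = Pattern.compile("[\t\r\n]+");
  static final char SEP = '\t';

  /** Returns the value with tabs and line breaks replaced, or "" for a missing field. */
  static String normalize(String value) {
    return value == null ? "" : BREAKS.matcher(value).replaceAll(" ").trim();
  }

  /** Joins the three fields in the documented order: title, date, body. */
  static String formatLine(String title, String date, String body) {
    return normalize(title) + SEP + normalize(date) + SEP + normalize(body);
  }

  public static void main(String[] args) {
    System.out.println(formatLine("My title", "2010-01-01", "first line\nsecond line"));
  }
}
```

Because the separator never survives inside a field value in this sketch, a reader can split each line on the tab character and always recover exactly three columns.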

The output format is determined by the output file's extension. Compression is recommended when the output file is expected to be large. For the supported file extensions, see StreamUtils.Type.
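
The idea of choosing the output format from the file extension can be sketched with the JDK alone. This is a minimal illustration, not the StreamUtils implementation: the class ExtensionOutputSketch and its open method are hypothetical, and only the ".gz" extension is handled here because gzip is the only compressed format in the standard library. The extensions actually recognized are defined by StreamUtils.Type.

```java
import java.io.BufferedWriter;
import java.io.IOException;
import java.io.OutputStream;
import java.io.OutputStreamWriter;
import java.io.PrintWriter;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Locale;
import java.util.zip.GZIPOutputStream;

/** Hypothetical sketch: picks a plain or gzip-compressed writer from the file name. */
class ExtensionOutputSketch {
  static PrintWriter open(Path file) throws IOException {
    // Files.newOutputStream creates the file, or truncates it if it already exists.
    OutputStream os = Files.newOutputStream(file);
    String name = file.getFileName().toString().toLowerCase(Locale.ROOT);
    if (name.endsWith(".gz")) {
      os = new GZIPOutputStream(os); // assumption: ".gz" means gzip compression
    }
    return new PrintWriter(
        new BufferedWriter(new OutputStreamWriter(os, StandardCharsets.UTF_8)));
  }
}
```

Closing the returned PrintWriter closes the whole stream chain, which is what lets the gzip stream write its trailer.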

Supports the following parameters:

  • line.file.out - the name of the file to write the output to. This parameter is mandatory. NOTE: the file is re-created.
  • line.fields - which fields should be written in each line. (optional, default: DEFAULT_FIELDS).
  • sufficient.fields - a comma-separated list of field names; a document is skipped if all of these fields are missing. For example, to require that at least one of f1, f2 is not empty, specify "f1,f2". To specify that no field is required, i.e. that even empty docs should be emitted, specify ",". (optional, default: DEFAULT_SUFFICIENT_FIELDS).
NOTE: this class is not thread-safe. If it is used by multiple threads, the output is unspecified, because all threads write to the same output file without synchronization.
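
The sufficient.fields rule can be sketched as follows. This is an illustration of the documented behavior under stated assumptions, not the task's actual code: the class SufficientFieldsSketch, its methods, and the use of a Map to stand in for a document are all hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

/** Hypothetical sketch of the "sufficient.fields" skip rule. */
class SufficientFieldsSketch {
  /** Parses the comma-separated value; "," yields an empty list, meaning no field is required. */
  static List<String> parse(String spec) {
    List<String> names = new ArrayList<>();
    for (String name : spec.split(",")) {
      String trimmed = name.trim();
      if (!trimmed.isEmpty()) {
        names.add(trimmed);
      }
    }
    return names;
  }

  /** Returns true if the document should be written, false if it should be skipped. */
  static boolean shouldWrite(Map<String, String> doc, List<String> sufficient) {
    if (sufficient.isEmpty()) {
      return true; // no field is required, so even empty docs are emitted
    }
    for (String name : sufficient) {
      String value = doc.get(name);
      if (value != null && !value.trim().isEmpty()) {
        return true; // at least one sufficient field is present
      }
    }
    return false; // all sufficient fields are missing or empty
  }
}
```

With "f1,f2", a document that has only f2 is written and a document that has neither is skipped; with ",", every document is written.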
  • Field Details

    • FIELDS_HEADER_INDICATOR

      public static final String FIELDS_HEADER_INDICATOR
    • SEP

      public static final char SEP
    • DEFAULT_FIELDS

      public static final String[] DEFAULT_FIELDS
      Fields to be written by default.
    • DEFAULT_SUFFICIENT_FIELDS

      public static final String DEFAULT_SUFFICIENT_FIELDS
      Default fields, at least one of which must be present for the doc not to be skipped.
    • fname

      protected final String fname
  • Constructor Details

  • Method Details

    • writeHeader

      protected void writeHeader(PrintWriter out)
      Write a header to the lines file, indicating how the file should be read later.
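
One way such a header could work is sketched below. The layout assumed here (an indicator string followed by the field names, each preceded by the separator) is an assumption for illustration only. The class HeaderSketch and the INDICATOR value are hypothetical placeholders, not the real value of FIELDS_HEADER_INDICATOR.

```java
import java.io.PrintWriter;

/** Hypothetical sketch of a self-describing header line for a lines file. */
class HeaderSketch {
  static final String INDICATOR = "EXAMPLE_HEADER"; // placeholder, not the real constant value
  static final char SEP = '\t';

  /** Writes the indicator followed by each field name, separated by SEP. */
  static void writeHeader(PrintWriter out, String[] fields) {
    StringBuilder sb = new StringBuilder(INDICATOR);
    for (String field : fields) {
      sb.append(SEP).append(field);
    }
    out.println(sb);
  }

  /** Returns the field names if the line is a header, or null for an ordinary document line. */
  static String[] parseHeader(String line) {
    String prefix = INDICATOR + SEP;
    if (!line.startsWith(prefix)) {
      return null;
    }
    return line.substring(prefix.length()).split(String.valueOf(SEP));
  }
}
```

A reader that checks the first line with parseHeader can then map each later column to the field name at the same position.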
    • getLogMessage

      protected String getLogMessage(int recsCount)
      Overrides:
      getLogMessage in class PerfTask
    • doLogic

      public int doLogic() throws Exception
      Description copied from class: PerfTask
      Perform the task once (ignoring repetitions specification). Returns the number of work items done by this task. For indexing, that can be the number of docs added; for warming, the number of scanned items, etc.
      Specified by:
      doLogic in class PerfTask
      Returns:
      number of work items done by this task.
      Throws:
      Exception
    • lineFileOut

      protected PrintWriter lineFileOut(Document doc)
      Selects the output line file according to the document being written. Default: the original output line file.
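
The override pattern behind lineFileOut can be sketched without the benchmark classes. Everything in this example is hypothetical: BaseLineWriter and SplitLineWriter stand in for the task and a subclass, a document is reduced to its title, and the "Draft:" routing rule is invented purely for illustration.

```java
import java.io.PrintWriter;

/** Hypothetical sketch of selecting an output per document through an overridable hook. */
class RoutingSketch {
  static class BaseLineWriter {
    protected final PrintWriter defaultOut;

    BaseLineWriter(PrintWriter defaultOut) {
      this.defaultOut = defaultOut;
    }

    /** Default behavior: every document goes to the original output. */
    protected PrintWriter lineFileOut(String title) {
      return defaultOut;
    }

    void write(String title, String line) {
      lineFileOut(title).println(line);
    }
  }

  static class SplitLineWriter extends BaseLineWriter {
    private final PrintWriter draftsOut;

    SplitLineWriter(PrintWriter defaultOut, PrintWriter draftsOut) {
      super(defaultOut);
      this.draftsOut = draftsOut;
    }

    /** Invented rule: titles starting with "Draft:" go to a second output. */
    @Override
    protected PrintWriter lineFileOut(String title) {
      return title.startsWith("Draft:") ? draftsOut : defaultOut;
    }
  }
}
```

The writing logic stays in the base class, and a subclass changes only the choice of destination.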
    • close

      public void close() throws Exception
      Overrides:
      close in class PerfTask
      Throws:
      Exception
    • setParams

      public void setParams(String params)
      Set the params (docSize only)
      Overrides:
      setParams in class PerfTask
      Parameters:
      params - docSize, or 0 for no limit.
    • supportsParams

      public boolean supportsParams()
      Description copied from class: PerfTask
      Sub classes that support parameters must override this method to return true.
      Overrides:
      supportsParams in class PerfTask
      Returns:
      true iff this task supports command line params.