Class WriteLineDocTask

  • All Implemented Interfaces:
    Cloneable
    Direct Known Subclasses:
    WriteEnwikiLineDocTask

    public class WriteLineDocTask
    extends PerfTask
    A task that writes documents, one line per document. Each line has the following format: title <TAB> date <TAB> body. The output of this task can be consumed by LineDocSource and is intended to save the I/O overhead of opening one file per document to be indexed.
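
To illustrate the line format described above, here is a minimal, standalone sketch of how one tab-separated line could be assembled. It is not the task's actual implementation: the class name LineDocFormatSketch and the helpers sanitize and toLine are hypothetical, and replacing tabs and line breaks inside a field with spaces is an assumption made so that each document stays on exactly one line.

```java
import java.util.StringJoiner;

public class LineDocFormatSketch {

    // Hypothetical helper (assumption): replace tabs and line breaks with
    // spaces so a field value cannot break the one-line-per-document layout.
    static String sanitize(String value) {
        if (value == null) {
            return "";
        }
        return value.replace('\t', ' ').replace('\r', ' ').replace('\n', ' ');
    }

    // Builds one output line in the form: title <TAB> date <TAB> body.
    static String toLine(String title, String date, String body) {
        StringJoiner line = new StringJoiner("\t");
        line.add(sanitize(title)).add(sanitize(date)).add(sanitize(body));
        return line.toString();
    }

    public static void main(String[] args) {
        // A body containing a line break is flattened onto a single line.
        System.out.println(toLine("Hello", "2020-01-01", "first line\nsecond line"));
    }
}
```

A missing field is written as an empty string in this sketch, so the two tab separators are always present and a reader can split each line on <TAB> into a fixed number of columns.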

    The output format is determined by the output file extension. Compression is recommended when the output file is expected to be large. See StreamUtils.Type for details on the supported file extensions.

    Supports the following parameters:

    • line.file.out - the name of the file to write the output to. This parameter is mandatory. NOTE: the file is re-created.
    • line.fields - which fields should be written in each line. (optional, default: DEFAULT_FIELDS).
    • sufficient.fields - comma-separated list of field names; a document is skipped only if all of these fields are missing. For example, to require that at least one of f1,f2 is not empty, specify "f1,f2". To specify that no field is required, i.e. that even empty docs should be emitted, specify ",". (optional, default: DEFAULT_SUFFICIENT_FIELDS).
    NOTE: this class is not thread-safe. If it is used by multiple threads the output is unspecified, since all threads write to the same output file without synchronization.
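
The sufficient.fields rule above can be sketched as a small standalone check. This is an illustration under stated assumptions, not the task's own code: the class SufficientFieldsSketch, the methods parse and shouldEmit, and the use of a Map<String, String> to stand in for a document are all hypothetical.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

public class SufficientFieldsSketch {

    // Splits the comma-separated spec into field names, dropping empty entries.
    // A spec of "," therefore yields an empty list, meaning nothing is required.
    static List<String> parse(String spec) {
        return Arrays.stream(spec.split(","))
                .map(String::trim)
                .filter(name -> !name.isEmpty())
                .collect(Collectors.toList());
    }

    // Returns true if the document should be written: either no field is
    // required, or at least one of the listed fields has a non-empty value.
    static boolean shouldEmit(Map<String, String> doc, String spec) {
        List<String> sufficient = parse(spec);
        if (sufficient.isEmpty()) {
            return true;
        }
        for (String name : sufficient) {
            String value = doc.get(name);
            if (value != null && !value.isEmpty()) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        // f1 is empty but f2 has a value, so the document is kept.
        System.out.println(shouldEmit(Map.of("f1", "", "f2", "text"), "f1,f2"));
    }
}
```

In this sketch a document is dropped only when every listed field is missing or empty, which matches the "at least one of f1,f2" reading of the parameter.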
    • Field Detail

      • DEFAULT_FIELDS

        public static final String[] DEFAULT_FIELDS
        Fields to be written by default
      • DEFAULT_SUFFICIENT_FIELDS

        public static final String DEFAULT_SUFFICIENT_FIELDS
        Default set of fields, at least one of which must be present for the doc not to be skipped.
        See Also:
        Constant Field Values
      • fname

        protected final String fname
    • Method Detail

      • writeHeader

        protected void writeHeader​(PrintWriter out)
        Writes a header to the lines file, indicating how to read the file later.
      • doLogic

        public int doLogic()
                    throws Exception
        Description copied from class: PerfTask
        Perform the task once (ignoring the repetitions specification). Return the number of work items done by this task. For indexing, that can be the number of docs added. For warming, that can be the number of scanned items, etc.
        Specified by:
        doLogic in class PerfTask
        Returns:
        number of work items done by this task.
        Throws:
        Exception
      • lineFileOut

        protected PrintWriter lineFileOut​(Document doc)
        Selects the output line file for the document being written. Default: the original output line file.
      • setParams

        public void setParams​(String params)
        Set the params (docSize only)
        Overrides:
        setParams in class PerfTask
        Parameters:
        params - docSize, or 0 for no limit.
      • supportsParams

        public boolean supportsParams()
        Description copied from class: PerfTask
        Sub classes that support parameters must override this method to return true.
        Overrides:
        supportsParams in class PerfTask
        Returns:
        true iff this task supports command line params.