org.apache.lucene.benchmark.byTask.feeds
Class ContentSource

java.lang.Object
  extended by org.apache.lucene.benchmark.byTask.feeds.ContentSource
Direct Known Subclasses:
DirContentSource, EnwikiContentSource, LineDocSource, ReutersContentSource, SingleDocSource, TrecContentSource

public abstract class ContentSource
extends Object

Represents content from a specified source, such as TREC or Reuters. A ContentSource creates DocData objects for its documents, which are then consumed by DocMaker. It also tracks statistics such as the number of documents generated and their size in bytes.

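Supports the following configuration parameters:

content.source.forever - specifies whether to generate documents forever (default=true).
content.source.verbose - specifies whether messages should be output by the content source (default=false).
content.source.encoding - specifies which encoding to use when reading the files of the content source. Certain implementations may define a default value if this parameter is not specified (default=null).
content.source.log.step - specifies after how many documents a message should be logged. A value of 0 means no logging occurs. NOTE: if verbose is false, no logging occurs even if logStep is not 0 (default=0).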

Field Summary
protected static int BUFFER_SIZE
           
protected  String encoding
           
protected  boolean forever
           
protected  int logStep
           
protected  boolean verbose
           
 
Constructor Summary
ContentSource()
           
 
Method Summary
protected  void addBytes(long numBytes)
           
protected  void addDoc()
           
abstract  void close()
          Called when reading from this content source is no longer required.
protected  void collectFiles(File dir, ArrayList<File> files)
          A convenience method for collecting all the files of a content source from a given directory.
 long getBytesCount()
          Returns the number of bytes generated since the last reset.
 Config getConfig()
           
 int getDocsCount()
          Returns the number of documents generated since the last reset.
protected  InputStream getInputStream(File file)
          Returns an InputStream over the requested file.
abstract  DocData getNextDocData(DocData docData)
          Returns the next DocData from the content source.
 long getTotalBytesCount()
          Returns the total number of bytes that were generated by this source.
 int getTotalDocsCount()
          Returns the total number of generated documents.
 void resetInputs()
          Resets the input for this content source, so that the test behaves, input-wise, as if it had just started.
 void setConfig(Config config)
          Sets the Config for this content source.
protected  boolean shouldLog()
          Returns true if it is time to log a message (depending on verbose and the number of documents generated).
 
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
 

Field Detail

BUFFER_SIZE

protected static final int BUFFER_SIZE
See Also:
Constant Field Values

forever

protected boolean forever

logStep

protected int logStep

verbose

protected boolean verbose

encoding

protected String encoding
Constructor Detail

ContentSource

public ContentSource()
Method Detail

addBytes

protected final void addBytes(long numBytes)

addDoc

protected final void addDoc()

collectFiles

protected final void collectFiles(File dir,
                                  ArrayList<File> files)
A convenience method for collecting all the files of a content source from a given directory. The collected File instances are added to the given files list.
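
The collection described above can be approximated by a short recursive walk. The following is a minimal sketch, not the Lucene implementation: the class name CollectFilesSketch is hypothetical, and both the sorting of directory entries and the silent skipping of unreadable entries are assumptions made here for a repeatable result.

```java
import java.io.File;
import java.io.IOException;
import java.nio.file.Files;
import java.util.ArrayList;
import java.util.Arrays;

// Hypothetical stand-alone sketch; it does not extend ContentSource.
public class CollectFilesSketch {

    // Recursively adds every regular file found under dir to files.
    static void collectFiles(File dir, ArrayList<File> files) {
        if (!dir.canRead()) {
            return; // assumption: unreadable entries are skipped silently
        }
        File[] children = dir.listFiles();
        if (children == null) {
            return; // dir is not a directory, or an I/O error occurred
        }
        Arrays.sort(children); // assumption: sorted for a repeatable order
        for (File child : children) {
            if (child.isDirectory()) {
                collectFiles(child, files);
            } else {
                files.add(child);
            }
        }
    }

    public static void main(String[] args) throws IOException {
        // Build a tiny directory tree in a temporary location.
        File root = Files.createTempDirectory("content-source").toFile();
        File sub = new File(root, "sub");
        sub.mkdir();
        new File(root, "a.txt").createNewFile();
        new File(sub, "b.txt").createNewFile();

        ArrayList<File> files = new ArrayList<>();
        collectFiles(root, files);
        System.out.println(files.size() + " files collected");
    }
}
```

The caller owns the list, so one list can be reused to accumulate files from several directories.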


getInputStream

protected InputStream getInputStream(File file)
                              throws IOException
Returns an InputStream over the requested file. This method attempts to choose the appropriate InputStream implementation based on the file name (for example, if the name ends with .bz2 or .bzip, a bzip InputStream is returned).

Throws:
IOException
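
The name-based selection described above can be illustrated in isolation. This sketch shows only the decision step and returns a label rather than a stream, because the JDK has no built-in bzip2 decoder and the library used by the real method is not stated here. The class StreamKindSketch, the Kind enum, and the case-insensitive comparison are all hypothetical.

```java
import java.util.Locale;

// Hypothetical sketch of choosing a stream type from a file name.
public class StreamKindSketch {

    enum Kind { BZIP2, PLAIN }

    // Returns BZIP2 for names ending in .bz2 or .bzip, PLAIN otherwise.
    static Kind kindFor(String fileName) {
        // assumption: the suffix check ignores case
        String name = fileName.toLowerCase(Locale.ROOT);
        if (name.endsWith(".bz2") || name.endsWith(".bzip")) {
            return Kind.BZIP2;
        }
        return Kind.PLAIN;
    }

    public static void main(String[] args) {
        System.out.println(kindFor("enwiki-pages.xml.bz2"));
        System.out.println(kindFor("reuters.txt"));
    }
}
```

A subclass that needs another format could override getInputStream and apply the same kind of suffix check before falling back to the inherited behavior.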

shouldLog

protected final boolean shouldLog()
Returns true if it is time to log a message (depending on verbose and the number of documents generated).
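
One plausible reading of this check is sketched below. The exact rule is an assumption, not taken from the Lucene source: a message is due when verbose is set, logStep is positive, and the document count is a multiple of logStep. The class ShouldLogSketch is hypothetical and models only the fields involved.

```java
// Hypothetical model of the verbose/logStep logging decision.
public class ShouldLogSketch {

    private final boolean verbose;
    private final int logStep;
    private int docsCount;

    ShouldLogSketch(boolean verbose, int logStep) {
        this.verbose = verbose;
        this.logStep = logStep;
    }

    // Records one generated document.
    void addDoc() {
        docsCount++;
    }

    // assumption: log every logStep documents, and only when verbose is set
    boolean shouldLog() {
        return verbose && logStep > 0 && docsCount % logStep == 0;
    }

    public static void main(String[] args) {
        ShouldLogSketch source = new ShouldLogSketch(true, 3);
        for (int i = 1; i <= 6; i++) {
            source.addDoc();
            if (source.shouldLog()) {
                System.out.println("generated " + i + " documents");
            }
        }
    }
}
```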


close

public abstract void close()
                    throws IOException
Called when reading from this content source is no longer required.

Throws:
IOException

getBytesCount

public final long getBytesCount()
Returns the number of bytes generated since the last reset.


getDocsCount

public final int getDocsCount()
Returns the number of documents generated since the last reset.


getConfig

public final Config getConfig()

getNextDocData

public abstract DocData getNextDocData(DocData docData)
                                throws NoMoreDataException,
                                       IOException
Returns the next DocData from the content source.

Throws:
NoMoreDataException
IOException
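
A typical consumer calls this method in a loop until NoMoreDataException signals the end of the input. The sketch below is self-contained and uses hypothetical stand-ins: the nested DocData and NoMoreDataException classes only mimic the Lucene types of the same name, ListSource is an invented in-memory source, and the idea that the passed-in DocData is filled and returned for reuse is an assumption.

```java
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;

// Hypothetical sketch of the getNextDocData calling pattern.
public class NextDocDataSketch {

    // Stand-ins for the Lucene benchmark types of the same name.
    static class DocData { String body; }
    static class NoMoreDataException extends Exception { }

    // Invented in-memory source; mirrors only the method's shape.
    static class ListSource {
        private final Iterator<String> bodies;

        ListSource(List<String> bodies) {
            this.bodies = bodies.iterator();
        }

        DocData getNextDocData(DocData docData) throws NoMoreDataException {
            if (!bodies.hasNext()) {
                throw new NoMoreDataException();
            }
            // assumption: the caller's instance is filled and returned
            docData.body = bodies.next();
            return docData;
        }
    }

    // Reads documents until the source is exhausted; returns how many were read.
    static int drain(ListSource source) {
        DocData reusable = new DocData();
        int count = 0;
        try {
            while (true) {
                source.getNextDocData(reusable);
                count++;
            }
        } catch (NoMoreDataException e) {
            // Normal termination: the source has no more documents.
        }
        return count;
    }

    public static void main(String[] args) {
        ListSource source = new ListSource(Arrays.asList("first", "second", "third"));
        System.out.println(drain(source) + " documents read");
    }
}
```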

getTotalBytesCount

public final long getTotalBytesCount()
Returns the total number of bytes that were generated by this source.


getTotalDocsCount

public final int getTotalDocsCount()
Returns the total number of generated documents.


resetInputs

public void resetInputs()
                 throws IOException
Resets the input for this content source, so that the test behaves, input-wise, as if it had just started.

NOTE: the default implementation resets the number of bytes and documents generated since the last reset, so if you override this method, you must call super.resetInputs.

Throws:
IOException
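
The note above matters because the per-reset counters live in the base class. The following sketch models that relationship with hypothetical classes (CountingSource and RewindingSource are not Lucene types). It assumes that the total counters survive a reset while the since-last-reset counters return to zero, which is how the method descriptions on this page read.

```java
// Hypothetical model of the reset/total counters and a correct override.
public class ResetInputsSketch {

    // Stand-in base class that models only the statistics.
    static class CountingSource {
        private long bytesCount;
        private long totalBytesCount;
        private int docsCount;
        private int totalDocsCount;

        final void addBytes(long numBytes) {
            bytesCount += numBytes;
            totalBytesCount += numBytes;
        }

        final void addDoc() {
            docsCount++;
            totalDocsCount++;
        }

        // assumption: only the since-last-reset counters are cleared
        void resetInputs() {
            bytesCount = 0;
            docsCount = 0;
        }

        long getBytesCount() { return bytesCount; }
        long getTotalBytesCount() { return totalBytesCount; }
        int getDocsCount() { return docsCount; }
        int getTotalDocsCount() { return totalDocsCount; }
    }

    // Subclass with its own input state that must also be rewound.
    static class RewindingSource extends CountingSource {
        int position = 0;

        @Override
        void resetInputs() {
            super.resetInputs(); // keeps the base-class counters correct
            position = 0;        // then rewind subclass-specific state
        }
    }

    public static void main(String[] args) {
        RewindingSource source = new RewindingSource();
        source.addDoc();
        source.addBytes(10);
        source.position = 1;
        source.resetInputs();
        source.addDoc();
        source.addBytes(4);
        System.out.println("since reset: " + source.getDocsCount()
                + ", total: " + source.getTotalDocsCount());
    }
}
```

If the override skipped the super call, getDocsCount and getBytesCount would keep growing across resets and would no longer differ from the totals.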

setConfig

public void setConfig(Config config)
Sets the Config for this content source. If you override this method, you must call super.setConfig.



Copyright © 2000-2010 Apache Software Foundation. All Rights Reserved.