Class EnwikiContentSource
java.lang.Object
org.apache.lucene.benchmark.byTask.feeds.ContentItemsSource
org.apache.lucene.benchmark.byTask.feeds.ContentSource
org.apache.lucene.benchmark.byTask.feeds.EnwikiContentSource
- All Implemented Interfaces:
Closeable, AutoCloseable
A ContentSource which reads the English Wikipedia dump. You can read the .bz2 file directly (it will be decompressed on the fly). Config properties:
- keep.image.only.docs=false|true (default true).
- docs.file=<path to the file>
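The two properties above can be read as in the following minimal sketch. It uses plain java.util.Properties instead of the benchmark's own configuration class, and the class name EnwikiSettings is hypothetical. It only mirrors the documented keys and the documented default of true for keep.image.only.docs.

```java
import java.util.Properties;

// Hypothetical holder for the two documented EnwikiContentSource properties.
// Not part of Lucene; it only mirrors the keys and the documented default.
final class EnwikiSettings {
  final String docsFile;           // value of docs.file (null if unset)
  final boolean keepImageOnlyDocs; // value of keep.image.only.docs, default true

  EnwikiSettings(Properties props) {
    this.docsFile = props.getProperty("docs.file");
    // Missing key falls back to "true"; parseBoolean accepts only "true"
    // (case-insensitive), so any other value is treated as false.
    this.keepImageOnlyDocs =
        Boolean.parseBoolean(props.getProperty("keep.image.only.docs", "true"));
  }
}
```

For example, a property set containing only docs.file (say, a hypothetical /data/enwiki.xml.bz2) yields keepImageOnlyDocs == true; adding keep.image.only.docs=false turns it off.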
-
Field Summary
Fields inherited from class org.apache.lucene.benchmark.byTask.feeds.ContentItemsSource
encoding, forever, logStep, verbose
-
Constructor Summary
EnwikiContentSource()
-
Method Summary
Modifier and Type | Method | Description
void | close() | Called when reading from this content source is no longer required.
DocData | getNextDocData(DocData docData) | Returns the next DocData from the content source.
protected InputStream | openInputStream() | Opens the input stream.
void | resetInputs() | Resets the input for this content source, so that the test behaves as if it had just started, input-wise.
void | setConfig(Config config) | Sets the Config for this content source.
Methods inherited from class org.apache.lucene.benchmark.byTask.feeds.ContentItemsSource
addBytes, addItem, collectFiles, getBytesCount, getConfig, getItemsCount, getTotalBytesCount, getTotalItemsCount, printStatistics, shouldLog
-
Constructor Details
-
EnwikiContentSource
public EnwikiContentSource()
-
Method Details
-
close
Description copied from class: ContentItemsSource
Called when reading from this content source is no longer required.
- Specified by:
close in interface AutoCloseable
- Specified by:
close in interface Closeable
- Specified by:
close in class ContentItemsSource
- Throws:
IOException
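Because the class implements Closeable and AutoCloseable, a caller can rely on try-with-resources to invoke close() once reading is finished. The sketch below uses a hypothetical stand-in, TrackedSource, rather than EnwikiContentSource itself, so that it runs without the benchmark module; it only demonstrates that close() runs even when the reading code throws.

```java
import java.io.Closeable;
import java.io.IOException;

// Hypothetical stand-in for a content source; records whether close() ran.
final class TrackedSource implements Closeable {
  boolean closed;

  @Override
  public void close() throws IOException {
    closed = true; // a real source would release its reader or stream here
  }
}

final class CloseDemo {
  // Uses the source inside try-with-resources. close() is called on exit
  // whether the body completes normally or throws.
  static TrackedSource useAndFail() {
    TrackedSource source = new TrackedSource();
    try (TrackedSource s = source) {
      throw new IllegalStateException("simulated failure while reading");
    } catch (IllegalStateException | IOException e) {
      // ignored for the demo; by now close() has already run
    }
    return source;
  }
}
```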
-
getNextDocData
Description copied from class: ContentSource
Returns the next DocData from the content source. Implementations must account for multi-threading, as multiple threads can call this method simultaneously.
- Specified by:
getNextDocData in class ContentSource
- Throws:
NoMoreDataException
IOException
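The thread-safety requirement, together with reuse of the caller's DocData, can be sketched as follows. Every name here is hypothetical: SimpleDocData stands in for DocData, EndOfInput for NoMoreDataException, and the in-memory ListSource for the real dump parser. A synchronized method ensures that concurrent callers never receive the same page, and the caller's object is filled and returned instead of a new one being allocated.

```java
import java.util.List;

// Hypothetical stand-in for DocData: a reusable holder the caller passes in.
final class SimpleDocData {
  String title;
  String body;
}

// Hypothetical stand-in for NoMoreDataException.
final class EndOfInput extends Exception {
  EndOfInput() { super("no more data"); }
}

// Hypothetical in-memory source; a real implementation parses the dump instead.
final class ListSource {
  private final List<String[]> pages; // each entry: {title, body}
  private int next = 0;               // guarded by "this"

  ListSource(List<String[]> pages) { this.pages = pages; }

  // synchronized so that each page is handed to exactly one caller
  synchronized SimpleDocData getNextDocData(SimpleDocData docData) throws EndOfInput {
    if (next >= pages.size()) {
      throw new EndOfInput(); // signals exhaustion, like NoMoreDataException
    }
    String[] page = pages.get(next++);
    docData.title = page[0]; // fill the caller's object rather than allocating
    docData.body = page[1];
    return docData;
  }
}
```

Each worker thread would typically keep its own holder object and loop until the end-of-data exception is thrown.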
-
resetInputs
Description copied from class: ContentItemsSource
Resets the input for this content source, so that the test behaves as if it had just started, input-wise.
NOTE: the default implementation resets the number of bytes and items generated since the last reset, so it is important to call super.resetInputs() if you override this method.
- Overrides:
resetInputs in class ContentItemsSource
- Throws:
IOException
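The NOTE above means an override should rewind its own input and then delegate to the parent so that the byte and item counters are cleared as well. The sketch uses hypothetical classes: CountingSource stands in for ContentItemsSource and ReplayableSource for a subclass; the field and method names are assumptions, not Lucene's.

```java
import java.io.IOException;

// Hypothetical stand-in for the parent class: tracks bytes and items produced.
class CountingSource {
  protected long bytes;
  protected long items;

  protected void record(long size) { items++; bytes += size; }

  // Parent behaviour described above: clear the counters.
  public void resetInputs() throws IOException {
    bytes = 0;
    items = 0;
  }
}

// Hypothetical subclass that keeps its own read position.
class ReplayableSource extends CountingSource {
  int position;

  void readOne(String doc) { position++; record(doc.length()); }

  @Override
  public void resetInputs() throws IOException {
    position = 0;        // rewind this subclass's own input
    super.resetInputs(); // required: also resets the inherited counters
  }
}
```

Omitting the super call would leave position at 0 but keep stale counts, so statistics after a reset would mix two runs.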
-
openInputStream
Opens the input stream.
- Throws:
IOException
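The class description says a .bz2 dump can be read directly and is decompressed on the fly. The JDK ships no bzip2 decoder, so this sketch substitutes gzip (java.util.zip.GZIPInputStream) to show the general idea of wrapping the file stream in a decompressor chosen by file extension; a real .bz2 file would need a third-party decoder such as Apache Commons Compress's BZip2CompressorInputStream. The helper openDump is hypothetical and is not the actual openInputStream() implementation.

```java
import java.io.BufferedInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Locale;
import java.util.zip.GZIPInputStream;

final class DumpStreams {
  // Hypothetical helper: wraps the raw file stream in a decompressor chosen by
  // file extension, so callers always read uncompressed bytes.
  static InputStream openDump(Path file) throws IOException {
    InputStream raw = new BufferedInputStream(Files.newInputStream(file));
    String name = file.getFileName().toString().toLowerCase(Locale.ROOT);
    if (name.endsWith(".gz")) {
      try {
        // The gzip header is read here; the body is inflated as it is read.
        return new GZIPInputStream(raw);
      } catch (IOException e) {
        raw.close(); // invalid header: release the file handle before failing
        throw e;
      }
    }
    if (name.endsWith(".bz2")) {
      raw.close();
      throw new IOException("bzip2 needs a third-party decoder: " + file);
    }
    return raw; // anything else is treated as uncompressed XML
  }
}
```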
-
setConfig
Description copied from class: ContentItemsSource
Sets the Config for this content source. If you override this method, you must call super.setConfig.
- Overrides:
setConfig in class ContentItemsSource
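The requirement to call super.setConfig can be illustrated with hypothetical classes: ConfiguredSource stands in for the parent and stores the shared configuration, while WikiLikeSource reads its own key afterwards. java.util.Properties replaces the benchmark's Config type, and the "verbose" key is an assumption; only docs.file comes from this page.

```java
import java.util.Properties;

// Hypothetical parent: keeps the configuration and derives shared settings.
class ConfiguredSource {
  protected Properties config;
  protected boolean verbose;

  public void setConfig(Properties config) {
    this.config = config;
    // "verbose" is a hypothetical key used only for this illustration.
    this.verbose = Boolean.parseBoolean(config.getProperty("verbose", "false"));
  }
}

// Hypothetical subclass: delegates first so inherited state is initialised.
class WikiLikeSource extends ConfiguredSource {
  String docsFile;

  @Override
  public void setConfig(Properties config) {
    super.setConfig(config);                    // required by the contract above
    docsFile = config.getProperty("docs.file"); // this source's own setting
  }
}
```

Calling super first means the subclass can rely on any state the parent derives from the configuration.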
-