Class EnwikiContentSource

All Implemented Interfaces:
Closeable, AutoCloseable

public class EnwikiContentSource extends ContentSource
A ContentSource that reads the English Wikipedia dump. The .bz2 file can be read directly; it is decompressed on the fly. Config properties:
  • keep.image.only.docs=false|true (default true)
  • docs.file=<path to the file>
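The two properties above can be illustrated with a minimal sketch that uses only java.util.Properties. The class name EnwikiConfigSketch, its helper methods, and the dump file name are hypothetical and not part of this API. Handing the properties to the benchmark framework is deliberately not shown.

```java
import java.util.Properties;

public class EnwikiConfigSketch {

    // Builds the two documented properties. The dumpPath value is a placeholder
    // supplied by the caller, not a path this class requires.
    static Properties enwikiProperties(String dumpPath, boolean keepImageOnlyDocs) {
        Properties props = new Properties();
        props.setProperty("docs.file", dumpPath);
        props.setProperty("keep.image.only.docs", Boolean.toString(keepImageOnlyDocs));
        return props;
    }

    // Reads keep.image.only.docs, falling back to the documented default of true
    // when the property is absent.
    static boolean keepImageOnlyDocs(Properties props) {
        return Boolean.parseBoolean(props.getProperty("keep.image.only.docs", "true"));
    }

    public static void main(String[] args) {
        // Hypothetical file name; substitute the path to an actual dump.
        Properties props = enwikiProperties("enwiki-pages-articles.xml.bz2", false);
        System.out.println("docs.file=" + props.getProperty("docs.file"));
        System.out.println("keep.image.only.docs=" + keepImageOnlyDocs(props));
    }
}
```

The second helper mirrors the documented default: when keep.image.only.docs is not set, it is treated as true.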