Class EnwikiContentSource

  • All Implemented Interfaces:
    Closeable, AutoCloseable

    public class EnwikiContentSource
    extends ContentSource
    A ContentSource that reads the English Wikipedia dump. The .bz2 file can be read directly; it is decompressed on the fly. Config properties:
    • keep.image.only.docs=false|true (default true).
    • docs.file=<path to the file>
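As a rough illustration of the two properties listed above, the sketch below assembles them with the standard java.util.Properties class. The class name EnwikiConfigSketch, the helper buildProps, and the dump path are hypothetical and used only for illustration; the snippet deliberately does not call any Lucene API, and how the resulting properties are handed to the benchmark framework is left out.

```java
import java.util.Properties;

// Hypothetical helper class, not part of Lucene.
public class EnwikiConfigSketch {

    // Collects the two config properties documented for EnwikiContentSource.
    // dumpPath is whatever file the caller wants to read; a .bz2 dump is allowed.
    public static Properties buildProps(String dumpPath, boolean keepImageOnlyDocs) {
        Properties props = new Properties();
        props.setProperty("docs.file", dumpPath);
        props.setProperty("keep.image.only.docs", Boolean.toString(keepImageOnlyDocs));
        return props;
    }

    public static void main(String[] args) {
        // Placeholder path (an assumption); replace with the real location of the dump.
        Properties props = buildProps("/path/to/enwiki-dump.xml.bz2", false);
        // Print each key=value pair; iteration order is not guaranteed.
        props.forEach((key, value) -> System.out.println(key + "=" + value));
    }
}
```

Setting keep.image.only.docs explicitly, as done here, avoids relying on the documented default of true.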