Class CheckIndex

java.lang.Object
org.apache.lucene.index.CheckIndex
All Implemented Interfaces:
Closeable, AutoCloseable

public final class CheckIndex extends Object implements Closeable
Basic tool and API to check the health of an index and write a new segments file that removes references to problematic segments.

As this tool checks every byte in the index, on a large index it can take quite a long time to run.

WARNING: This API is experimental and might change in incompatible ways in the next release.
Please make a complete backup of your index before using this to exorcise corrupted documents from your index!
  • Constructor Details

    • CheckIndex

      public CheckIndex(Directory dir) throws IOException
      Create a new CheckIndex on the directory.
      Throws:
      IOException
    • CheckIndex

      public CheckIndex(Directory dir, Lock writeLock)
      Expert: create a CheckIndex on the directory with the specified lock. This should not be used except for unit tests. It exists only to support special tests (such as TestIndexWriterExceptions*) that would otherwise be more complicated to debug if they had to close the writer for each check.
  • Method Details

    • close

      public void close() throws IOException
      Specified by:
      close in interface AutoCloseable
      Specified by:
      close in interface Closeable
      Throws:
      IOException
    • setDoSlowChecks

      public void setDoSlowChecks(boolean v)
      If true, additional slow checks are performed. This will likely drastically increase the time it takes to run CheckIndex!
    • doSlowChecks

      public boolean doSlowChecks()
    • setFailFast

      public void setFailFast(boolean v)
      If true, just throw the original exception immediately when corruption is detected, rather than continuing to iterate to other segments looking for more corruption.
    • getFailFast

      public boolean getFailFast()
    • getChecksumsOnly

      public boolean getChecksumsOnly()
    • setChecksumsOnly

      public void setChecksumsOnly(boolean v)
      If true, only validate physical integrity for all files. Note that the returned nested status objects (e.g. storedFieldStatus) will be null.
    • setThreadCount

      public void setThreadCount(int tc)
      Set the number of threads (threadCount) used to parallelize index integrity checking.
    • setInfoStream

      public void setInfoStream(PrintStream out, boolean verbose)
      Set infoStream where messages should go. If null, no messages are printed. If verbose is true then more details are printed.
    • setInfoStream

      public void setInfoStream(PrintStream out)
      Set infoStream where messages should go. See setInfoStream(PrintStream,boolean).
    • checkIndex

      public CheckIndex.Status checkIndex() throws IOException
      Returns a CheckIndex.Status instance detailing the state of the index.

      As this method checks every byte in the index, on a large index it can take quite a long time to run.

      WARNING: make sure you only call this when the index is not opened by any writer.

      Throws:
      IOException
    • checkIndex

      public CheckIndex.Status checkIndex(List<String> onlySegments) throws IOException
      Returns a CheckIndex.Status instance detailing the state of the index.
      Parameters:
      onlySegments - list of specific segment names to check

      As this method checks every byte in the specified segments, on a large index it can take quite a long time to run.

      Throws:
      IOException
    • checkIndex

      public CheckIndex.Status checkIndex(List<String> onlySegments, ExecutorService executorService) throws IOException
      Returns a CheckIndex.Status instance detailing the state of the index.

      This method allows the caller to pass in a custom ExecutorService to speed up the check.

      WARNING: make sure you only call this when the index is not opened by any writer.

      Throws:
      IOException
    • testSort

      public static CheckIndex.Status.IndexSortStatus testSort(CodecReader reader, Sort sort, PrintStream infoStream, boolean failFast) throws IOException
      Tests index sort order.
      Throws:
      IOException
      WARNING: This API is experimental and might change in incompatible ways in the next release.
    • testLiveDocs

      public static CheckIndex.Status.LiveDocStatus testLiveDocs(CodecReader reader, PrintStream infoStream, boolean failFast) throws IOException
      Test live docs.
      Throws:
      IOException
      WARNING: This API is experimental and might change in incompatible ways in the next release.
    • testFieldInfos

      public static CheckIndex.Status.FieldInfoStatus testFieldInfos(CodecReader reader, PrintStream infoStream, boolean failFast) throws IOException
      Test field infos.
      Throws:
      IOException
      WARNING: This API is experimental and might change in incompatible ways in the next release.
    • testFieldNorms

      public static CheckIndex.Status.FieldNormStatus testFieldNorms(CodecReader reader, PrintStream infoStream, boolean failFast) throws IOException
      Test field norms.
      Throws:
      IOException
      WARNING: This API is experimental and might change in incompatible ways in the next release.
    • testPostings

      public static CheckIndex.Status.TermIndexStatus testPostings(CodecReader reader, PrintStream infoStream) throws IOException
      Test the term index.
      Throws:
      IOException
      WARNING: This API is experimental and might change in incompatible ways in the next release.
    • testPostings

      public static CheckIndex.Status.TermIndexStatus testPostings(CodecReader reader, PrintStream infoStream, boolean verbose, boolean doSlowChecks, boolean failFast) throws IOException
      Test the term index.
      Throws:
      IOException
      WARNING: This API is experimental and might change in incompatible ways in the next release.
    • testPoints

      public static CheckIndex.Status.PointsStatus testPoints(CodecReader reader, PrintStream infoStream, boolean failFast) throws IOException
      Test the points index.
      Throws:
      IOException
      WARNING: This API is experimental and might change in incompatible ways in the next release.
    • testVectors

      public static CheckIndex.Status.VectorValuesStatus testVectors(CodecReader reader, PrintStream infoStream, boolean failFast) throws IOException
      Test the vectors index.
      Throws:
      IOException
      WARNING: This API is experimental and might change in incompatible ways in the next release.
    • testStoredFields

      public static CheckIndex.Status.StoredFieldStatus testStoredFields(CodecReader reader, PrintStream infoStream, boolean failFast) throws IOException
      Test stored fields.
      Throws:
      IOException
      WARNING: This API is experimental and might change in incompatible ways in the next release.
    • testDocValues

      public static CheckIndex.Status.DocValuesStatus testDocValues(CodecReader reader, PrintStream infoStream, boolean failFast) throws IOException
      Test docvalues.
      Throws:
      IOException
      WARNING: This API is experimental and might change in incompatible ways in the next release.
    • testTermVectors

      public static CheckIndex.Status.TermVectorStatus testTermVectors(CodecReader reader, PrintStream infoStream) throws IOException
      Test term vectors.
      Throws:
      IOException
      WARNING: This API is experimental and might change in incompatible ways in the next release.
    • testTermVectors

      public static CheckIndex.Status.TermVectorStatus testTermVectors(CodecReader reader, PrintStream infoStream, boolean verbose, boolean doSlowChecks, boolean failFast) throws IOException
      Test term vectors.
      Throws:
      IOException
      WARNING: This API is experimental and might change in incompatible ways in the next release.
    • exorciseIndex

      public void exorciseIndex(CheckIndex.Status result) throws IOException
      Repairs the index using the result previously returned from checkIndex(). Note that this does not remove any of the unreferenced files after it's done; you must separately open an IndexWriter, which deletes unreferenced files when it's created.

      WARNING: this writes a new segments file into the index, effectively removing all documents in broken segments from the index. BE CAREFUL.

      Throws:
      IOException
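
      The check-then-repair flow described for checkIndex() and exorciseIndex(CheckIndex.Status) might be wired together roughly as follows. This is a minimal sketch, not the tool's own implementation. The class name CheckIndexSketch and the method checkAndMaybeExorcise are hypothetical, and the sketch assumes two things not shown in this section: that CheckIndex.Status exposes a public boolean field named clean, and that FSDirectory.open(Path) is used to open the index. Back up the index before passing exorcise = true, and make sure no writer has it open.

```java
import java.io.IOException;
import java.nio.file.Paths;
import org.apache.lucene.index.CheckIndex;
import org.apache.lucene.store.Directory;
import org.apache.lucene.store.FSDirectory;

public class CheckIndexSketch {

  /**
   * Hypothetical helper: checks the index in dir and returns true if it is clean.
   * If exorcise is true and problems were found, a new segments file is written
   * that drops the broken segments (their documents are lost).
   */
  static boolean checkAndMaybeExorcise(Directory dir, boolean exorcise) throws IOException {
    // CheckIndex is Closeable, so try-with-resources releases it when done.
    try (CheckIndex checker = new CheckIndex(dir)) {
      checker.setInfoStream(System.out); // print progress messages to stdout
      CheckIndex.Status status = checker.checkIndex();
      // Assumption: Status has a public boolean field "clean".
      if (!status.clean && exorcise) {
        checker.exorciseIndex(status);
      }
      return status.clean;
    }
  }

  public static void main(String[] args) throws IOException {
    if (args.length != 1) {
      System.err.println("usage: CheckIndexSketch pathToIndex");
      return;
    }
    try (Directory dir = FSDirectory.open(Paths.get(args[0]))) {
      // Check only; pass true instead of false to also repair.
      boolean clean = checkAndMaybeExorcise(dir, false);
      System.out.println(clean ? "index is clean" : "index has problems");
    }
  }
}
```

      As noted above, exorciseIndex does not delete the unreferenced files; opening an IndexWriter on the directory afterwards is still needed for that.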
    • assertsOn

      public static boolean assertsOn()
      Check whether asserts are enabled or not.
      Returns:
      true iff asserts are enabled
    • main

      public static void main(String[] args) throws IOException, InterruptedException
      Command-line interface to check and exorcise corrupt segments from an index.

      Run it like this:

       java -ea:org.apache.lucene... org.apache.lucene.index.CheckIndex pathToIndex [-exorcise] [-verbose] [-segment X] [-segment Y]
       
      • -exorcise: actually write a new segments_N file, removing any problematic segments. *LOSES DATA*
      • -segment X: only check the specified segment(s). This can be specified multiple times to check more than one segment, for example: -segment _2 -segment _a. You can't use this with the -exorcise option.

      WARNING: -exorcise should only be used on an emergency basis as it will cause documents (perhaps many) to be permanently removed from the index. Always make a backup copy of your index before running this! Do not run this tool on an index that is actively being written to. You have been warned!

      Run without -exorcise, this tool will open the index, report version information and report any exceptions it hits and what action it would take if -exorcise were specified. With -exorcise, this tool will remove any segments that have issues and write a new segments_N file. This means all documents contained in the affected segments will be removed.

      This tool exits with exit code 1 if the index cannot be opened or has any corruption, else 0.

      Throws:
      IOException
      InterruptedException
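
      The command line and exit-code convention described above can be sketched with plain JDK code. The class CheckIndexCommand and its methods build and describeExit are hypothetical helpers written for illustration and are not part of Lucene. The sketch only assembles the argument list in the documented form and rejects the documented invalid combination of -exorcise with -segment. The classpath options needed to actually launch the tool are omitted.

```java
import java.util.ArrayList;
import java.util.List;

public class CheckIndexCommand {

  /** Builds the documented command line as a list of arguments (classpath not included). */
  static List<String> build(
      String pathToIndex, boolean exorcise, boolean verbose, List<String> segments) {
    // The documentation states that -segment cannot be used together with -exorcise.
    if (exorcise && !segments.isEmpty()) {
      throw new IllegalArgumentException("-segment cannot be combined with -exorcise");
    }
    List<String> cmd = new ArrayList<>();
    cmd.add("java");
    cmd.add("-ea:org.apache.lucene..."); // enable assertions for Lucene packages
    cmd.add("org.apache.lucene.index.CheckIndex");
    cmd.add(pathToIndex);
    if (exorcise) {
      cmd.add("-exorcise");
    }
    if (verbose) {
      cmd.add("-verbose");
    }
    for (String segment : segments) {
      cmd.add("-segment"); // repeated once per segment name
      cmd.add(segment);
    }
    return cmd;
  }

  /** Maps the documented exit codes to a short description. */
  static String describeExit(int exitCode) {
    return exitCode == 0 ? "clean" : "could not be opened or is corrupt";
  }

  public static void main(String[] args) {
    List<String> cmd = build("/path/to/index", false, true, List.of("_2", "_a"));
    System.out.println(String.join(" ", cmd));
  }
}
```

      A list built this way could be handed to java.lang.ProcessBuilder, and the value returned by Process.waitFor() passed to describeExit.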
    • parseOptions

      public static CheckIndex.Options parseOptions(String[] args)
      Parse command line args into fields
      Parameters:
      args - The command line arguments
      Returns:
      An Options struct
      Throws:
      IllegalArgumentException - if any of the CLI args are invalid
    • doCheck

      public int doCheck(CheckIndex.Options opts) throws IOException, InterruptedException
      Actually perform the index check
      Parameters:
      opts - The options to use for this check
      Returns:
      0 iff the index is clean, 1 otherwise
      Throws:
      IOException
      InterruptedException