Class CheckIndex

  • All Implemented Interfaces:
    Closeable, AutoCloseable

    public final class CheckIndex
    extends Object
    implements Closeable
    Basic tool and API to check the health of an index and write a new segments file that removes references to problematic segments.

    As this tool checks every byte in the index, on a large index it can take quite a long time to run.

    WARNING: This API is experimental and might change in incompatible ways in the next release.
    Please make a complete backup of your index before using this to exorcise corrupted documents from it!
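    A plain file-level copy of the index directory is one way to take the complete backup recommended above before exorcising anything. The sketch below uses only java.nio.file. The class name IndexBackup is hypothetical, and the sketch assumes that nothing is writing to the index while the copy runs. Otherwise the copy may be inconsistent.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;
import java.util.stream.Stream;

/** Hypothetical helper: recursively copies an index directory to a backup location. */
final class IndexBackup {
  private IndexBackup() {}

  /**
   * Copies every file and subdirectory under {@code source} into {@code target}.
   * Refuses to overwrite an existing target. Assumes no writer modifies {@code source} meanwhile.
   */
  static void copyTree(Path source, Path target) throws IOException {
    if (Files.exists(target)) {
      throw new IOException("Backup target already exists: " + target);
    }
    try (Stream<Path> paths = Files.walk(source)) {
      // Files.walk visits a directory before its entries, so each parent
      // directory is created before the files inside it are copied.
      paths.forEach(p -> {
        Path dest = target.resolve(source.relativize(p).toString());
        try {
          if (Files.isDirectory(p)) {
            Files.createDirectories(dest);
          } else {
            Files.copy(p, dest, StandardCopyOption.COPY_ATTRIBUTES);
          }
        } catch (IOException e) {
          // Lambdas cannot throw checked exceptions; wrap and unwrap below.
          throw new UncheckedIOException(e);
        }
      });
    } catch (UncheckedIOException e) {
      throw e.getCause();
    }
  }
}
```

    Refusing to write into an existing target avoids silently mixing an old backup with new files.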
    • Constructor Detail

      • CheckIndex

        public CheckIndex(Directory dir,
                          Lock writeLock)
        Expert: create a CheckIndex on the specified directory, using an already-obtained write lock. This should really not be used except for unit tests! It exists only to support special tests (such as TestIndexWriterExceptions*) that would otherwise be more complicated to debug if they had to close the writer for each check.
    • Method Detail

      • setDoSlowChecks

        public void setDoSlowChecks(boolean v)
        If true, additional slow checks are performed. This will likely drastically increase the time it takes to run CheckIndex!
      • setFailFast

        public void setFailFast(boolean v)
        If true, just throw the original exception immediately when corruption is detected, rather than continuing to iterate to other segments looking for more corruption.
      • setChecksumsOnly

        public void setChecksumsOnly(boolean v)
        If true, only the physical integrity (checksums) of all files is validated. Note that the returned nested status objects (e.g. storedFieldStatus) will then be null.
      • setInfoStream

        public void setInfoStream(PrintStream out,
                                  boolean verbose)
        Sets the infoStream to which messages are written. If null, no messages are printed. If verbose is true, more details are printed.
      • checkIndex

        public CheckIndex.Status checkIndex()
                                     throws IOException
        Returns a CheckIndex.Status instance detailing the state of the index.

        As this method checks every byte in the index, on a large index it can take quite a long time to run.

        WARNING: make sure you only call this when the index is not opened by any writer.

        Throws:
        IOException
      • checkIndex

        public CheckIndex.Status checkIndex(List<String> onlySegments)
                                     throws IOException
        Returns a CheckIndex.Status instance detailing the state of the index.

        As this method checks every byte in the specified segments, on a large index it can take quite a long time to run.

        Parameters:
        onlySegments - list of specific segment names to check

        Throws:
        IOException
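        The onlySegments argument takes segment names such as _2 or _a. As a rough way to find candidate names, the sketch below collects the leading "_name" prefix of each file in an index directory. It relies on an assumption about Lucene's file naming: per-segment files start with the segment name, as in _2.si or _2.cfs. The segments_N file remains the authoritative list of segments. The class and method names are hypothetical.

```java
import java.util.ArrayList;
import java.util.Collection;
import java.util.List;
import java.util.TreeSet;

/** Hypothetical helper: guesses segment names from index file names (heuristic only). */
final class SegmentNames {
  private SegmentNames() {}

  /**
   * Returns the distinct "_name" prefixes in the given file names, sorted lexicographically.
   * Files not starting with '_' (segments_N, write.lock, ...) are ignored.
   */
  static List<String> fromFileNames(Collection<String> fileNames) {
    TreeSet<String> names = new TreeSet<>();
    for (String file : fileNames) {
      if (!file.startsWith("_")) {
        continue; // not a per-segment file under the assumed naming scheme
      }
      // Assumed: the segment name ends at the first '.' or at the next '_' after the leading one.
      int end = file.length();
      int dot = file.indexOf('.');
      if (dot > 0) {
        end = dot;
      }
      int underscore = file.indexOf('_', 1);
      if (underscore > 0 && underscore < end) {
        end = underscore;
      }
      if (end > 1) {
        names.add(file.substring(0, end));
      }
    }
    return new ArrayList<>(names);
  }
}
```

        For example, file names from a directory listing could be passed in, and the resulting list (or a subset of it) passed to checkIndex(List<String>).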
      • exorciseIndex

        public void exorciseIndex(CheckIndex.Status result)
                           throws IOException
        Repairs the index using the result previously returned by checkIndex(). Note that this does not remove any of the unreferenced files after it's done; you must separately open an IndexWriter, which deletes unreferenced files when it's created.

        WARNING: this writes a new segments file into the index, effectively removing all documents in broken segments from the index. BE CAREFUL.

        Throws:
        IOException
      • assertsOn

        public static boolean assertsOn()
        Check whether asserts are enabled or not.
        Returns:
        true iff asserts are enabled
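        A common Java idiom for answering this question is an assert statement whose expression has a side effect. The side effect runs only when assertions are enabled (for example with -ea). The sketch below shows that idiom in a hypothetical AssertCheck class. It illustrates the technique and is not presented as Lucene's source.

```java
/** Hypothetical illustration of detecting whether assertions are enabled. */
final class AssertCheck {
  private AssertCheck() {}

  static boolean assertsOn() {
    boolean enabled = false;
    // The assignment inside the assert executes only when assertions are enabled
    // for this class; with assertions disabled the whole statement is skipped.
    assert enabled = true;
    return enabled;
  }
}
```

        Assertion status is decided per class or package, which is why the documented command line uses -ea:org.apache.lucene... to enable assertions for Lucene's classes specifically.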
      • main

        public static void main(String[] args)
                         throws IOException,
                                InterruptedException
        Command-line interface to check and exorcise corrupt segments from an index.

        Run it like this:

            java -ea:org.apache.lucene... org.apache.lucene.index.CheckIndex pathToIndex [-exorcise] [-verbose] [-segment X] [-segment Y]
            
        • -exorcise: actually write a new segments_N file, removing any problematic segments. *LOSES DATA*
        • -segment X: only check the specified segment(s). This can be specified multiple times, to check more than one segment, e.g. -segment _2 -segment _a. You can't use this with the -exorcise option.

        WARNING: -exorcise should only be used on an emergency basis as it will cause documents (perhaps many) to be permanently removed from the index. Always make a backup copy of your index before running this! Do not run this tool on an index that is actively being written to. You have been warned!

        Run without -exorcise, this tool opens the index, reports version information, and reports any exceptions it hits along with the action it would take if -exorcise were specified. With -exorcise, this tool removes any segments that have issues and writes a new segments_N file. This means all documents contained in the affected segments will be removed.

        This tool exits with exit code 1 if the index cannot be opened or has any corruption, else 0.

        Throws:
        IOException
        InterruptedException
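        When the command-line tool is driven from another Java program, such as a maintenance job, the exit code is the simplest signal to act on. The sketch below builds the documented command line and treats exit code 0 as a clean index. Any nonzero code counts as not clean, which also covers JVM startup failures. The class name, the java executable and classpath arguments, and the choice to inherit the child's output are assumptions. The classpath is assumed to contain the Lucene core jar.

```java
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;

/** Hypothetical wrapper that runs the CheckIndex command-line tool in a child JVM. */
final class CheckIndexRunner {
  private CheckIndexRunner() {}

  /** Builds: java -ea:org.apache.lucene... -cp CP org.apache.lucene.index.CheckIndex PATH [-verbose] [-segment X]... */
  static List<String> buildCommand(String javaExe, String classpath, String indexPath,
                                   boolean verbose, List<String> segments) {
    List<String> cmd = new ArrayList<>();
    cmd.add(javaExe);
    cmd.add("-ea:org.apache.lucene...");   // enable assertions in Lucene's packages
    cmd.add("-cp");
    cmd.add(classpath);                    // assumed to contain the Lucene core jar
    cmd.add("org.apache.lucene.index.CheckIndex");
    cmd.add(indexPath);
    if (verbose) {
      cmd.add("-verbose");
    }
    for (String segment : segments) {      // each segment needs its own -segment flag
      cmd.add("-segment");
      cmd.add(segment);
    }
    return cmd;                            // -exorcise is deliberately never added
  }

  /** Runs the tool and returns true only if it exited with code 0. */
  static boolean isClean(List<String> command) throws IOException, InterruptedException {
    Process process = new ProcessBuilder(command).inheritIO().start();
    return process.waitFor() == 0;
  }
}
```

        Leaving -exorcise out of the helper keeps an automated job from ever removing documents. That step is better run by hand, after taking a backup.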
      • parseOptions

        public static CheckIndex.Options parseOptions(String[] args)
        Parses the command-line arguments into an Options instance.
        Parameters:
        args - The command line arguments
        Returns:
        An Options instance
        Throws:
        IllegalArgumentException - if any of the CLI args are invalid
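        The rules described for main can be expressed as a small parser, shown here for illustration. This is a hypothetical sketch, not Lucene's parseOptions. The ParsedArgs class and its field names are invented here, and it handles only the flags documented for main: -exorcise, -verbose, -segment X and a single index path. Invalid input raises IllegalArgumentException, following the contract stated above.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical result of parsing CheckIndex-style arguments (not Lucene's CheckIndex.Options). */
final class ParsedArgs {
  String indexPath;
  boolean exorcise;
  boolean verbose;
  final List<String> onlySegments = new ArrayList<>();

  static ParsedArgs parse(String[] args) {
    ParsedArgs opts = new ParsedArgs();
    for (int i = 0; i < args.length; i++) {
      String arg = args[i];
      switch (arg) {
        case "-exorcise":
          opts.exorcise = true;
          break;
        case "-verbose":
          opts.verbose = true;
          break;
        case "-segment":
          if (i + 1 == args.length) {
            throw new IllegalArgumentException("-segment requires a segment name");
          }
          opts.onlySegments.add(args[++i]); // consume the segment name
          break;
        default:
          if (arg.startsWith("-")) {
            throw new IllegalArgumentException("unknown option: " + arg);
          }
          if (opts.indexPath != null) {
            throw new IllegalArgumentException("unexpected extra argument: " + arg);
          }
          opts.indexPath = arg;
      }
    }
    if (opts.indexPath == null) {
      throw new IllegalArgumentException("index path is required");
    }
    // Documented restriction: -segment cannot be combined with -exorcise.
    if (opts.exorcise && !opts.onlySegments.isEmpty()) {
      throw new IllegalArgumentException("-segment cannot be combined with -exorcise");
    }
    return opts;
  }
}
```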