Package org.apache.lucene.tests.util
Class TestUtil
- java.lang.Object
-
- org.apache.lucene.tests.util.TestUtil
-
public final class TestUtil extends Object
General utility methods for Lucene unit tests.
-
-
Field Summary
Modifier and Type / Field / Description
static Comparator<CharSequence>
STRING_CODEPOINT_COMPARATOR
A comparator that compares UTF-16 strings / char sequences according to Unicode code point order.
-
Method Summary
Modifier and Type / Method / Description
static void
addIndexesSlowly(IndexWriter writer, DirectoryReader... readers)
static Codec
alwaysDocValuesFormat(DocValuesFormat format)
Return a Codec that can read any of the default codecs and formats, but always writes in the specified format.
static Codec
alwaysPostingsFormat(PostingsFormat format)
Return a Codec that can read any of the default codecs and formats, but always writes in the specified format.
static boolean
anyFilesExceptWriteLock(Directory dir)
static <T> void
assertAttributeReflection(AttributeImpl att, Map<String,T> reflectedValues)
Checks some basic behaviour of an AttributeImpl
static void
assertConsistent(TopDocs expected, TopDocs actual)
Assert that the given TopDocs have the same top docs and consistent hit counts.
static String
bytesRefToString(BytesRef br)
For debugging: tries to include br.utf8ToString(), but if that fails (because it's not valid utf8, which is fine!), just use ordinary toString.
static CharSequence
bytesToCharSequence(BytesRef ref, Random random)
static CheckIndex.Status
checkIndex(Directory dir)
This runs the CheckIndex tool on the index in dir.
static CheckIndex.Status
checkIndex(Directory dir, boolean doSlowChecks)
static CheckIndex.Status
checkIndex(Directory dir, boolean doSlowChecks, boolean failFast, boolean concurrent, ByteArrayOutputStream output)
If failFast is true, then throw the first exception when index corruption is hit, instead of moving on to other fields/segments to look for any other corruption.
static <T> void
checkIterator(Iterator<T> iterator)
Checks that the provided iterator is well-formed.
static <T> void
checkIterator(Iterator<T> iterator, long expectedSize, boolean allowNull)
Checks that the provided iterator is well-formed.
static void
checkReader(IndexReader reader)
This runs the CheckIndex tool on the Reader.
static void
checkReader(LeafReader reader, boolean doSlowChecks)
static <T> void
checkReadOnly(Collection<T> coll)
Checks that the provided collection is read-only.
static Document
cloneDocument(Document doc1)
static boolean
disableVirusChecker(Directory in)
Returns true if VirusCheckingFS is in use and was in fact already enabled
static PostingsEnum
docs(Random random, IndexReader r, String field, BytesRef term, PostingsEnum reuse, int flags)
static PostingsEnum
docs(Random random, TermsEnum termsEnum, PostingsEnum reuse, int flags)
static void
enableVirusChecker(Directory in)
static boolean
fieldSupportsHugeBinaryDocValues(String field)
static Codec
getDefaultCodec()
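Returns the actual default codec (e.g. LuceneMNCodec) for this version of Lucene.
static DocValuesFormat
getDefaultDocValuesFormat()
Returns the actual default docvalues format (e.g. LuceneMNDocValuesFormat) for this version of Lucene.
static KnnVectorsFormat
getDefaultKnnVectorsFormat()
Returns the actual default vector format (e.g. LuceneMNKnnVectorsFormat) for this version of Lucene.
static PostingsFormat
getDefaultPostingsFormat()
Returns the actual default postings format (e.g. LuceneMNPostingsFormat) for this version of Lucene.
static PostingsFormat
getDefaultPostingsFormat(int minItemsPerBlock, int maxItemsPerBlock)
Returns the actual default postings format (e.g. LuceneMNPostingsFormat) for this version of Lucene.
static String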
getDocValuesFormat(String field)
static String
getDocValuesFormat(Codec codec, String field)
static String
getPostingsFormat(String field)
static String
getPostingsFormat(Codec codec, String field)
static PostingsFormat
getPostingsFormatWithOrds(Random r)
Returns a random postings format that supports term ordinals
static boolean
hasVirusChecker(Path path)
static boolean
hasVirusChecker(Directory dir)
static boolean
hasWindowsFS(Path path)
static boolean
hasWindowsFS(Directory dir)
static BigInteger
nextBigInteger(Random random, int maxBytes)
Returns a randomish big integer with 1 .. maxBytes storage.
static int
nextInt(Random r, int start, int end)
start and end are BOTH inclusive
static long
nextLong(Random r, long start, long end)
start and end are BOTH inclusive
static Directory
ramCopyOf(Directory dir)
Returns a copy of the source directory, with file contents stored in RAM.
static String
randomAnalysisString(Random random, int maxLength, boolean simple)
static BytesRef
randomBinaryTerm(Random r)
Returns a random binary term.
static BytesRef
randomBinaryTerm(Random r, int length)
Returns a random binary term with a given length
static String
randomFixedByteLengthUnicodeString(Random r, int length)
Returns random string, with a given UTF-8 byte length
static void
randomFixedLengthUnicodeString(Random random, char[] chars, int offset, int length)
Fills provided char[] with valid random unicode code unit sequence.
static String
randomHtmlishString(Random random, int numElements)
static String
randomlyRecaseCodePoints(Random random, String str)
Randomly upcases, downcases, or leaves intact each code point in the given string
static Pattern
randomPattern(Random random)
Returns a valid (compiling) Pattern instance with random stuff inside.
static String
randomRealisticUnicodeString(Random r)
Returns random string of length between 0-20 codepoints, all codepoints within the same unicode block.
static String
randomRealisticUnicodeString(Random r, int maxLength)
Returns random string of length up to maxLength codepoints, all codepoints within the same unicode block.
static String
randomRealisticUnicodeString(Random r, int minLength, int maxLength)
Returns random string of length between min and max codepoints, all codepoints within the same unicode block.
static String
randomRegexpishString(Random r)
Returns a String that's "regexpish" (contains lots of operators typically found in regular expressions). If you call this enough times, you might get a valid regex!
static String
randomRegexpishString(Random r, int maxLength)
Returns a String that's "regexpish" (contains lots of operators typically found in regular expressions). If you call this enough times, you might get a valid regex!
static String
randomSimpleString(Random r)
static String
randomSimpleString(Random r, int maxLength)
static String
randomSimpleString(Random r, int minLength, int maxLength)
static String
randomSimpleStringRange(Random r, char minChar, char maxChar, int maxLength)
static String
randomSubString(Random random, int wordLength, boolean simple)
static String
randomUnicodeString(Random r)
Returns random string, including full unicode range.
static String
randomUnicodeString(Random r, int maxLength)
Returns a random string up to a certain length.
static void
reduceOpenFiles(IndexWriter w)
just tries to configure things to keep the open file count lowish
static void
shutdownExecutorService(ExecutorService ex)
Shuts down the ExecutorService and waits for its termination.
static CharSequence
stringToCharSequence(String string, Random random)
static void
syncConcurrentMerges(IndexWriter writer)
static void
syncConcurrentMerges(MergeScheduler ms)
static void
unzip(InputStream in, Path destDir)
Convenience method that unzips the given InputStream into destDir.
-
-
-
Field Detail
-
STRING_CODEPOINT_COMPARATOR
public static final Comparator<CharSequence> STRING_CODEPOINT_COMPARATOR
A comparator that compares UTF-16 strings / char sequences according to Unicode code point order. This can be used to verify BytesRef order.
Warning: This comparator is rather inefficient, because it converts the strings to an int[] array on each invocation.
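The difference between UTF-16 code unit order and code point order only shows up for supplementary characters, so a small example may help. The following is a minimal sketch in plain Java, not Lucene's implementation; the class name CodePointOrderSketch and the field BY_CODE_POINT are hypothetical. Like the comparator described above, it converts both inputs to an int[] on every call.

```java
import java.util.Comparator;

final class CodePointOrderSketch {
    /** Hypothetical comparator: orders char sequences by Unicode code point. */
    static final Comparator<CharSequence> BY_CODE_POINT = (a, b) -> {
        // Decode both inputs to code points on every call (simple, but inefficient).
        int[] left = a.codePoints().toArray();
        int[] right = b.codePoints().toArray();
        int common = Math.min(left.length, right.length);
        for (int i = 0; i < common; i++) {
            if (left[i] != right[i]) {
                return Integer.compare(left[i], right[i]);
            }
        }
        // One sequence is a prefix of the other: the shorter one sorts first.
        return Integer.compare(left.length, right.length);
    };

    public static void main(String[] args) {
        String bmp = "\uFF5E";                 // U+FF5E, a single UTF-16 code unit
        String supplementary = "\uD83D\uDE00"; // U+1F600, encoded as a surrogate pair
        // UTF-16 order compares 0xFF5E with the high surrogate 0xD83D.
        System.out.println("String.compareTo: " + bmp.compareTo(supplementary));
        // Code point order compares 0xFF5E with 0x1F600.
        System.out.println("code point order: " + BY_CODE_POINT.compare(bmp, supplementary));
    }
}
```

In the example, String.compareTo places the supplementary character first because its high surrogate is numerically smaller than U+FF5E, while the code point comparison places it last.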
-
-
Method Detail
-
unzip
public static void unzip(InputStream in, Path destDir) throws IOException
Convenience method that unzips the given InputStream into destDir. You must pass it a clean destDir.
Closes the given InputStream after extracting!
- Throws:
IOException
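For readers unfamiliar with the pattern, the contract above (extract into a clean directory, then close the stream) can be sketched with java.util.zip. This is an illustration under stated assumptions, not the TestUtil source: the class name UnzipSketch is hypothetical, and the check that rejects entries escaping the destination is an addition of this sketch.

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.zip.ZipEntry;
import java.util.zip.ZipInputStream;

final class UnzipSketch {
    /** Extracts every entry of the zip stream into destDir, then closes the stream. */
    static void unzip(InputStream in, Path destDir) throws IOException {
        Path root = destDir.toAbsolutePath().normalize();
        // try-with-resources closes the ZipInputStream, which also closes 'in'.
        try (ZipInputStream zip = new ZipInputStream(in)) {
            ZipEntry entry;
            while ((entry = zip.getNextEntry()) != null) {
                Path target = root.resolve(entry.getName()).normalize();
                // Assumption of this sketch: refuse entries that would land outside destDir.
                if (!target.startsWith(root)) {
                    throw new IOException("Entry escapes destination: " + entry.getName());
                }
                if (entry.isDirectory()) {
                    Files.createDirectories(target);
                } else {
                    Files.createDirectories(target.getParent());
                    // Fails if the file already exists, hence the need for a clean destDir.
                    Files.copy(zip, target);
                }
                zip.closeEntry();
            }
        }
    }

    public static void main(String[] args) throws IOException {
        if (args.length != 2) {
            System.out.println("usage: UnzipSketch <zip file> <empty destination dir>");
            return;
        }
        unzip(Files.newInputStream(Paths.get(args[0])), Paths.get(args[1]));
    }
}
```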
-
checkIterator
public static <T> void checkIterator(Iterator<T> iterator, long expectedSize, boolean allowNull)
Checks that the provided iterator is well-formed:
- is read-only: does not allow remove
- returns expectedSize number of elements
- does not return null elements, unless allowNull is true
- throws NoSuchElementException if next is called after hasNext returns false
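The conditions listed above can be sketched as a standalone check in plain Java. This is an illustrative sketch, not the TestUtil implementation: the class IteratorCheckSketch and its check method are hypothetical names, and failures are reported with AssertionError as an assumption.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.Iterator;
import java.util.NoSuchElementException;

final class IteratorCheckSketch {
    /** Throws AssertionError if the iterator violates any of the listed conditions. */
    static <T> void check(Iterator<T> iterator, long expectedSize, boolean allowNull) {
        long count = 0;
        while (iterator.hasNext()) {
            T value = iterator.next();
            count++;
            if (value == null && !allowNull) {
                throw new AssertionError("iterator returned a null element");
            }
            // A read-only iterator must reject remove().
            boolean removed;
            try {
                iterator.remove();
                removed = true;
            } catch (UnsupportedOperationException expected) {
                removed = false;
            }
            if (removed) {
                throw new AssertionError("iterator allowed remove()");
            }
        }
        if (count != expectedSize) {
            throw new AssertionError("expected " + expectedSize + " elements but saw " + count);
        }
        // Once hasNext() has returned false, next() must throw NoSuchElementException.
        boolean threw;
        try {
            iterator.next();
            threw = false;
        } catch (NoSuchElementException expected) {
            threw = true;
        }
        if (!threw) {
            throw new AssertionError("next() did not throw after hasNext() returned false");
        }
    }

    public static void main(String[] args) {
        check(Collections.unmodifiableList(Arrays.asList("a", "b")).iterator(), 2, false);
        System.out.println("unmodifiable list iterator passed the check");
    }
}
```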
-
checkIterator
public static <T> void checkIterator(Iterator<T> iterator)
Checks that the provided iterator is well-formed:
- is read-only: does not allow remove
- does not return null elements.
- throws NoSuchElementException if next is called after hasNext returns false
-
checkReadOnly
public static <T> void checkReadOnly(Collection<T> coll)
Checks that the provided collection is read-only.
- See Also:
checkIterator(Iterator)
-
syncConcurrentMerges
public static void syncConcurrentMerges(IndexWriter writer)
-
syncConcurrentMerges
public static void syncConcurrentMerges(MergeScheduler ms)
-
checkIndex
public static CheckIndex.Status checkIndex(Directory dir) throws IOException
This runs the CheckIndex tool on the index in dir. If any issues are hit, a RuntimeException is thrown; otherwise, the resulting CheckIndex.Status is returned.
- Throws:
IOException
-
checkIndex
public static CheckIndex.Status checkIndex(Directory dir, boolean doSlowChecks) throws IOException
- Throws:
IOException
-
checkIndex
public static CheckIndex.Status checkIndex(Directory dir, boolean doSlowChecks, boolean failFast, boolean concurrent, ByteArrayOutputStream output) throws IOException
If failFast is true, then throw the first exception when index corruption is hit, instead of moving on to other fields/segments to look for any other corruption.
- Throws:
IOException
-
checkReader
public static void checkReader(IndexReader reader) throws IOException
This runs the CheckIndex tool on the Reader. If any issues are hit, a RuntimeException is thrown.
- Throws:
IOException
-
checkReader
public static void checkReader(LeafReader reader, boolean doSlowChecks) throws IOException
- Throws:
IOException
-
nextInt
public static int nextInt(Random r, int start, int end)
start and end are BOTH inclusive
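Because java.util.Random.nextInt(bound) excludes its upper bound, an inclusive range needs a little care, especially when end - start overflows an int. The following is a minimal sketch of one way to do it, not the TestUtil implementation; the name nextIntInclusive is hypothetical, and the tiny modulo bias of this approach is ignored.

```java
import java.util.Random;

final class InclusiveRangeSketch {
    /** Returns a pseudo-random int in [start, end], both bounds inclusive. */
    static int nextIntInclusive(Random r, int start, int end) {
        if (start > end) {
            throw new IllegalArgumentException("start must be <= end");
        }
        // Do the arithmetic in long so that (end - start + 1) cannot overflow.
        long range = (long) end - (long) start + 1L;
        // floorMod keeps the offset non-negative even when nextLong() is negative.
        long offset = Math.floorMod(r.nextLong(), range);
        return (int) (start + offset);
    }

    public static void main(String[] args) {
        Random r = new Random(42L);
        for (int i = 0; i < 5; i++) {
            System.out.println(nextIntInclusive(r, 1, 6)); // a die roll: 6 can occur
        }
    }
}
```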
-
nextLong
public static long nextLong(Random r, long start, long end)
start and end are BOTH inclusive
-
nextBigInteger
public static BigInteger nextBigInteger(Random random, int maxBytes)
Returns a randomish big integer with 1 .. maxBytes storage.
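One plausible reading of "1 .. maxBytes storage" is that the two's-complement representation of the result occupies at most maxBytes bytes. Under that assumption, a minimal sketch (with the hypothetical class name RandomBigIntegerSketch, and not taken from the TestUtil source) could look like this:

```java
import java.math.BigInteger;
import java.util.Random;

final class RandomBigIntegerSketch {
    /** Returns a random BigInteger whose two's-complement form fits in 1..maxBytes bytes. */
    static BigInteger next(Random random, int maxBytes) {
        if (maxBytes < 1) {
            throw new IllegalArgumentException("maxBytes must be >= 1");
        }
        int length = 1 + random.nextInt(maxBytes); // picks a length in 1..maxBytes
        byte[] bytes = new byte[length];
        random.nextBytes(bytes);
        // Interprets the bytes as a big-endian two's-complement value (may be negative).
        return new BigInteger(bytes);
    }

    public static void main(String[] args) {
        Random random = new Random(1L);
        for (int i = 0; i < 3; i++) {
            System.out.println(next(random, 8));
        }
    }
}
```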
-
randomSimpleStringRange
public static String randomSimpleStringRange(Random r, char minChar, char maxChar, int maxLength)
-
randomUnicodeString
public static String randomUnicodeString(Random r)
Returns random string, including full unicode range.
-
randomUnicodeString
public static String randomUnicodeString(Random r, int maxLength)
Returns a random string up to a certain length.
-
randomFixedLengthUnicodeString
public static void randomFixedLengthUnicodeString(Random random, char[] chars, int offset, int length)
Fills provided char[] with valid random unicode code unit sequence.
-
randomRegexpishString
public static String randomRegexpishString(Random r)
Returns a String that's "regexpish" (contains lots of operators typically found in regular expressions). If you call this enough times, you might get a valid regex!
-
randomRegexpishString
public static String randomRegexpishString(Random r, int maxLength)
Returns a String that's "regexpish" (contains lots of operators typically found in regular expressions). If you call this enough times, you might get a valid regex!
Note: to avoid practically endless backtracking patterns we replace asterisk and plus operators with bounded repetitions. See LUCENE-4111 for more info.
- Parameters:
maxLength - A hint about maximum length of the regexpish string. It may be exceeded by a few characters.
-
randomlyRecaseCodePoints
public static String randomlyRecaseCodePoints(Random random, String str)
Randomly upcases, downcases, or leaves intact each code point in the given string
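Working per code point rather than per char matters for supplementary characters, which occupy two chars. As an illustration only (the class name RecaseSketch is hypothetical and this is not the TestUtil source), the three-way choice described above can be sketched as:

```java
import java.util.Random;

final class RecaseSketch {
    /** Upcases, downcases, or keeps each code point, chosen at random. */
    static String recase(Random random, String str) {
        StringBuilder sb = new StringBuilder(str.length());
        for (int i = 0; i < str.length(); ) {
            int cp = str.codePointAt(i);
            i += Character.charCount(cp); // advance by 1 or 2 chars
            switch (random.nextInt(3)) {
                case 0:
                    sb.appendCodePoint(Character.toUpperCase(cp));
                    break;
                case 1:
                    sb.appendCodePoint(Character.toLowerCase(cp));
                    break;
                default:
                    sb.appendCodePoint(cp); // leave intact
            }
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.println(recase(new Random(5L), "Hello, World"));
    }
}
```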
-
randomRealisticUnicodeString
public static String randomRealisticUnicodeString(Random r)
Returns random string of length between 0-20 codepoints, all codepoints within the same unicode block.
-
randomRealisticUnicodeString
public static String randomRealisticUnicodeString(Random r, int maxLength)
Returns random string of length up to maxLength codepoints, all codepoints within the same unicode block.
-
randomRealisticUnicodeString
public static String randomRealisticUnicodeString(Random r, int minLength, int maxLength)
Returns random string of length between min and max codepoints, all codepoints within the same unicode block.
-
randomFixedByteLengthUnicodeString
public static String randomFixedByteLengthUnicodeString(Random r, int length)
Returns random string, with a given UTF-8 byte length
-
randomBinaryTerm
public static BytesRef randomBinaryTerm(Random r, int length)
Returns a random binary term with a given length
-
alwaysPostingsFormat
public static Codec alwaysPostingsFormat(PostingsFormat format)
Return a Codec that can read any of the default codecs and formats, but always writes in the specified format.
-
alwaysDocValuesFormat
public static Codec alwaysDocValuesFormat(DocValuesFormat format)
Return a Codec that can read any of the default codecs and formats, but always writes in the specified format.
-
getDefaultCodec
public static Codec getDefaultCodec()
Returns the actual default codec (e.g. LuceneMNCodec) for this version of Lucene. This may be different than Codec.getDefault() because that is randomized.
-
getDefaultPostingsFormat
public static PostingsFormat getDefaultPostingsFormat()
Returns the actual default postings format (e.g. LuceneMNPostingsFormat) for this version of Lucene.
-
getDefaultPostingsFormat
public static PostingsFormat getDefaultPostingsFormat(int minItemsPerBlock, int maxItemsPerBlock)
Returns the actual default postings format (e.g. LuceneMNPostingsFormat) for this version of Lucene.
- NOTE: This API is for internal purposes only and might change in incompatible ways in the next release.
- this may disappear at any time
-
getPostingsFormatWithOrds
public static PostingsFormat getPostingsFormatWithOrds(Random r)
Returns a random postings format that supports term ordinals
-
getDefaultDocValuesFormat
public static DocValuesFormat getDefaultDocValuesFormat()
Returns the actual default docvalues format (e.g. LuceneMNDocValuesFormat) for this version of Lucene.
-
fieldSupportsHugeBinaryDocValues
public static boolean fieldSupportsHugeBinaryDocValues(String field)
-
getDefaultKnnVectorsFormat
public static KnnVectorsFormat getDefaultKnnVectorsFormat()
Returns the actual default vector format (e.g. LuceneMNKnnVectorsFormat) for this version of Lucene.
-
anyFilesExceptWriteLock
public static boolean anyFilesExceptWriteLock(Directory dir) throws IOException
- Throws:
IOException
-
addIndexesSlowly
public static void addIndexesSlowly(IndexWriter writer, DirectoryReader... readers) throws IOException
- Throws:
IOException
-
reduceOpenFiles
public static void reduceOpenFiles(IndexWriter w)
just tries to configure things to keep the open file count lowish
-
assertAttributeReflection
public static <T> void assertAttributeReflection(AttributeImpl att, Map<String,T> reflectedValues)
Checks some basic behaviour of an AttributeImpl
- Parameters:
reflectedValues - contains a map with "AttributeClass#key" as keys, mapped to the expected reflected values
-
assertConsistent
public static void assertConsistent(TopDocs expected, TopDocs actual)
Assert that the given TopDocs have the same top docs and consistent hit counts.
-
docs
public static PostingsEnum docs(Random random, IndexReader r, String field, BytesRef term, PostingsEnum reuse, int flags) throws IOException
- Throws:
IOException
-
docs
public static PostingsEnum docs(Random random, TermsEnum termsEnum, PostingsEnum reuse, int flags) throws IOException
- Throws:
IOException
-
stringToCharSequence
public static CharSequence stringToCharSequence(String string, Random random)
-
bytesToCharSequence
public static CharSequence bytesToCharSequence(BytesRef ref, Random random)
-
shutdownExecutorService
public static void shutdownExecutorService(ExecutorService ex)
Shuts down the ExecutorService and waits for its termination.
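The usual shutdown-then-wait idiom from java.util.concurrent is sketched below for reference. It is an assumption that the helper behaves along these lines; the class ExecutorShutdownSketch, the explicit timeout parameters, and the null handling are all hypothetical choices of this sketch.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

final class ExecutorShutdownSketch {
    /** Stops accepting new tasks and waits for running ones; returns true if terminated. */
    static boolean shutdownAndAwait(ExecutorService ex, long timeout, TimeUnit unit) {
        if (ex == null) {
            return true; // nothing to shut down
        }
        ex.shutdown(); // previously submitted tasks still run to completion
        try {
            return ex.awaitTermination(timeout, unit);
        } catch (InterruptedException e) {
            // Restore the interrupt flag so callers can observe the interruption.
            Thread.currentThread().interrupt();
            return false;
        }
    }

    public static void main(String[] args) {
        ExecutorService ex = Executors.newFixedThreadPool(2);
        ex.execute(() -> System.out.println("task ran"));
        System.out.println("terminated: " + shutdownAndAwait(ex, 10, TimeUnit.SECONDS));
    }
}
```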
-
randomPattern
public static Pattern randomPattern(Random random)
Returns a valid (compiling) Pattern instance with random stuff inside. Be careful when applying random patterns to longer strings as certain types of patterns may explode into exponential times in backtracking implementations (such as Java's).
-
randomAnalysisString
public static String randomAnalysisString(Random random, int maxLength, boolean simple)
-
randomSubString
public static String randomSubString(Random random, int wordLength, boolean simple)
-
bytesRefToString
public static String bytesRefToString(BytesRef br)
For debugging: tries to include br.utf8ToString(), but if that fails (because it's not valid utf8, which is fine!), just use ordinary toString.
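The "try UTF-8, fall back to a raw rendering" idea can be illustrated without Lucene types. The sketch below works on a plain byte[] instead of a BytesRef; the class name DebugBytesSketch and the hex output format are assumptions of this example, not the format TestUtil produces.

```java
import java.nio.ByteBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;

final class DebugBytesSketch {
    /** Returns the decoded text plus a hex dump, or only the hex dump for invalid UTF-8. */
    static String describe(byte[] bytes) {
        StringBuilder hex = new StringBuilder("[");
        for (int i = 0; i < bytes.length; i++) {
            if (i > 0) {
                hex.append(' ');
            }
            hex.append(Integer.toHexString(bytes[i] & 0xff));
        }
        hex.append(']');
        try {
            // REPORT makes the decoder throw instead of silently substituting U+FFFD.
            String text = StandardCharsets.UTF_8.newDecoder()
                .onMalformedInput(CodingErrorAction.REPORT)
                .onUnmappableCharacter(CodingErrorAction.REPORT)
                .decode(ByteBuffer.wrap(bytes))
                .toString();
            return text + " " + hex;
        } catch (CharacterCodingException e) {
            return hex.toString(); // not valid UTF-8, which is fine for debugging
        }
    }

    public static void main(String[] args) {
        System.out.println(describe("hi".getBytes(StandardCharsets.UTF_8)));
        System.out.println(describe(new byte[] {(byte) 0xff}));
    }
}
```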
-
ramCopyOf
public static Directory ramCopyOf(Directory dir) throws IOException
Returns a copy of the source directory, with file contents stored in RAM.
- Throws:
IOException
-
hasWindowsFS
public static boolean hasWindowsFS(Directory dir)
-
hasWindowsFS
public static boolean hasWindowsFS(Path path)
-
hasVirusChecker
public static boolean hasVirusChecker(Directory dir)
-
hasVirusChecker
public static boolean hasVirusChecker(Path path)
-
disableVirusChecker
public static boolean disableVirusChecker(Directory in)
Returns true if VirusCheckingFS is in use and was in fact already enabled
-
enableVirusChecker
public static void enableVirusChecker(Directory in)
-
-