Package org.apache.lucene.tests.util
Class TestUtil
java.lang.Object
org.apache.lucene.tests.util.TestUtil
General utility methods for Lucene unit tests.
-
Field Summary
Modifier and Type | Field | Description
static final Comparator<CharSequence> | STRING_CODEPOINT_COMPARATOR | A comparator that compares UTF-16 strings / char sequences according to Unicode code point order.
-
Method Summary
Modifier and Type | Method | Description
static void | addIndexesSlowly(IndexWriter writer, DirectoryReader... readers) |
static Codec | alwaysDocValuesFormat(DocValuesFormat format) | Return a Codec that can read any of the default codecs and formats, but always writes in the specified format.
static Codec | alwaysPostingsFormat(PostingsFormat format) | Return a Codec that can read any of the default codecs and formats, but always writes in the specified format.
static boolean | anyFilesExceptWriteLock |
static <T> void | assertAttributeReflection(AttributeImpl att, Map<String, T> reflectedValues) | Checks some basic behaviour of an AttributeImpl.
static void | assertConsistent(TopDocs expected, TopDocs actual) | Assert that the given TopDocs have the same top docs and consistent hit counts.
static String | bytesRefToString | For debugging: tries to include br.utf8ToString(), but if that fails (because it's not valid utf8, which is fine!), just use ordinary toString.
static CharSequence | bytesToCharSequence(BytesRef ref, Random random) |
static CheckIndex.Status | checkIndex(Directory dir) | This runs the CheckIndex tool on the index in dir.
static CheckIndex.Status | checkIndex(Directory dir, boolean doSlowChecks) |
static CheckIndex.Status | checkIndex(Directory dir, boolean doSlowChecks, boolean failFast, boolean concurrent, ByteArrayOutputStream output) | If failFast is true, then throw the first exception when index corruption is hit, instead of moving on to other fields/segments to look for any other corruption.
static <T> void | checkIterator(Iterator<T> iterator) | Checks that the provided iterator is well-formed.
static <T> void | checkIterator(Iterator<T> iterator, long expectedSize, boolean allowNull) | Checks that the provided iterator is well-formed.
static void | checkReader(IndexReader reader) | This runs the CheckIndex tool on the Reader.
static void | checkReader(LeafReader reader, boolean doSlowChecks) |
static <T> void | checkReadOnly(Collection<T> coll) | Checks that the provided collection is read-only.
static Document | cloneDocument(Document doc1) |
static boolean | disableVirusChecker | Returns true if VirusCheckingFS is in use and was in fact already enabled.
static PostingsEnum | docs(Random random, IndexReader r, String field, BytesRef term, PostingsEnum reuse, int flags) |
static PostingsEnum | docs(Random random, TermsEnum termsEnum, PostingsEnum reuse, int flags) |
static void | enableVirusChecker |
static boolean | fieldSupportsHugeBinaryDocValues |
static Codec | getDefaultCodec | Returns the actual default codec (e.g. LuceneMNCodec) for this version of Lucene.
static DocValuesFormat | getDefaultDocValuesFormat | Returns the actual default docvalues format (e.g. LuceneMNDocValuesFormat) for this version of Lucene.
static KnnVectorsFormat | getDefaultKnnVectorsFormat | Returns the actual default vector format (e.g. LuceneMNKnnVectorsFormat) for this version of Lucene.
static PostingsFormat | getDefaultPostingsFormat | Returns the actual default postings format (e.g. LuceneMNPostingsFormat) for this version of Lucene.
static PostingsFormat | getDefaultPostingsFormat(int minItemsPerBlock, int maxItemsPerBlock) | Returns the actual default postings format (e.g. LuceneMNPostingsFormat) for this version of Lucene.
static String | getDocValuesFormat(String field) |
static String | getDocValuesFormat(Codec codec, String field) |
static String | getPostingsFormat(String field) |
static String | getPostingsFormat(Codec codec, String field) |
static PostingsFormat | getPostingsFormatWithOrds | Returns a random postings format that supports term ordinals.
static boolean | hasVirusChecker(Path path) |
static boolean | hasVirusChecker(Directory dir) |
static boolean | hasWindowsFS(Path path) |
static boolean | hasWindowsFS(Directory dir) |
static BigInteger | nextBigInteger(Random random, int maxBytes) | Returns a randomish big integer with 1 .. maxBytes storage.
static int | nextInt | start and end are BOTH inclusive
static long | nextLong | start and end are BOTH inclusive
static Directory | ramCopyOf | Returns a copy of the source directory, with file contents stored in RAM.
static String | randomAnalysisString(Random random, int maxLength, boolean simple) |
static BytesRef | randomBinaryTerm | Returns a random binary term.
static BytesRef | randomBinaryTerm(Random r, int length) | Returns a random binary term with a given length.
static String | randomFixedByteLengthUnicodeString(Random r, int length) | Returns random string, with a given UTF-8 byte length.
static void | randomFixedLengthUnicodeString(Random random, char[] chars, int offset, int length) | Fills provided char[] with valid random unicode code unit sequence.
static String | randomHtmlishString(Random random, int numElements) |
static String | randomlyRecaseCodePoints(Random random, String str) | Randomly upcases, downcases, or leaves intact each code point in the given string.
static Pattern | randomPattern(Random random) | Returns a valid (compiling) Pattern instance with random stuff inside.
static String | randomRealisticUnicodeString | Returns random string of length between 0-20 codepoints, all codepoints within the same unicode block.
static String | randomRealisticUnicodeString(Random r, int maxLength) | Returns random string of length up to maxLength codepoints, all codepoints within the same unicode block.
static String | randomRealisticUnicodeString(Random r, int minLength, int maxLength) | Returns random string of length between min and max codepoints, all codepoints within the same unicode block.
static String | randomRegexpishString | Returns a String that's "regexpish" (contains lots of operators typically found in regular expressions). If you call this enough times, you might get a valid regex!
static String | randomRegexpishString(Random r, int maxLength) | Returns a String that's "regexpish" (contains lots of operators typically found in regular expressions). If you call this enough times, you might get a valid regex!
static String | randomSimpleString |
static String | randomSimpleString(Random r, int maxLength) |
static String | randomSimpleString(Random r, int minLength, int maxLength) |
static String | randomSimpleStringRange(Random r, char minChar, char maxChar, int maxLength) |
static String | randomSubString(Random random, int wordLength, boolean simple) |
static String | randomUnicodeString | Returns random string, including full unicode range.
static String | randomUnicodeString(Random r, int maxLength) | Returns a random string up to a certain length.
static void | reduceOpenFiles | just tries to configure things to keep the open file count lowish
static void | shutdownExecutorService | Shutdown ExecutorService and wait for its termination.
static CharSequence | stringToCharSequence(String string, Random random) |
static void | syncConcurrentMerges(IndexWriter writer) |
static void | syncConcurrentMerges |
static void | unzip(InputStream in, Path destDir) | Convenience method that unzips the archive read from in into destDir.
-
Field Details
-
STRING_CODEPOINT_COMPARATOR
A comparator that compares UTF-16 strings / char sequences according to Unicode code point order. This can be used to verify BytesRef order.
Warning: This comparator is rather inefficient, because it converts the strings to an int[] array on each invocation.
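To illustrate why code point order can differ from the UTF-16 order used by String.compareTo, here is a minimal JDK-only sketch. It is not the Lucene implementation; the class name CodePointOrderSketch and the field CODEPOINT_ORDER are hypothetical. Like the comparator described above, it converts both inputs to int[] on every call.

```java
import java.util.Comparator;

public class CodePointOrderSketch {
    // Hypothetical stand-in: compares two char sequences code point by code point.
    public static final Comparator<CharSequence> CODEPOINT_ORDER = (a, b) -> {
        // Converting to int[] on each call is simple but allocates every time.
        int[] x = a.codePoints().toArray();
        int[] y = b.codePoints().toArray();
        int n = Math.min(x.length, y.length);
        for (int i = 0; i < n; i++) {
            if (x[i] != y[i]) {
                return Integer.compare(x[i], y[i]);
            }
        }
        // A sequence that is a prefix of the other sorts first.
        return Integer.compare(x.length, y.length);
    };
}
```

For example, the supplementary character U+1F600 is stored as a surrogate pair starting with 0xD83D, so String.compareTo sorts it before U+FF5E, while the sketch above sorts it after U+FF5E.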
-
-
Method Details
-
unzip
Convenience method that unzips the archive read from in into destDir. You must pass it a clean destDir. Closes the given InputStream after extracting!
- Throws:
IOException
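The behaviour described for unzip can be approximated with the JDK alone. The following is a sketch under assumptions, not the Lucene code: the class name UnzipSketch is hypothetical, and the check that rejects entries resolving outside destDir is an addition of this sketch that the description above does not mention.

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.zip.ZipEntry;
import java.util.zip.ZipInputStream;

public class UnzipSketch {
    /** Extracts a zip stream into destDir and closes the stream afterwards. */
    public static void unzip(InputStream in, Path destDir) throws IOException {
        Path root = destDir.toAbsolutePath().normalize();
        // try-with-resources closes the ZipInputStream and the wrapped stream.
        try (ZipInputStream zip = new ZipInputStream(in)) {
            ZipEntry entry;
            while ((entry = zip.getNextEntry()) != null) {
                Path target = root.resolve(entry.getName()).normalize();
                // Assumption of this sketch: refuse entries that escape destDir.
                if (!target.startsWith(root)) {
                    throw new IOException("Entry escapes destDir: " + entry.getName());
                }
                if (entry.isDirectory()) {
                    Files.createDirectories(target);
                } else {
                    Files.createDirectories(target.getParent());
                    // Files.copy fails if the file already exists, which is
                    // one reason a clean destDir is required.
                    Files.copy(zip, target);
                }
                zip.closeEntry();
            }
        }
    }
}
```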
-
checkIterator
Checks that the provided iterator is well-formed.
- is read-only: does not allow remove
- returns expectedSize number of elements
- does not return null elements, unless allowNull is true.
- throws NoSuchElementException if next is called after hasNext returns false.
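The checks listed above for the three-argument checkIterator can be sketched with plain JDK types. This is an illustrative approximation rather than the Lucene source; the class name IteratorCheckSketch is hypothetical, and failures are reported with AssertionError as an assumption.

```java
import java.util.Iterator;
import java.util.NoSuchElementException;

public class IteratorCheckSketch {
    public static <T> void checkIterator(Iterator<T> it, long expectedSize, boolean allowNull) {
        for (long i = 0; i < expectedSize; i++) {
            if (!it.hasNext()) {
                throw new AssertionError("iterator ended after " + i + " elements");
            }
            T value = it.next();
            if (value == null && !allowNull) {
                throw new AssertionError("null element at position " + i);
            }
            // A read-only iterator must reject remove().
            try {
                it.remove();
                throw new AssertionError("remove() succeeded on a read-only iterator");
            } catch (UnsupportedOperationException expected) {
                // ok
            }
        }
        if (it.hasNext()) {
            throw new AssertionError("iterator has more than " + expectedSize + " elements");
        }
        // After exhaustion, next() must throw NoSuchElementException.
        try {
            it.next();
            throw new AssertionError("next() did not throw after hasNext() returned false");
        } catch (NoSuchElementException expected) {
            // ok
        }
    }
}
```

Note that running such a check against a mutable iterator removes an element before the failure is reported, so it is best applied to throwaway data.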
-
checkIterator
Checks that the provided iterator is well-formed.
- is read-only: does not allow remove
- does not return null elements.
- throws NoSuchElementException if next is called after hasNext returns false.
-
checkReadOnly
Checks that the provided collection is read-only.
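One way to probe whether a collection is read-only is to attempt mutations and expect UnsupportedOperationException. The sketch below is a JDK-only illustration, not the Lucene implementation; the class name ReadOnlySketch and the method looksReadOnly are hypothetical, and the set of probes (add and Iterator.remove) is an assumption.

```java
import java.util.Collection;
import java.util.Iterator;

public class ReadOnlySketch {
    /** Returns true if add() and Iterator.remove() are both rejected. */
    public static <T> boolean looksReadOnly(Collection<T> coll) {
        try {
            coll.add(null);
            return false; // the collection accepted a mutation
        } catch (UnsupportedOperationException expected) {
            // ok
        }
        Iterator<T> it = coll.iterator();
        if (it.hasNext()) {
            it.next();
            try {
                it.remove();
                return false; // removal through the iterator succeeded
            } catch (UnsupportedOperationException expected) {
                // ok
            }
        }
        return true;
    }
}
```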
-
syncConcurrentMerges
-
syncConcurrentMerges
-
checkIndex
This runs the CheckIndex tool on the index in dir. If any issues are hit, a RuntimeException is thrown; otherwise, the resulting CheckIndex.Status is returned.
- Throws:
IOException
-
checkIndex
- Throws:
IOException
-
checkIndex
public static CheckIndex.Status checkIndex(Directory dir, boolean doSlowChecks, boolean failFast, boolean concurrent, ByteArrayOutputStream output) throws IOException
If failFast is true, then throw the first exception when index corruption is hit, instead of moving on to other fields/segments to look for any other corruption.
- Throws:
IOException
-
checkReader
This runs the CheckIndex tool on the Reader. If any issues are hit, a RuntimeException is thrown.
- Throws:
IOException
-
checkReader
- Throws:
IOException
-
nextInt
start and end are BOTH inclusive -
nextLong
start and end are BOTH inclusive -
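The inclusive-bounds contract stated for nextInt and nextLong differs from java.util.Random.nextInt(bound), whose upper bound is exclusive. A minimal sketch of the int case follows. The class name InclusiveRangeSketch and the parameter list (Random r, int start, int end) are assumptions, since the signature is not shown here, and the modulo step introduces a small bias that is acceptable only for illustration.

```java
import java.util.Random;

public class InclusiveRangeSketch {
    /** Returns a value v with start <= v <= end (both inclusive). */
    public static int nextInt(Random r, int start, int end) {
        if (start > end) {
            throw new IllegalArgumentException("start must be <= end");
        }
        // Widen to long so that end - start + 1 cannot overflow,
        // even for the full int range.
        long range = (long) end - start + 1;
        long offset = Math.floorMod(r.nextLong(), range);
        return (int) (start + offset);
    }
}
```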
nextBigInteger
Returns a randomish big integer with 1 .. maxBytes storage. -
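One plausible reading of "1 .. maxBytes storage" for nextBigInteger is that a byte count in that range is chosen and filled with random bytes. The sketch below follows that reading using only the JDK; it is an assumption about the behaviour, not a copy of the Lucene code, and the class name RandomBigIntegerSketch is hypothetical.

```java
import java.math.BigInteger;
import java.util.Random;

public class RandomBigIntegerSketch {
    public static BigInteger nextBigInteger(Random random, int maxBytes) {
        // Pick a storage size between 1 and maxBytes, both inclusive.
        int n = 1 + random.nextInt(maxBytes);
        byte[] buffer = new byte[n];
        random.nextBytes(buffer);
        // Interpreted as a two's-complement value, so the result may be negative.
        return new BigInteger(buffer);
    }
}
```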
randomSimpleString
-
randomSimpleString
-
randomSimpleStringRange
-
randomSimpleString
-
randomUnicodeString
Returns random string, including full unicode range. -
randomUnicodeString
Returns a random string up to a certain length. -
randomFixedLengthUnicodeString
public static void randomFixedLengthUnicodeString(Random random, char[] chars, int offset, int length)
Fills provided char[] with valid random unicode code unit sequence. -
randomRegexpishString
Returns a String that's "regexpish" (contains lots of operators typically found in regular expressions). If you call this enough times, you might get a valid regex! -
randomRegexpishString
Returns a String that's "regexpish" (contains lots of operators typically found in regular expressions). If you call this enough times, you might get a valid regex!
Note: to avoid practically endless backtracking patterns we replace asterisk and plus operators with bounded repetitions. See LUCENE-4111 for more info.
- Parameters:
maxLength - A hint about maximum length of the regexpish string. It may be exceeded by a few characters.
-
randomHtmlishString
-
randomlyRecaseCodePoints
Randomly upcases, downcases, or leaves intact each code point in the given string -
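Recasing per code point, rather than per char, keeps surrogate pairs intact. The following JDK-only sketch illustrates the idea behind randomlyRecaseCodePoints; it is not the Lucene implementation, and the class name RecaseSketch and method name randomlyRecase are hypothetical.

```java
import java.util.Random;

public class RecaseSketch {
    public static String randomlyRecase(Random random, String str) {
        StringBuilder sb = new StringBuilder(str.length());
        for (int i = 0; i < str.length(); ) {
            int cp = str.codePointAt(i);
            // Advance by one or two chars depending on the code point.
            i += Character.charCount(cp);
            switch (random.nextInt(3)) {
                case 0:
                    sb.appendCodePoint(Character.toUpperCase(cp));
                    break;
                case 1:
                    sb.appendCodePoint(Character.toLowerCase(cp));
                    break;
                default:
                    sb.appendCodePoint(cp); // leave intact
            }
        }
        return sb.toString();
    }
}
```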
randomRealisticUnicodeString
Returns random string of length between 0-20 codepoints, all codepoints within the same unicode block. -
randomRealisticUnicodeString
Returns random string of length up to maxLength codepoints, all codepoints within the same unicode block. -
randomRealisticUnicodeString
Returns random string of length between min and max codepoints, all codepoints within the same unicode block. -
randomFixedByteLengthUnicodeString
Returns random string, with a given UTF-8 byte length -
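A string with an exact UTF-8 byte length can be built by repeatedly choosing a code point whose encoded size (1 to 4 bytes) fits the remaining budget. The sketch below shows one such approach with JDK classes only; it is an illustration rather than the Lucene code, and the class name FixedUtf8LengthSketch and its method name are hypothetical.

```java
import java.util.Random;

public class FixedUtf8LengthSketch {
    public static String randomFixedByteLengthString(Random r, int length) {
        StringBuilder sb = new StringBuilder();
        int remaining = length;
        while (remaining > 0) {
            // Encoded size of the next code point, capped by the remaining bytes.
            int size = 1 + r.nextInt(Math.min(remaining, 4));
            int cp;
            if (size == 1) {
                cp = r.nextInt(0x80);                  // U+0000..U+007F
            } else if (size == 2) {
                cp = 0x80 + r.nextInt(0x800 - 0x80);   // U+0080..U+07FF
            } else if (size == 3) {
                cp = 0x800 + r.nextInt(0xF000);        // U+0800..U+FFFF minus surrogates
                if (cp >= 0xD800) {
                    cp += 0x800;                       // skip U+D800..U+DFFF
                }
            } else {
                cp = 0x10000 + r.nextInt(0x100000);    // U+10000..U+10FFFF
            }
            sb.appendCodePoint(cp);
            remaining -= size;
        }
        return sb.toString();
    }
}
```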
randomBinaryTerm
Returns a random binary term. -
randomBinaryTerm
Returns a random binary term with a given length. -
alwaysPostingsFormat
Return a Codec that can read any of the default codecs and formats, but always writes in the specified format. -
alwaysDocValuesFormat
Return a Codec that can read any of the default codecs and formats, but always writes in the specified format. -
getDefaultCodec
Returns the actual default codec (e.g. LuceneMNCodec) for this version of Lucene. This may be different than Codec.getDefault() because that is randomized. -
getDefaultPostingsFormat
Returns the actual default postings format (e.g. LuceneMNPostingsFormat) for this version of Lucene. -
getDefaultPostingsFormat
Returns the actual default postings format (e.g. LuceneMNPostingsFormat) for this version of Lucene.
- NOTE: This API is for internal purposes only and might change in incompatible ways in the next release.
- this may disappear at any time
-
getPostingsFormatWithOrds
Returns a random postings format that supports term ordinals -
getDefaultDocValuesFormat
Returns the actual default docvalues format (e.g. LuceneMNDocValuesFormat) for this version of Lucene. -
getPostingsFormat
-
getPostingsFormat
-
getDocValuesFormat
-
getDocValuesFormat
-
fieldSupportsHugeBinaryDocValues
-
getDefaultKnnVectorsFormat
Returns the actual default vector format (e.g. LuceneMNKnnVectorsFormat) for this version of Lucene. -
anyFilesExceptWriteLock
- Throws:
IOException
-
addIndexesSlowly
public static void addIndexesSlowly(IndexWriter writer, DirectoryReader... readers) throws IOException
- Throws:
IOException
-
reduceOpenFiles
just tries to configure things to keep the open file count lowish -
assertAttributeReflection
Checks some basic behaviour of an AttributeImpl.
- Parameters:
reflectedValues - contains a map with "AttributeClass#key" as values
-
assertConsistent
Assert that the given TopDocs have the same top docs and consistent hit counts. -
cloneDocument
-
docs
public static PostingsEnum docs(Random random, IndexReader r, String field, BytesRef term, PostingsEnum reuse, int flags) throws IOException
- Throws:
IOException
-
docs
public static PostingsEnum docs(Random random, TermsEnum termsEnum, PostingsEnum reuse, int flags) throws IOException
- Throws:
IOException
-
stringToCharSequence
-
bytesToCharSequence
-
shutdownExecutorService
Shutdown ExecutorService and wait for its termination. -
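Shutting down an executor and waiting for it follows a standard java.util.concurrent pattern. The sketch below shows that pattern; it is not the Lucene method itself. The class name ShutdownSketch, the explicit timeout parameters, and the boolean return value are assumptions of this example.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.TimeUnit;

public class ShutdownSketch {
    /** Returns true if the executor terminated within the timeout. */
    public static boolean shutdownAndAwait(ExecutorService ex, long timeout, TimeUnit unit) {
        if (ex == null) {
            return true; // nothing to shut down
        }
        // Stop accepting new tasks; already submitted tasks still run.
        ex.shutdown();
        try {
            return ex.awaitTermination(timeout, unit);
        } catch (InterruptedException e) {
            // Restore the interrupt flag so callers can observe it.
            Thread.currentThread().interrupt();
            return false;
        }
    }
}
```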
randomPattern
Returns a valid (compiling) Pattern instance with random stuff inside. Be careful when applying random patterns to longer strings as certain types of patterns may explode into exponential times in backtracking implementations (such as Java's). -
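A simple way to obtain a pattern that is guaranteed to compile is to generate candidate strings and retry until Pattern.compile accepts one. The sketch below illustrates that retry loop with a small hypothetical alphabet; it is not the Lucene implementation, and the class name RandomPatternSketch, the alphabet, and the length limit are assumptions. The unbounded * and + operators are left out of the alphabet to limit backtracking blow-ups of the kind warned about above.

```java
import java.util.Random;
import java.util.regex.Pattern;
import java.util.regex.PatternSyntaxException;

public class RandomPatternSketch {
    // Hypothetical alphabet: a few literals plus common regex operators.
    private static final String ALPHABET = "ab01.?|()[]{},^$";

    public static Pattern randomPattern(Random random) {
        while (true) {
            int len = random.nextInt(8);
            StringBuilder sb = new StringBuilder(len);
            for (int i = 0; i < len; i++) {
                sb.append(ALPHABET.charAt(random.nextInt(ALPHABET.length())));
            }
            try {
                return Pattern.compile(sb.toString());
            } catch (PatternSyntaxException e) {
                // Not a valid regex; try another candidate.
            }
        }
    }
}
```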
randomAnalysisString
-
randomSubString
-
bytesRefToString
For debugging: tries to include br.utf8ToString(), but if that fails (because it's not valid utf8, which is fine!), just use ordinary toString. -
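The "try UTF-8, fall back to a raw form" idea behind bytesRefToString can be sketched on a plain byte[] with a strict decoder. This is an illustration using only the JDK, not the Lucene code: the class name DebugBytesSketch is hypothetical, and the bracketed hex dump is an assumed stand-in for the ordinary toString output.

```java
import java.nio.ByteBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;

public class DebugBytesSketch {
    public static String bytesToDebugString(byte[] bytes) {
        // Hex form, e.g. "[61 62]"; always available regardless of encoding.
        StringBuilder hex = new StringBuilder("[");
        for (int i = 0; i < bytes.length; i++) {
            if (i > 0) {
                hex.append(' ');
            }
            hex.append(Integer.toHexString(bytes[i] & 0xff));
        }
        hex.append(']');
        try {
            // REPORT makes the decoder throw on invalid UTF-8
            // instead of silently substituting U+FFFD.
            String text = StandardCharsets.UTF_8.newDecoder()
                .onMalformedInput(CodingErrorAction.REPORT)
                .onUnmappableCharacter(CodingErrorAction.REPORT)
                .decode(ByteBuffer.wrap(bytes))
                .toString();
            return text + " " + hex;
        } catch (CharacterCodingException e) {
            // Not valid UTF-8, which is fine: use the hex form only.
            return hex.toString();
        }
    }
}
```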
ramCopyOf
Returns a copy of the source directory, with file contents stored in RAM.
- Throws:
IOException
-
hasWindowsFS
-
hasWindowsFS
-
hasVirusChecker
-
hasVirusChecker
-
disableVirusChecker
Returns true if VirusCheckingFS is in use and was in fact already enabled -
enableVirusChecker
-