Misc Tools

The misc package provides tools for splitting and merging indices, changing norms, finding high-frequency terms, and other tasks.

NativeUnixDirectory

NOTE: This uses C++ sources (accessible via JNI), which you'll have to compile on your platform.

NativeUnixDirectory is a Directory implementation that bypasses the OS's buffer cache (using direct IO) for any IndexInput and IndexOutput used while merging segments larger than a specified size (default 10 MB). This avoids evicting hot pages that are still in use for searching, which keeps search responsive while large merges run.
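The size-based choice described above can be pictured with a small, self-contained sketch. It is not Lucene code: DirectIoRouting, IoPath, and choose are hypothetical names, and treating a merge of exactly the threshold size as "large" is an assumption made here for illustration.

```java
// Illustrative sketch only; these names are hypothetical and not part of Lucene.
final class DirectIoRouting {

    /** Default threshold taken from the description above: 10 MB. */
    static final long DEFAULT_MIN_BYTES_DIRECT = 10L * 1024 * 1024;

    /** The two ways a file could be opened in this sketch. */
    enum IoPath { DIRECT_IO, BUFFERED_DELEGATE }

    /**
     * Picks direct IO only for merge traffic at or above the threshold.
     * Everything else (searching, flushing, small merges) goes through the
     * normal buffered path, so the OS buffer cache keeps serving searches.
     * Assumption: a merge of exactly minBytesDirect counts as large.
     */
    static IoPath choose(boolean isMerge, long estimatedMergeBytes, long minBytesDirect) {
        if (isMerge && estimatedMergeBytes >= minBytesDirect) {
            return IoPath.DIRECT_IO;
        }
        return IoPath.BUFFERED_DELEGATE;
    }

    public static void main(String[] args) {
        long fiftyMb = 50L * 1024 * 1024;
        // A large merge bypasses the buffer cache.
        System.out.println(choose(true, fiftyMb, DEFAULT_MIN_BYTES_DIRECT));
        // A small merge stays on the buffered path.
        System.out.println(choose(true, 1024, DEFAULT_MIN_BYTES_DIRECT));
        // Non-merge IO is never routed to direct IO, whatever its size.
        System.out.println(choose(false, fiftyMb, DEFAULT_MIN_BYTES_DIRECT));
    }
}
```

The point of the design is that only large merge traffic pays the cost of uncached IO, while everything the searcher touches continues to benefit from the buffer cache.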

See this blog post for details. Steps to build:

  • cd lucene/misc/
  • To compile NativePosixUtil.cpp -> libNativePosixUtil.so, run ant build-native-unix.
  • libNativePosixUtil.so will be written to the lucene/build/native/ folder.
  • Make sure libNativePosixUtil.so is on your LD_LIBRARY_PATH so Java can find it (for example, export LD_LIBRARY_PATH=/path/to/dir:$LD_LIBRARY_PATH, where /path/to/dir contains libNativePosixUtil.so).
  • Run ant jar to compile the Java source, then put the resulting JAR on your CLASSPATH.
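One way to check the LD_LIBRARY_PATH step is to ask the JVM whether it can see the library. The sketch below is not part of Lucene; NativeLibCheck, canLoad, and findOnPath are hypothetical names. It assumes the library is loaded under the name NativePosixUtil and that, as is typical on Linux, the JVM's java.library.path includes the directories listed in LD_LIBRARY_PATH.

```java
import java.io.File;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.Optional;

// Illustrative helper only; these names are hypothetical and not part of Lucene.
final class NativeLibCheck {

    /** Returns true if the JVM can load the named native library. */
    static boolean canLoad(String libName) {
        try {
            System.loadLibrary(libName);
            return true;
        } catch (UnsatisfiedLinkError | SecurityException e) {
            // Library not found on java.library.path, or loading is not permitted.
            return false;
        }
    }

    /** Searches a File.pathSeparator-delimited directory list for fileName. */
    static Optional<Path> findOnPath(String pathList, String fileName) {
        if (pathList == null || pathList.isEmpty()) {
            return Optional.empty();
        }
        for (String dir : pathList.split(File.pathSeparator)) {
            if (dir.isEmpty()) {
                continue;
            }
            Path candidate = Paths.get(dir, fileName);
            if (Files.isRegularFile(candidate)) {
                return Optional.of(candidate);
            }
        }
        return Optional.empty();
    }

    public static void main(String[] args) {
        String lib = "NativePosixUtil"; // assumed library name
        // Maps the logical name to the platform file name (lib<name>.so on Linux).
        String fileName = System.mapLibraryName(lib);
        System.out.println("Looking for: " + fileName);
        System.out.println("Found at: "
            + findOnPath(System.getProperty("java.library.path"), fileName));
        System.out.println("Loadable: " + canLoad(lib));
    }
}
```

If the library is reported as not loadable, the usual causes are a missing export of LD_LIBRARY_PATH in the shell that starts the JVM or a library built for a different architecture.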

NativePosixUtil.cpp/java also expose the posix_madvise, madvise, and posix_fadvise functions, which are somewhat more cross-platform than O_DIRECT. However, in testing (see above link), these APIs did not seem to help prevent buffer cache eviction.

Packages 
  • org.apache.lucene.document: Misc extensions of the Document/Field API.
  • org.apache.lucene.index: Misc index tools and index support.
  • org.apache.lucene.misc: Miscellaneous index tools.
  • org.apache.lucene.search: Misc search implementations.
  • org.apache.lucene.search.similarity: Misc similarity implementations.
  • org.apache.lucene.store: Misc Directory implementations.
  • org.apache.lucene.util: Memory tracker interface that allows defining custom collector-level memory trackers.
  • org.apache.lucene.util.fst: Misc FST classes.