Misc Tools

The misc package provides tools for splitting and merging indices, changing norms, finding high-frequency terms, and more.

NativeUnixDirectory

NOTE: This uses C++ sources (accessible via JNI), which you'll have to compile on your platform.

NativeUnixDirectory is a Directory implementation that bypasses the OS's buffer cache (using direct IO) for any IndexInput and IndexOutput used while merging segments larger than a specified size (default 10 MB). This avoids evicting hot pages that are still in use for searching, which keeps search more responsive while large merges run.
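The size-based routing described above can be sketched in a few lines. This is only an illustration, not Lucene's implementation: the class DirectIoPolicy, its method useDirectIo, and the constant name are hypothetical, and treating a merge exactly at the threshold as "large" is an assumption.

```java
/**
 * Hypothetical sketch of the routing decision: direct IO only for
 * merge-time files at or above a size threshold, buffered IO otherwise.
 * Not Lucene's actual code; all names here are illustrative.
 */
final class DirectIoPolicy {
    /** 10 MB, mirroring the default threshold mentioned in the text. */
    static final long DEFAULT_MIN_BYTES_DIRECT = 10L * 1024 * 1024;

    private final long minBytesDirect;

    DirectIoPolicy(long minBytesDirect) {
        if (minBytesDirect < 0) {
            throw new IllegalArgumentException("minBytesDirect must be >= 0");
        }
        this.minBytesDirect = minBytesDirect;
    }

    /**
     * Returns true when a file should bypass the buffer cache.
     * Searches and small merges stay on the normal buffered path so that
     * hot pages used by searching are not evicted.
     * Assumption: a merge of exactly minBytesDirect bytes counts as large.
     */
    boolean useDirectIo(boolean isMerge, long estimatedMergeBytes) {
        return isMerge && estimatedMergeBytes >= minBytesDirect;
    }
}
```

With the default threshold, a 64 MB merge would be routed to direct IO, while a 1 KB merge or any search-time read would go through the regular buffered path.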

See this blog post for details. Steps to build:

  • cd lucene/misc/
  • To compile NativePosixUtil.cpp -> libNativePosixUtil.so, run ant build-native-unix.
  • libNativePosixUtil.so will be located in the lucene/build/native/ folder
  • Make sure libNativePosixUtil.so is on your LD_LIBRARY_PATH so Java can find it (something like export LD_LIBRARY_PATH=/path/to/dir:$LD_LIBRARY_PATH, where /path/to/dir contains libNativePosixUtil.so)
  • Run ant jar to compile the Java source, then put the resulting JAR on your CLASSPATH
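After following the steps above, it can help to confirm that the JVM will actually see the native library before running anything that depends on it. The following is a minimal sketch using only the Java standard library; the class NativeLibraryLocator and its find method are hypothetical helpers, not part of Lucene. It assumes a typical Linux JVM, where the default java.library.path includes the entries of LD_LIBRARY_PATH.

```java
import java.io.File;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.Optional;
import java.util.regex.Pattern;

/** Hypothetical helper (not part of Lucene) that looks for a native library file. */
final class NativeLibraryLocator {

    /**
     * Searches each entry of a path-separated search path for the
     * platform-specific file name of the given library
     * (for example libNativePosixUtil.so on Linux).
     */
    static Optional<Path> find(String libraryName, String searchPath) {
        if (searchPath == null || searchPath.isEmpty()) {
            return Optional.empty();
        }
        String fileName = System.mapLibraryName(libraryName);
        for (String entry : searchPath.split(Pattern.quote(File.pathSeparator))) {
            if (entry.isEmpty()) {
                continue; // skip empty entries such as a trailing separator
            }
            Path candidate = Paths.get(entry, fileName);
            if (Files.isRegularFile(candidate)) {
                return Optional.of(candidate);
            }
        }
        return Optional.empty();
    }

    public static void main(String[] args) {
        // Assumption: java.library.path reflects LD_LIBRARY_PATH on this JVM.
        Optional<Path> hit = find("NativePosixUtil", System.getProperty("java.library.path"));
        System.out.println(hit.map(p -> "Found " + p)
                .orElse("NativePosixUtil library not found on java.library.path"));
    }
}
```

If the library is reported as missing, the usual cause is that LD_LIBRARY_PATH was exported in a different shell than the one launching Java.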

NativePosixUtil.cpp and NativePosixUtil.java also expose the posix_madvise, madvise, and posix_fadvise functions, which are somewhat more cross-platform than O_DIRECT. However, in testing (see the blog post mentioned above), these APIs did not seem to help prevent buffer cache eviction.

Packages

Package                        Description
org.apache.lucene.document     Misc extensions of the Document/Field API.
org.apache.lucene.index        Misc index tools and index support.
org.apache.lucene.misc         Miscellaneous index tools.
org.apache.lucene.search       Misc search implementations.
org.apache.lucene.store        Misc Directory implementations.
org.apache.lucene.util.fst     Misc FST classes.