Package org.apache.lucene.util.fst

Finite state transducers

This package implements Finite State Transducers with the following characteristics:

  • Fast and low memory overhead construction of the minimal FST (but inputs must be provided in sorted order)
  • Low object overhead and quick deserialization (byte[] representation)
  • Pluggable Outputs representation
  • N-shortest-paths search by weight
  • Enumerators (IntsRef and BytesRef) that behave like SortedMap iterators

FST Construction example:

     // Input values (keys). These must be provided to Builder in Unicode code point (UTF8 or UTF32) sorted order.
     // Note that sorting by Java's String.compareTo, which is UTF16 sorted order, is not correct and can lead to
     // exceptions while building the FST:
     String inputValues[] = {"cat", "dog", "dogs"};
     long outputValues[] = {5, 7, 12};

     PositiveIntOutputs outputs = PositiveIntOutputs.getSingleton();
     FSTCompiler<Long> fstCompiler = new FSTCompiler<Long>(INPUT_TYPE.BYTE1, outputs);
     BytesRefBuilder scratchBytes = new BytesRefBuilder();
     IntsRefBuilder scratchInts = new IntsRefBuilder();
     for (int i = 0; i < inputValues.length; i++) {
       fstCompiler.add(Util.toIntsRef(scratchBytes.toBytesRef(), scratchInts), outputValues[i]);
     FST<Long> fst = fstCompiler.compile();
Retrieval by key:
     Long value = Util.get(fst, new BytesRef("dog"));
     System.out.println(value); // 7
Retrieval by value:
     // Only works because outputs are also in sorted order
     IntsRef key = Util.getByOutput(fst, 12);
     System.out.println(Util.toBytesRef(key, scratchBytes).utf8ToString()); // dogs
Iterate over key-value pairs in sorted order:
     // Like TermsEnum, this also supports seeking (advance)
     BytesRefFSTEnum<Long> iterator = new BytesRefFSTEnum<Long>(fst);
     while ( != null) {
       InputOutput<Long> mapEntry = iterator.current();
N-shortest paths by weight:
     Comparator<Long> comparator = new Comparator<Long>() {
       public int compare(Long left, Long right) {
         return left.compareTo(right);
     Arc<Long> firstArc = fst.getFirstArc(new Arc<Long>());
     Util.TopResults<Long> paths = Util.shortestPaths(fst, firstArc, fst.outputs.getNoOutput(), comparator, 3, true);
     System.out.println(Util.toBytesRef(paths.topN.get(0).input, scratchBytes).utf8ToString()); // cat
     System.out.println(paths.topN.get(0).output); // 5
     System.out.println(Util.toBytesRef(paths.topN.get(1).input, scratchBytes).utf8ToString()); // dog
     System.out.println(paths.topN.get(1).output); // 7