Package org.apache.lucene.util.fst
package org.apache.lucene.util.fst
Finite state transducers
This package implements Finite State Transducers with the following characteristics:
- Fast and low memory overhead construction of the minimal FST (but inputs must be provided in sorted order)
- Low object overhead and quick deserialization (byte[] representation)
- Pluggable
Outputs
representation N-shortest-paths
search by weight- Enumerators (
IntsRef
andBytesRef
) that behave likeSortedMap
iterators
FST Construction example:
// Input values (keys). These must be provided to Builder in Unicode code point (UTF8 or UTF32) sorted order. // Note that sorting by Java's String.compareTo, which is UTF16 sorted order, is not correct and can lead to // exceptions while building the FST: String inputValues[] = {"cat", "dog", "dogs"}; long outputValues[] = {5, 7, 12}; PositiveIntOutputs outputs = PositiveIntOutputs.getSingleton(); FSTCompiler<Long> fstCompiler = new FSTCompiler<Long>(INPUT_TYPE.BYTE1, outputs); BytesRefBuilder scratchBytes = new BytesRefBuilder(); IntsRefBuilder scratchInts = new IntsRefBuilder(); for (int i = 0; i < inputValues.length; i++) { scratchBytes.copyChars(inputValues[i]); fstCompiler.add(Util.toIntsRef(scratchBytes.toBytesRef(), scratchInts), outputValues[i]); } FST<Long> fst = fstCompiler.compile();Retrieval by key:
Long value = Util.get(fst, new BytesRef("dog")); System.out.println(value); // 7Retrieval by value:
// Only works because outputs are also in sorted order IntsRef key = Util.getByOutput(fst, 12); System.out.println(Util.toBytesRef(key, scratchBytes).utf8ToString()); // dogsIterate over key-value pairs in sorted order:
// Like TermsEnum, this also supports seeking (advance) BytesRefFSTEnum<Long> iterator = new BytesRefFSTEnum<Long>(fst); while (iterator.next() != null) { InputOutput<Long> mapEntry = iterator.current(); System.out.println(mapEntry.input.utf8ToString()); System.out.println(mapEntry.output); }N-shortest paths by weight:
Comparator<Long> comparator = new Comparator<Long>() { public int compare(Long left, Long right) { return left.compareTo(right); } }; Arc<Long> firstArc = fst.getFirstArc(new Arc<Long>()); Util.TopResults<Long> paths = Util.shortestPaths(fst, firstArc, fst.outputs.getNoOutput(), comparator, 3, true); System.out.println(Util.toBytesRef(paths.topN.get(0).input, scratchBytes).utf8ToString()); // cat System.out.println(paths.topN.get(0).output); // 5 System.out.println(Util.toBytesRef(paths.topN.get(1).input, scratchBytes).utf8ToString()); // dog System.out.println(paths.topN.get(1).output); // 7
-
ClassDescriptionAn FST
Outputs
implementation where each output is a sequence of bytes.Enumerates all input (BytesRef) + output pairs in an FST.Holds a single input (BytesRef) + output pair.An FSTOutputs
implementation where each output is a sequence of characters.FST<T>Represents an finite state machine (FST), using a compact byte[] format.FST.Arc<T>Represents a single arc.Reads bytes stored in an FST.Specifies allowed range of each int input label for this FST.FSTCompiler<T>Builds a minimal FST (maps an IntsRef term to an arbitrary output) from pre-sorted terms with outputs.Fluent-style constructor for FSTFSTCompiler
.Abstraction for reading/writing bytes necessary for FST.An FSTOutputs
implementation where each output is a sequence of ints.Enumerates all input (IntsRef) + output pairs in an FST.Holds a single input (IntsRef) + output pair.A null FSTOutputs
implementation; use this if you just want to build an FSA.Provides off heap storage of finite state machine (FST), using underlying index input instead of byte store on heapProvides storage of finite state machine (FST), using byte array or byte store allocated on heap.Outputs<T>Represents the outputs for an FST, providing the basic algebra required for building and traversing the FST.PairOutputs<A,B> An FSTOutputs
implementation, holding two other outputs.PairOutputs.Pair<A,B> Holds a single pair of two outputs.An FSTOutputs
implementation where each output is a non-negative long value.Static helper methods.Util.FSTPath<T>Represents a path in TopNSearcher.Util.Result<T>Holds a single input (IntsRef) + output, returned byshortestPaths()
.Utility class to find top N shortest paths from start point(s).Holds the results for a top N search usingUtil.TopNSearcher