Package org.apache.lucene.util.fst

Finite state transducers


Class Summary
Builder<T> Builds a minimal FST (maps an IntsRef term to an arbitrary output) from pre-sorted terms with outputs.
Builder.Arc<T> Expert: holds a pending (seen but not yet serialized) arc.
Builder.FreezeTail<T> Expert: this is invoked by Builder whenever a suffix is serialized.
Builder.UnCompiledNode<T> Expert: holds a pending (seen but not yet serialized) Node.
ByteSequenceOutputs An FST Outputs implementation where each output is a sequence of bytes.
BytesRefFSTEnum<T> Enumerates all input (BytesRef) + output pairs in an FST.
BytesRefFSTEnum.InputOutput<T> Holds a single input (BytesRef) + output pair.
CharSequenceOutputs An FST Outputs implementation where each output is a sequence of characters.
FST<T> Represents an finite state machine (FST), using a compact byte[] format.
FST.Arc<T> Represents a single arc.
FST.BytesReader Reads bytes stored in an FST.
IntSequenceOutputs An FST Outputs implementation where each output is a sequence of ints.
IntsRefFSTEnum<T> Enumerates all input (IntsRef) + output pairs in an FST.
IntsRefFSTEnum.InputOutput<T> Holds a single input (IntsRef) + output pair.
NoOutputs A null FST Outputs implementation; use this if you just want to build an FSA.
Outputs<T> Represents the outputs for an FST, providing the basic algebra required for building and traversing the FST.
PairOutputs<A,B> An FST Outputs implementation, holding two other outputs.
PairOutputs.Pair<A,B> Holds a single pair of two outputs.
PositiveIntOutputs An FST Outputs implementation where each output is a non-negative long value.
Util Static helper methods.
Util.MinResult<T> Holds a single input (IntsRef) + output, returned by shortestPaths().
Util.TopNSearcher<T> Utility class to find top N shortest paths from start point(s).

Enum Summary
FST.INPUT_TYPE Specifies allowed range of each int input label for this FST.

Package org.apache.lucene.util.fst Description

Finite state transducers

This package implements Finite State Transducers with the following characteristics:

FST Construction example:

    // Input values (keys). These must be provided to Builder in Unicode sorted order!
    String inputValues[] = {"cat", "dog", "dogs"};
    long outputValues[] = {5, 7, 12};
    PositiveIntOutputs outputs = PositiveIntOutputs.getSingleton();
    Builder<Long> builder = new Builder<Long>(INPUT_TYPE.BYTE1, outputs);
    BytesRef scratchBytes = new BytesRef();
    IntsRef scratchInts = new IntsRef();
    for (int i = 0; i < inputValues.length; i++) {
      builder.add(Util.toIntsRef(scratchBytes, scratchInts), outputValues[i]);
    FST<Long> fst = builder.finish();
Retrieval by key:
    Long value = Util.get(fst, new BytesRef("dog"));
    System.out.println(value); // 7
Retrieval by value:
    // Only works because outputs are also in sorted order
    IntsRef key = Util.getByOutput(fst, 12);
    System.out.println(Util.toBytesRef(key, scratchBytes).utf8ToString()); // dogs
Iterate over key-value pairs in sorted order:
    // Like TermsEnum, this also supports seeking (advance)
    BytesRefFSTEnum<Long> iterator = new BytesRefFSTEnum<Long>(fst);
    while ( != null) {
      InputOutput<Long> mapEntry = iterator.current();
N-shortest paths by weight:
    Comparator<Long> comparator = new Comparator<Long>() {
      public int compare(Long left, Long right) {
        return left.compareTo(right);
    Arc<Long> firstArc = fst.getFirstArc(new Arc<Long>());
    MinResult<Long> paths[] = Util.shortestPaths(fst, firstArc, comparator, 2);
    System.out.println(Util.toBytesRef(paths[0].input, scratchBytes).utf8ToString()); // cat
    System.out.println(paths[0].output); // 5
    System.out.println(Util.toBytesRef(paths[1].input, scratchBytes).utf8ToString()); // dog
    System.out.println(paths[1].output); // 7

Copyright © 2000-2013 Apache Software Foundation. All Rights Reserved.