Class FSTCompiler<T>


  • public class FSTCompiler<T>
    extends Object
    Builds a minimal FST (maps an IntsRef term to an arbitrary output) from pre-sorted terms with outputs. The FST becomes an FSA if you use NoOutputs. The FST is written on-the-fly into a compact serialized format byte array, which can be saved to / loaded from a Directory or used directly for traversal. The FST is always finite (no cycles).
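    The "pre-sorted terms" requirement can be illustrated without the Lucene classes. The sketch below is not part of this API: the class SortedTermsSketch and its helpers are hypothetical names, and it assumes the terms are UTF-8 strings fed as one int per byte, so that the required order is unsigned byte order. It sorts terms that way and checks the order before they would be handed to the compiler.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical helper, not a Lucene class: prepares terms in the order
// an FST compiler fed with UTF-8 bytes is assumed to expect.
public class SortedTermsSketch {

    // Encode each term as UTF-8 and sort by unsigned byte order.
    static List<byte[]> toSortedUtf8(List<String> terms) {
        List<byte[]> out = new ArrayList<>();
        for (String term : terms) {
            out.add(term.getBytes(StandardCharsets.UTF_8));
        }
        out.sort((a, b) -> Arrays.compareUnsigned(a, b));
        return out;
    }

    // True if no term sorts before its predecessor (duplicates are tolerated here).
    static boolean isSorted(List<byte[]> terms) {
        for (int i = 1; i < terms.size(); i++) {
            if (Arrays.compareUnsigned(terms.get(i - 1), terms.get(i)) > 0) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        // "\u00e9" encodes to bytes 0xC3 0xA9, which are negative as signed
        // Java bytes; unsigned comparison places it after the ASCII terms.
        List<byte[]> sorted = toSortedUtf8(List.of("dog", "cat", "\u00e9", "z"));
        for (byte[] term : sorted) {
            System.out.println(new String(term, StandardCharsets.UTF_8));
        }
        System.out.println("sorted: " + isSorted(sorted));
    }
}
```

    Comparing bytes as signed values would put the non-ASCII term first, which is why the sketch uses Arrays.compareUnsigned rather than a plain byte comparison.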

    NOTE: The algorithm is described at http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.24.3698

    The parameterized type T is the output type. See the subclasses of Outputs.

    FSTs larger than 2.1GB are now possible (as of Lucene 4.2). FSTs containing more than 2.1B nodes are also now possible; however, they cannot be packed.

    It now supports 3 different workflows:

    - Build FST and use it immediately entirely in RAM and then discard it

    - Build FST and use it immediately entirely in RAM, and also save it to another DataOutput so that it can be loaded and used later

    - Build FST but stream it immediately to disk (except the FSTMetaData, to be saved at the end). In order to use it, you need to construct the corresponding DataInput and use the FST constructor to read it.

    WARNING: This API is experimental and might change in incompatible ways in the next release.
    • Method Detail

      • getOnHeapReaderWriter

        public static DataOutput getOnHeapReaderWriter(int blockBits)
        Get an on-heap DataOutput that allows the FST to be read immediately after writing, and also optionally saved to an external DataOutput.
        Parameters:
        blockBits - how many bits wide to make each block of the DataOutput
        Returns:
        the DataOutput
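        As a rough illustration of what blockBits controls, the sketch below assumes each block holds 2^blockBits bytes and that a position is split into a block index and an offset by shifting and masking. This is a common layout for paged byte storage, not a statement about the internals of the returned DataOutput; BlockBitsSketch and its methods are hypothetical names.

```java
// Hypothetical helper, not a Lucene class: shows how a bit width could
// translate into a block size and a (block, offset) address.
public class BlockBitsSketch {

    // Block size in bytes, assuming blocks hold 2^blockBits bytes.
    static long blockSize(int blockBits) {
        return 1L << blockBits;
    }

    // Index of the block that contains the given byte position.
    static int blockIndex(long pos, int blockBits) {
        return (int) (pos >>> blockBits);
    }

    // Offset of the given byte position inside its block.
    static int offsetInBlock(long pos, int blockBits) {
        return (int) (pos & (blockSize(blockBits) - 1));
    }

    public static void main(String[] args) {
        int blockBits = 15;   // example value only
        long pos = 100_000L;  // example byte position
        System.out.println(blockSize(blockBits));          // 32768
        System.out.println(blockIndex(pos, blockBits));    // 3
        System.out.println(offsetInBlock(pos, blockBits)); // 1696
    }
}
```

        Under this assumption, a larger blockBits means fewer, bigger blocks, and a smaller value means more, smaller blocks.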
      • getDirectAddressingMaxOversizingFactor

        public float getDirectAddressingMaxOversizingFactor()
      • getNodeCount

        public long getNodeCount()
      • getArcCount

        public long getArcCount()
      • compile

        public FST.FSTMetadata<T> compile()
                                   throws IOException
        Returns the metadata of the final FST. NOTE: this will return null if nothing is accepted by the FST itself.

        To create the FST, you need to:

        - If an FSTReader DataOutput was used, such as the one returned by getOnHeapReaderWriter(int):

             fstMetadata = fstCompiler.compile();
             fst = FST.fromFSTReader(fstMetadata, fstCompiler.getFSTReader());
         

        - If a non-FSTReader DataOutput was used, such as IndexOutput, you need to first create the corresponding DataInput, such as IndexInput, then pass it to the FST constructor:

             fstMetadata = fstCompiler.compile();
             fst = new FST<>(fstMetadata, dataInput, new OffHeapFSTStore());
         
        Throws:
        IOException
      • fstRamBytesUsed

        public long fstRamBytesUsed()
      • fstSizeInBytes

        public long fstSizeInBytes()