Class FSTCompiler.Builder<T>

  • Enclosing class:
    FSTCompiler<T>

    public static class FSTCompiler.Builder<T>
    extends Object
    Fluent-style builder for FSTCompiler.

    Creates an FST/FSA builder that exposes all available tuning and construction options. Read the parameter documentation carefully.

    • Method Detail

      • minSuffixCount1

        public FSTCompiler.Builder<T> minSuffixCount1(int minSuffixCount1)
        When pruning the input graph during construction, this threshold determines whether a node is kept or pruned: a node is kept if transition_count(node) >= minSuffixCount1.

        Default = 0.

      • minSuffixCount2

        public FSTCompiler.Builder<T> minSuffixCount2(int minSuffixCount2)
        Better pruning: a node (and all following nodes) is pruned if the prior node has fewer than this number of terms going through it.

        Default = 0.
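
        The two threshold rules above can be written as simple predicates. This is a minimal sketch, not Lucene's pruning code: the class PruningSketch and its method names are hypothetical, and "count" is assumed to mean the number of inputs passing through a node. It mainly shows why the default of 0 keeps every node.

```java
// Hypothetical illustration of the two pruning rules; not part of Lucene.
final class PruningSketch {

    // Rule described for minSuffixCount1: a node is kept when its own
    // count reaches the threshold.
    static boolean keptByCount1(int nodeCount, int minSuffixCount1) {
        return nodeCount >= minSuffixCount1;
    }

    // Rule described for minSuffixCount2: a node (and everything after it)
    // is pruned when the prior node has fewer than the threshold number of
    // terms going through it, so it is kept otherwise.
    static boolean keptByCount2(int priorNodeCount, int minSuffixCount2) {
        return priorNodeCount >= minSuffixCount2;
    }

    public static void main(String[] args) {
        // With the default threshold of 0, any non-negative count passes.
        System.out.println(keptByCount1(1, 0));
        // A node seen by 2 inputs is pruned under a threshold of 3.
        System.out.println(keptByCount1(2, 3));
        // A node whose prior node saw 5 terms survives a threshold of 3.
        System.out.println(keptByCount2(5, 3));
    }
}
```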

      • shouldShareSuffix

        public FSTCompiler.Builder<T> shouldShareSuffix(boolean shouldShareSuffix)
        If true, shared suffixes are compacted into unique paths. This requires an additional, RAM-intensive in-memory hash map for lookups. Setting this parameter to false creates a separate suffix path for each input sequence, which results in a larger FST but requires substantially less memory and CPU during building.

        Default = true.
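
        To make the size trade-off concrete, the toy below builds a plain trie (no suffix sharing) and then counts how many distinct nodes remain once identical subtrees are merged through a hash map. It is a sketch under stated assumptions, not how FSTCompiler works internally: the class SuffixSharingSketch, its methods, and the string-based node signature are all hypothetical, and it handles acceptors (no outputs) only.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical toy: compares node counts with and without suffix sharing.
final class SuffixSharingSketch {

    static final class Node {
        final TreeMap<Character, Node> arcs = new TreeMap<>();
        boolean isFinal;
    }

    // Builds an unshared trie: every input gets its own suffix path.
    static Node buildTrie(String... words) {
        Node root = new Node();
        for (String word : words) {
            Node node = root;
            for (char c : word.toCharArray()) {
                node = node.arcs.computeIfAbsent(c, k -> new Node());
            }
            node.isFinal = true;
        }
        return root;
    }

    static int countTrieNodes(Node node) {
        int total = 1;
        for (Node child : node.arcs.values()) {
            total += countTrieNodes(child);
        }
        return total;
    }

    // Assigns an id per node signature, bottom-up. Nodes with the same
    // finality and the same (label, child id) arcs receive the same id,
    // which is what merging shared suffixes amounts to.
    static int canonicalId(Node node, Map<String, Integer> registry) {
        StringBuilder signature = new StringBuilder(node.isFinal ? "F" : "N");
        for (Map.Entry<Character, Node> arc : node.arcs.entrySet()) {
            int label = arc.getKey();
            int childId = canonicalId(arc.getValue(), registry);
            signature.append('|').append(label).append("->").append(childId);
        }
        String key = signature.toString();
        Integer id = registry.get(key);
        if (id == null) {
            id = registry.size();
            registry.put(key, id);
        }
        return id;
    }

    // The registry is the lookup table whose memory cost grows with the
    // number of distinct nodes.
    static int countSharedNodes(Node root) {
        Map<String, Integer> registry = new HashMap<>();
        canonicalId(root, registry);
        return registry.size();
    }

    public static void main(String[] args) {
        Node root = buildTrie("cat", "bat", "rat");
        System.out.println("without sharing: " + countTrieNodes(root));
        System.out.println("with sharing:    " + countSharedNodes(root));
    }
}
```

        For the inputs cat, bat and rat, the unshared trie has 10 nodes while the merged graph has 4, because the three "at" tails collapse into one path.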

      • shouldShareNonSingletonNodes

        public FSTCompiler.Builder<T> shouldShareNonSingletonNodes(boolean shouldShareNonSingletonNodes)
        Only used if shouldShareSuffix is true. Set this to true to ensure the FST is fully minimal, at the cost of more CPU and more RAM during building.

        Default = true.

      • shareMaxTailLength

        public FSTCompiler.Builder<T> shareMaxTailLength(int shareMaxTailLength)
        Only used if shouldShareSuffix is true. Set this to Integer.MAX_VALUE to ensure the FST is fully minimal, at the cost of more CPU and more RAM during building.

        Default = Integer.MAX_VALUE.

      • allowFixedLengthArcs

        public FSTCompiler.Builder<T> allowFixedLengthArcs(boolean allowFixedLengthArcs)
        Pass false to disable the fixed length arc optimization (binary search or direct addressing) while building the FST; this will make the resulting FST smaller but slower to traverse.

        Default = true.
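
        The speed difference mentioned above comes from how an arc is located by label. The sketch below is a simplified model, not Lucene's arc encoding: the class ArcLookupSketch and its methods are hypothetical, and arcs are reduced to a sorted array of labels. It assumes that variable-length arcs must be decoded one after another, while fixed-length arcs can be indexed directly and therefore binary searched.

```java
// Hypothetical model of arc lookup cost; not part of Lucene.
final class ArcLookupSketch {

    // Variable-length arcs: each arc has to be read in order, so finding
    // a label is a linear scan. Returns the number of arcs examined.
    static int linearSteps(int[] sortedLabels, int target) {
        int steps = 0;
        for (int label : sortedLabels) {
            steps++;
            if (label >= target) {
                break; // found it, or passed the place where it would be
            }
        }
        return steps;
    }

    // Fixed-length arcs: arc i sits at a computable offset, so the labels
    // can be binary searched. Returns the number of arcs examined.
    static int binarySteps(int[] sortedLabels, int target) {
        int lo = 0;
        int hi = sortedLabels.length - 1;
        int steps = 0;
        while (lo <= hi) {
            steps++;
            int mid = (lo + hi) >>> 1;
            if (sortedLabels[mid] == target) {
                return steps;
            }
            if (sortedLabels[mid] < target) {
                lo = mid + 1;
            } else {
                hi = mid - 1;
            }
        }
        return steps;
    }

    public static void main(String[] args) {
        int[] labels = new int[26];
        for (int i = 0; i < labels.length; i++) {
            labels[i] = 'a' + i;
        }
        System.out.println("linear: " + linearSteps(labels, 'z'));
        System.out.println("binary: " + binarySteps(labels, 'z'));
    }
}
```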

      • bytesPageBits

        public FSTCompiler.Builder<T> bytesPageBits(int bytesPageBits)
        How many bits wide to make each byte[] block in the BytesStore; if you know the FST will be large, make this larger. For example, 15 bits gives 32768-byte pages.

        Default = 15.
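
        The relationship between page bits and page size is a power of two. The short sketch below works through the arithmetic; the class PageBitsSketch and the page-count helper are hypothetical and assume the store allocates whole pages.

```java
// Hypothetical helper showing the page-size arithmetic; not part of Lucene.
final class PageBitsSketch {

    // A page that is bytesPageBits wide holds 2^bytesPageBits bytes.
    static long pageSizeBytes(int bytesPageBits) {
        return 1L << bytesPageBits;
    }

    // Number of whole pages needed to hold totalBytes (rounded up).
    static long pagesNeeded(long totalBytes, int bytesPageBits) {
        long pageSize = pageSizeBytes(bytesPageBits);
        return (totalBytes + pageSize - 1) / pageSize;
    }

    public static void main(String[] args) {
        System.out.println(pageSizeBytes(15));        // prints 32768
        System.out.println(pagesNeeded(100_000, 15)); // prints 4
        System.out.println(pagesNeeded(100_000, 20)); // prints 1
    }
}
```

        Larger pages mean fewer blocks to manage for a large FST, at the price of a coarser allocation granularity.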

      • directAddressingMaxOversizingFactor

        public FSTCompiler.Builder<T> directAddressingMaxOversizingFactor(float factor)
        Overrides the default maximum oversizing of the fixed array that is allowed in order to enable direct addressing of arcs instead of binary search.

        Setting this factor to a negative value (e.g. -1) effectively disables direct addressing; only binary search nodes will be created.

        This factor does not determine whether to encode a node with a list of variable length arcs or with fixed length arcs. It only determines the effective encoding of a node that is already known to be encoded with fixed length arcs.

        Default = 1.
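
        One way to read the oversizing factor is as a cap on how large the direct-addressing encoding of a node may be relative to its binary-search encoding. The sketch below models that reading only. It is an assumption, not Lucene's decision logic: the class OversizingSketch, its method, and the exact comparison are hypothetical, and any additional bookkeeping the real compiler may perform is ignored. As stated above, the choice applies only to nodes already known to use fixed length arcs.

```java
// Hypothetical model of the oversizing check; not part of Lucene.
final class OversizingSketch {

    // Assumed rule: direct addressing is chosen when its encoded size does
    // not exceed the binary-search size scaled by the factor.
    static boolean useDirectAddressing(int directAddressingSize,
                                       int binarySearchSize,
                                       float factor) {
        return directAddressingSize <= binarySearchSize * factor;
    }

    public static void main(String[] args) {
        // 120 bytes vs. a budget of 100 * 1.0: binary search is used.
        System.out.println(useDirectAddressing(120, 100, 1f));
        // Raising the factor to 1.5 gives a budget of 150: direct addressing.
        System.out.println(useDirectAddressing(120, 100, 1.5f));
        // A negative factor makes the budget negative, so the check can
        // never pass for a positive size.
        System.out.println(useDirectAddressing(50, 100, -1f));
    }
}
```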