Package org.apache.lucene.util.fst
Class FSTCompiler.Builder<T>
java.lang.Object
org.apache.lucene.util.fst.FSTCompiler.Builder<T>
- Enclosing class:
FSTCompiler<T>
Fluent-style constructor for FST FSTCompiler.
Creates an FST/FSA builder with all the possible tuning and construction tweaks. Read parameter documentation carefully.
-
Constructor Summary
-
Method Summary
Method: Description
- allowFixedLengthArcs(boolean allowFixedLengthArcs): Pass false to disable the fixed length arc optimization (binary search or direct addressing) while building the FST; this will make the resulting FST smaller but slower to traverse.
- build(): Creates a new FSTCompiler.
- bytesPageBits(int bytesPageBits): How many bits wide to make each byte[] block in the BytesStore; if you know the FST will be large then make this larger.
- directAddressingMaxOversizingFactor(float factor): Overrides the default maximum oversizing of the fixed array allowed to enable direct addressing of arcs instead of binary search.
- setVersion(int version): Expert: Set the codec version.
- suffixRAMLimitMB(double mb): The approximate maximum amount of RAM (in MB) to use holding the suffix cache, which enables the FST to share common suffixes.
-
Constructor Details
-
Builder
- Parameters:
inputType - The input type (transition labels). Can be anything from the FST.INPUT_TYPE enumeration. Shorter types will consume less memory. Strings (character sequences) are represented as FST.INPUT_TYPE.BYTE4 (full Unicode code points).
outputs - The output type for each input sequence. Applies only if building an FST. For FSA, use NoOutputs.getSingleton() and NoOutputs.getNoOutput() as the singleton output object.
-
-
Method Details
-
suffixRAMLimitMB
The approximate maximum amount of RAM (in MB) to use holding the suffix cache, which enables the FST to share common suffixes. Pass Double.POSITIVE_INFINITY to keep all suffixes and create an exactly minimal FST. In this case, the amount of RAM actually used will be bounded by the number of unique suffixes. If you pass a value smaller than the builder would use, the least recently used suffixes will be discarded, thus reducing suffix sharing and creating a non-minimal FST. In this case, the larger the limit, the closer the FST will be to its true minimal size, with diminishing returns as you increase the limit. Pass 0 to disable suffix sharing entirely, but note that the resulting FST can be substantially larger than the minimal FST.
Note that this is not a precise limit. The current implementation uses hash tables to map the suffixes, and approximates the rough overhead (unused slots) in the hash table.
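The effect of the limit on suffix sharing can be illustrated with a toy model. The sketch below is not Lucene code: the class SuffixCacheSketch and its methods are hypothetical, it bounds the cache by entry count rather than by RAM in MB, and it uses a LinkedHashMap in access order as a stand-in for the least-recently-used behavior described above. It only counts how many nodes would have to be created for a stream of suffixes.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Toy model only (hypothetical class, not part of Lucene): a bounded,
// least-recently-used cache of suffixes. A cache hit means an existing
// node is shared; a miss means a new node has to be created.
final class SuffixCacheSketch {
    private final LinkedHashMap<String, Integer> cache;
    private int nodesCreated = 0;

    SuffixCacheSketch(final int maxEntries) {
        // accessOrder = true makes get() refresh an entry's recency.
        this.cache = new LinkedHashMap<String, Integer>(16, 0.75f, true) {
            @Override
            protected boolean removeEldestEntry(Map.Entry<String, Integer> eldest) {
                // Discard the least recently used suffix once over the limit.
                return size() > maxEntries;
            }
        };
    }

    // Returns the id of the node shared for this suffix, creating one on a miss.
    int nodeFor(String suffix) {
        Integer existing = cache.get(suffix);
        if (existing != null) {
            return existing;
        }
        int id = nodesCreated++;
        cache.put(suffix, id);
        return id;
    }

    int nodesCreated() {
        return nodesCreated;
    }

    // Convenience helper: feed all suffixes through a fresh cache and count nodes.
    static int countNodes(int maxEntries, String... suffixes) {
        SuffixCacheSketch sketch = new SuffixCacheSketch(maxEntries);
        for (String suffix : suffixes) {
            sketch.nodeFor(suffix);
        }
        return sketch.nodesCreated();
    }
}
```

In this model, the stream "ing", "ed", "s", "ing", "ing", "ed" needs 3 nodes with an effectively unlimited cache, 5 nodes with room for 2 entries, and 6 nodes with a limit of 0, which mirrors the minimal, non-minimal, and no-sharing cases.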
Default = 32.0 MB.
-
allowFixedLengthArcs
Pass false to disable the fixed length arc optimization (binary search or direct addressing) while building the FST; this will make the resulting FST smaller but slower to traverse.
Default = true.
-
bytesPageBits
How many bits wide to make each byte[] block in the BytesStore; if you know the FST will be large then make this larger. For example 15 bits = 32768 byte pages.
Default = 15.
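The relationship between the bit width and the block size is a power of two. The following sketch shows the arithmetic only; the class PageBitsSketch, its method names, and the accepted range of 1 to 30 bits are assumptions made for illustration and are not taken from Lucene.

```java
// Illustration of the page-size arithmetic (hypothetical helper, not Lucene code).
final class PageBitsSketch {

    // Bytes per block for a given bit width: 2^bytesPageBits.
    // The [1, 30] range is an assumption chosen so the result fits in an int.
    static int pageSizeBytes(int bytesPageBits) {
        if (bytesPageBits < 1 || bytesPageBits > 30) {
            throw new IllegalArgumentException(
                "bytesPageBits must be in [1, 30], got " + bytesPageBits);
        }
        return 1 << bytesPageBits;
    }

    // Number of blocks needed to hold totalBytes, rounding up.
    static long blocksNeeded(long totalBytes, int bytesPageBits) {
        long pageSize = pageSizeBytes(bytesPageBits);
        return (totalBytes + pageSize - 1) / pageSize;
    }
}
```

With the default of 15 bits, pageSizeBytes(15) is 32768, so 100000 bytes would span 4 blocks; a larger bit width means fewer, larger blocks for the same amount of data.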
-
directAddressingMaxOversizingFactor
Overrides the default maximum oversizing of the fixed array allowed to enable direct addressing of arcs instead of binary search.
Setting this factor to a negative value (e.g. -1) effectively disables direct addressing; only binary search nodes will be created.
This factor does not determine whether to encode a node with a list of variable length arcs or with fixed length arcs. It only determines the effective encoding of a node that is already known to be encoded with fixed length arcs.
Default = 1.
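As a rough illustration of the trade-off this factor controls, the sketch below models a node that is already known to use fixed length arcs. Direct addressing needs one slot per label in the node's label range, while binary search needs one slot per arc. The class DirectAddressingSketch and its method are hypothetical, and the model counts array slots only; the actual FSTCompiler heuristic is not reproduced here and may weigh the sizes differently.

```java
// Simplified model (hypothetical, not Lucene's actual heuristic): choose direct
// addressing when the slots it needs do not exceed the binary search slot count
// multiplied by the oversizing factor.
final class DirectAddressingSketch {

    static boolean useDirectAddressing(int numArcs, int minLabel, int maxLabel,
                                       float maxOversizingFactor) {
        // Direct addressing reserves one slot for every label in [minLabel, maxLabel].
        int slotsForDirectAddressing = maxLabel - minLabel + 1;
        // Binary search stores exactly one slot per arc.
        int slotsForBinarySearch = numArcs;
        return slotsForDirectAddressing <= slotsForBinarySearch * maxOversizingFactor;
    }
}
```

In this model, a node with arcs 'a', 'b', 'c' qualifies at a factor of 1, a node with arcs 'a', 'm', 'z' (range 26) does not, and raising the factor to 10 lets the sparse node qualify as well. A negative factor makes the right-hand side negative, so no node qualifies, which matches the documented way to disable direct addressing.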
-
setVersion
Expert: Set the codec version.
-
build
Creates a new FSTCompiler.
- Throws:
IOException
-