Class FSTCompiler.Builder<T>
- java.lang.Object
-
- org.apache.lucene.util.fst.FSTCompiler.Builder<T>
-
- Enclosing class:
- FSTCompiler<T>
public static class FSTCompiler.Builder<T> extends Object
Fluent-style constructor for FSTCompiler.
Creates an FST/FSA builder with all the possible tuning and construction tweaks. Read parameter documentation carefully.
-
-
Constructor Summary
Constructors
Builder(FST.INPUT_TYPE inputType, Outputs<T> outputs)
-
Method Summary
Modifier and Type, Method, Description
FSTCompiler.Builder<T> allowFixedLengthArcs(boolean allowFixedLengthArcs)
Pass false to disable the fixed length arc optimization (binary search or direct addressing) while building the FST; this will make the resulting FST smaller but slower to traverse.
FSTCompiler<T> build()
Creates a new FSTCompiler.
FSTCompiler.Builder<T> dataOutput(DataOutput dataOutput)
Set the DataOutput which is used for low-level writing of FST.
FSTCompiler.Builder<T> directAddressingMaxOversizingFactor(float factor)
Overrides the default maximum oversizing of the fixed array allowed to enable direct addressing of arcs instead of binary search.
FSTCompiler.Builder<T> setVersion(int version)
Expert: Set the codec version.
FSTCompiler.Builder<T> suffixRAMLimitMB(double mb)
The approximate maximum amount of RAM (in MB) to use holding the suffix cache, which enables the FST to share common suffixes.
-
-
-
Constructor Detail
-
Builder
public Builder(FST.INPUT_TYPE inputType, Outputs<T> outputs)
- Parameters:
inputType - The input type (transition labels). Can be anything from the FST.INPUT_TYPE enumeration. Shorter types will consume less memory. Strings (character sequences) are represented as FST.INPUT_TYPE.BYTE4 (full Unicode code points).
outputs - The output type for each input sequence. Applies only if building an FST. For FSA, use NoOutputs.getSingleton() and NoOutputs.getNoOutput() as the singleton output object.
-
-
Method Detail
-
suffixRAMLimitMB
public FSTCompiler.Builder<T> suffixRAMLimitMB(double mb)
The approximate maximum amount of RAM (in MB) to use holding the suffix cache, which enables the FST to share common suffixes. Pass Double.POSITIVE_INFINITY to keep all suffixes and create an exactly minimal FST. In this case, the amount of RAM actually used will be bounded by the number of unique suffixes. If you pass a value smaller than the builder would otherwise use, the least recently used suffixes will be discarded, thus reducing suffix sharing and creating a non-minimal FST. In this case, the larger the limit, the closer the FST will be to its true minimal size, with diminishing returns as you increase the limit. Pass 0 to disable suffix sharing entirely, but note that the resulting FST can be substantially larger than the minimal FST.
Note that this is not a precise limit. The current implementation uses hash tables to map the suffixes, and approximates the rough overhead (unused slots) in the hash table.
Default = 32.0 MB.
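The trade-off described above can be illustrated with a toy model. The sketch below is not Lucene's implementation: it assumes a plain LRU map keyed by suffix string and a limit counted in entries rather than megabytes, and the names ToySuffixCache, nodeFor and countNodes are hypothetical. It only shows why a tighter limit leads to more nodes being created, that is, to a less minimal automaton.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Toy LRU suffix cache (hypothetical; not how Lucene stores suffixes). */
final class ToySuffixCache {
  private final Map<String, Integer> nodeIdBySuffix;
  private int nodesCreated = 0;

  ToySuffixCache(final int maxEntries) {
    // accessOrder = true keeps entries ordered from least to most recently used.
    this.nodeIdBySuffix =
        new LinkedHashMap<String, Integer>(16, 0.75f, true) {
          @Override
          protected boolean removeEldestEntry(Map.Entry<String, Integer> eldest) {
            // Evict the least recently used suffix once the limit is exceeded.
            return size() > maxEntries;
          }
        };
  }

  /** Returns the node id for a suffix, reusing a cached node when possible. */
  int nodeFor(String suffix) {
    Integer cached = nodeIdBySuffix.get(suffix);
    if (cached != null) {
      return cached; // suffix is shared: no new node is created
    }
    int id = nodesCreated++;
    nodeIdBySuffix.put(suffix, id);
    return id;
  }

  int nodesCreated() {
    return nodesCreated;
  }

  /** Counts how many nodes a sequence of suffixes needs under a given limit. */
  static int countNodes(int maxEntries, String... suffixes) {
    ToySuffixCache cache = new ToySuffixCache(maxEntries);
    for (String suffix : suffixes) {
      cache.nodeFor(suffix);
    }
    return cache.nodesCreated();
  }
}
```

For the suffix sequence ing, ed, ing, s, ed, ing, this model creates 3 nodes with an unbounded cache (Integer.MAX_VALUE entries), 5 nodes with a limit of 2 entries, and 6 nodes with a limit of 0, which mirrors the minimal, partially shared and unshared cases described above.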
-
allowFixedLengthArcs
public FSTCompiler.Builder<T> allowFixedLengthArcs(boolean allowFixedLengthArcs)
Pass false to disable the fixed length arc optimization (binary search or direct addressing) while building the FST; this will make the resulting FST smaller but slower to traverse.
Default = true.
-
dataOutput
public FSTCompiler.Builder<T> dataOutput(DataOutput dataOutput)
Set the DataOutput which is used for low-level writing of FST. If you want the FST to be immediately readable, you need to use FSTCompiler.getOnHeapReaderWriter(int). Otherwise you need to construct the corresponding DataInput and use the FST constructor to read it.
- Parameters:
dataOutput - the DataOutput
- Returns:
this builder
- See Also:
FSTCompiler.getOnHeapReaderWriter(int)
-
directAddressingMaxOversizingFactor
public FSTCompiler.Builder<T> directAddressingMaxOversizingFactor(float factor)
Overrides the default maximum oversizing of the fixed array allowed to enable direct addressing of arcs instead of binary search.
Setting this factor to a negative value (e.g. -1) effectively disables direct addressing; only binary search nodes will be created.
This factor does not determine whether to encode a node with a list of variable length arcs or with fixed length arcs. It only determines the effective encoding of a node that is already known to be encoded with fixed length arcs.
Default = 1.
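The role of the factor can be sketched with a simplified size comparison. This is an illustrative model under stated assumptions, not Lucene's actual code: it assumes a binary search node stores one fixed-size slot per existing arc, a direct addressing node stores one slot per label in the node's label range, and direct addressing is chosen only when its size does not exceed the binary search size multiplied by the factor. The class ArcEncodingSketch and its parameters are hypothetical.

```java
/** Simplified model of the direct-addressing decision (assumption, not Lucene's code). */
final class ArcEncodingSketch {
  enum Encoding {
    BINARY_SEARCH,
    DIRECT_ADDRESSING
  }

  /**
   * Chooses an encoding for a node that is already known to use fixed length arcs.
   *
   * @param numArcs number of arcs leaving the node
   * @param labelRange number of labels between the smallest and largest arc label, inclusive
   * @param bytesPerArc fixed slot size in bytes (same for both encodings in this model)
   * @param maxOversizingFactor the oversizing factor being illustrated
   */
  static Encoding choose(int numArcs, int labelRange, int bytesPerArc, float maxOversizingFactor) {
    // Binary search: one slot per arc that actually exists.
    long sizeForBinarySearch = (long) numArcs * bytesPerArc;
    // Direct addressing (in this model): one slot per label in the range.
    long sizeForDirectAddressing = (long) labelRange * bytesPerArc;
    // The factor bounds how much larger the direct-addressing array may be.
    double allowedSize = sizeForBinarySearch * (double) maxOversizingFactor;
    return sizeForDirectAddressing <= allowedSize
        ? Encoding.DIRECT_ADDRESSING
        : Encoding.BINARY_SEARCH;
  }
}
```

In this model a node with 10 arcs spread over 12 labels falls back to binary search at factor 1 and switches to direct addressing at factor 1.5. A negative factor makes the allowed size negative, so direct addressing is never selected, which matches the behavior documented above.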
-
setVersion
public FSTCompiler.Builder<T> setVersion(int version)
Expert: Set the codec version.
-
build
public FSTCompiler<T> build()
Creates a new FSTCompiler.
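The fluent style used by this builder can be illustrated without depending on Lucene. The sketch below is a stand-in, not the real class: ToyBuilder and ToyConfig are hypothetical names, and only the three defaults documented on this page (32.0 MB, true, and 1) are taken from the source. Each setter returns the builder itself so that calls can be chained, and build() produces the immutable result.

```java
/** Hypothetical immutable result of the builder; not a Lucene class. */
final class ToyConfig {
  final double suffixRAMLimitMB;
  final boolean allowFixedLengthArcs;
  final float directAddressingMaxOversizingFactor;

  ToyConfig(double suffixRAMLimitMB, boolean allowFixedLengthArcs, float factor) {
    this.suffixRAMLimitMB = suffixRAMLimitMB;
    this.allowFixedLengthArcs = allowFixedLengthArcs;
    this.directAddressingMaxOversizingFactor = factor;
  }
}

/** Hypothetical fluent builder mirroring the defaults documented on this page. */
final class ToyBuilder {
  private double suffixRAMLimitMB = 32.0; // documented default
  private boolean allowFixedLengthArcs = true; // documented default
  private float directAddressingMaxOversizingFactor = 1f; // documented default

  ToyBuilder suffixRAMLimitMB(double mb) {
    this.suffixRAMLimitMB = mb;
    return this; // returning this is what enables chaining
  }

  ToyBuilder allowFixedLengthArcs(boolean allow) {
    this.allowFixedLengthArcs = allow;
    return this;
  }

  ToyBuilder directAddressingMaxOversizingFactor(float factor) {
    this.directAddressingMaxOversizingFactor = factor;
    return this;
  }

  ToyConfig build() {
    return new ToyConfig(
        suffixRAMLimitMB, allowFixedLengthArcs, directAddressingMaxOversizingFactor);
  }

  /** Usage: override two settings and keep the default for the third. */
  static ToyConfig demo() {
    return new ToyBuilder()
        .suffixRAMLimitMB(Double.POSITIVE_INFINITY) // keep all suffixes
        .allowFixedLengthArcs(false) // smaller but slower to traverse
        .build();
  }
}
```

Settings that are never called keep their defaults, so demo() yields a configuration whose oversizing factor is still 1.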
-
-