Package org.apache.lucene.util.fst
Class FSTCompiler.Builder<T>
- java.lang.Object
-
- org.apache.lucene.util.fst.FSTCompiler.Builder<T>
-
- Enclosing class:
- FSTCompiler<T>
public static class FSTCompiler.Builder<T> extends Object
Fluent-style constructor for FSTCompiler.
Creates an FST/FSA builder with all the possible tuning and construction tweaks. Read the parameter documentation carefully.
-
-
Constructor Summary
Builder(FST.INPUT_TYPE inputType, Outputs<T> outputs)
-
Method Summary
FSTCompiler.Builder<T> allowFixedLengthArcs(boolean allowFixedLengthArcs)
Pass false to disable the fixed length arc optimization (binary search or direct addressing) while building the FST; this will make the resulting FST smaller but slower to traverse.
FSTCompiler<T> build()
Creates a new FSTCompiler.
FSTCompiler.Builder<T> bytesPageBits(int bytesPageBits)
How many bits wide to make each byte[] block in the BytesStore; if you know the FST will be large then make this larger.
FSTCompiler.Builder<T> directAddressingMaxOversizingFactor(float factor)
Overrides the default maximum oversizing of the fixed array allowed to enable direct addressing of arcs instead of binary search.
FSTCompiler.Builder<T> minSuffixCount1(int minSuffixCount1)
If pruning the input graph during construction, this threshold is used for telling if a node is kept or pruned.
FSTCompiler.Builder<T> minSuffixCount2(int minSuffixCount2)
Better pruning: we prune a node (and all following nodes) if the prior node has fewer than this number of terms going through it.
FSTCompiler.Builder<T> shareMaxTailLength(int shareMaxTailLength)
Only used if shouldShareSuffix is true.
FSTCompiler.Builder<T> shouldShareNonSingletonNodes(boolean shouldShareNonSingletonNodes)
Only used if shouldShareSuffix is true.
FSTCompiler.Builder<T> shouldShareSuffix(boolean shouldShareSuffix)
If true, the shared suffixes will be compacted into unique paths.
-
-
-
Constructor Detail
-
Builder
public Builder(FST.INPUT_TYPE inputType, Outputs<T> outputs)
- Parameters:
inputType - The input type (transition labels). Can be anything from the FST.INPUT_TYPE enumeration. Shorter types will consume less memory. Strings (character sequences) are represented as FST.INPUT_TYPE.BYTE4 (full Unicode code points).
outputs - The output type for each input sequence. Applies only if building an FST. For FSA, use NoOutputs.getSingleton() and NoOutputs.getNoOutput() as the singleton output object.
-
-
Method Detail
-
minSuffixCount1
public FSTCompiler.Builder<T> minSuffixCount1(int minSuffixCount1)
If pruning the input graph during construction, this threshold is used for telling if a node is kept or pruned. If transition_count(node) >= minSuffixCount1, the node is kept. Default = 0.
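The keep-or-prune rule stated above can be sketched in plain Java. This is only an illustration of the documented comparison, not Lucene's implementation: the class PruneRuleSketch and the method isKept are hypothetical names, and transitionCount stands in for transition_count(node).

```java
class PruneRuleSketch {
    // Hypothetical helper mirroring the documented rule:
    // "if transition_count(node) >= minSuffixCount1, the node is kept".
    static boolean isKept(long transitionCount, int minSuffixCount1) {
        return transitionCount >= minSuffixCount1;
    }

    public static void main(String[] args) {
        // With the default threshold of 0, every node passes the check.
        System.out.println(isKept(0, 0));
        // With a threshold of 2, a node with a count of 1 fails the check.
        System.out.println(isKept(1, 2));
        // A count equal to the threshold passes, because the rule uses >=.
        System.out.println(isKept(2, 2));
    }
}
```

Note that the comparison is inclusive, so a count exactly equal to the threshold keeps the node.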
-
minSuffixCount2
public FSTCompiler.Builder<T> minSuffixCount2(int minSuffixCount2)
Better pruning: we prune a node (and all following nodes) if the prior node has fewer than this number of terms going through it. Default = 0.
-
shouldShareSuffix
public FSTCompiler.Builder<T> shouldShareSuffix(boolean shouldShareSuffix)
If true, the shared suffixes will be compacted into unique paths. This requires an additional RAM-intensive hash map for lookups in memory. Setting this parameter to false creates a single suffix path for all input sequences. This will result in a larger FST, but requires substantially less memory and CPU during building. Default = true.
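To make the trade-off concrete, here is a toy sketch of suffix sharing in plain Java. It is not Lucene's algorithm or data structure: SuffixSharingSketch and all of its members are hypothetical, and it minimizes a small trie after the fact rather than incrementally. It only shows why sharing needs a lookup map (the registry) and why the shared result has fewer nodes.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.TreeMap;

class SuffixSharingSketch {
    // A toy trie node: sorted outgoing arcs plus an "accepting" flag.
    static final class Node {
        final TreeMap<Character, Node> arcs = new TreeMap<>();
        boolean isFinal;
    }

    static Node buildTrie(String... words) {
        Node root = new Node();
        for (String w : words) {
            Node n = root;
            for (char c : w.toCharArray()) {
                n = n.arcs.computeIfAbsent(c, k -> new Node());
            }
            n.isFinal = true;
        }
        return root;
    }

    // Counts every trie node, i.e. the size without any suffix sharing.
    static int countUnshared(Node n) {
        int count = 1;
        for (Node child : n.arcs.values()) {
            count += countUnshared(child);
        }
        return count;
    }

    // Bottom-up: two nodes get the same id when they have the same flag
    // and the same labeled arcs to equivalent children. The registry map
    // is the extra memory that sharing costs.
    static int canonicalId(Node n, Map<String, Integer> registry) {
        StringBuilder sig = new StringBuilder(n.isFinal ? "F" : "N");
        for (Map.Entry<Character, Node> e : n.arcs.entrySet()) {
            int childId = canonicalId(e.getValue(), registry);
            sig.append(e.getKey().charValue()).append(childId).append(';');
        }
        String key = sig.toString();
        Integer id = registry.get(key);
        if (id == null) {
            id = registry.size();
            registry.put(key, id);
        }
        return id;
    }

    // Number of distinct nodes once equivalent suffixes are shared.
    static int countShared(Node root) {
        Map<String, Integer> registry = new HashMap<>();
        canonicalId(root, registry);
        return registry.size();
    }

    public static void main(String[] args) {
        Node root = buildTrie("cat", "bat", "rat");
        System.out.println("unshared nodes: " + countUnshared(root)); // 10
        System.out.println("shared nodes:   " + countShared(root));   // 4
    }
}
```

For the three words above, the common suffix "at" collapses into a single path, so the node count drops from 10 to 4 at the price of keeping the registry in memory during construction.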
-
shouldShareNonSingletonNodes
public FSTCompiler.Builder<T> shouldShareNonSingletonNodes(boolean shouldShareNonSingletonNodes)
Only used if shouldShareSuffix is true. Set this to true to ensure the FST is fully minimal, at the cost of more CPU and more RAM during building. Default = true.
-
shareMaxTailLength
public FSTCompiler.Builder<T> shareMaxTailLength(int shareMaxTailLength)
Only used if shouldShareSuffix is true. Set this to Integer.MAX_VALUE to ensure the FST is fully minimal, at the cost of more CPU and more RAM during building. Default = Integer.MAX_VALUE.
-
allowFixedLengthArcs
public FSTCompiler.Builder<T> allowFixedLengthArcs(boolean allowFixedLengthArcs)
Pass false to disable the fixed length arc optimization (binary search or direct addressing) while building the FST; this will make the resulting FST smaller but slower to traverse. Default = true.
-
bytesPageBits
public FSTCompiler.Builder<T> bytesPageBits(int bytesPageBits)
How many bits wide to make each byte[] block in the BytesStore; if you know the FST will be large then make this larger. For example, 15 bits = 32768-byte pages. Default = 15.
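The relationship between page bits and page size is plain power-of-two arithmetic, sketched below. The class PageBitsSketch and its methods are hypothetical helpers for illustration only; they do not model how the BytesStore actually allocates or manages its blocks.

```java
class PageBitsSketch {
    // Page size in bytes for a given number of page bits: 2^bytesPageBits.
    static long pageSizeBytes(int bytesPageBits) {
        return 1L << bytesPageBits;
    }

    // Number of pages needed to hold totalBytes, rounded up.
    static long pagesNeeded(long totalBytes, int bytesPageBits) {
        long pageSize = pageSizeBytes(bytesPageBits);
        return (totalBytes + pageSize - 1) / pageSize;
    }

    public static void main(String[] args) {
        // The documented default: 15 bits give 32768-byte pages.
        System.out.println(pageSizeBytes(15));          // 32768
        // One million bytes spread over 32768-byte pages.
        System.out.println(pagesNeeded(1_000_000, 15)); // 31
        // The same data with 20-bit (1 MiB) pages fits in a single page.
        System.out.println(pagesNeeded(1_000_000, 20)); // 1
    }
}
```

Larger pages mean fewer blocks to track for a large FST, while smaller pages waste less space in the last, partially filled block.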
-
directAddressingMaxOversizingFactor
public FSTCompiler.Builder<T> directAddressingMaxOversizingFactor(float factor)
Overrides the default maximum oversizing of the fixed array allowed to enable direct addressing of arcs instead of binary search. Setting this factor to a negative value (e.g. -1) effectively disables direct addressing; only binary search nodes will be created.
This factor does not determine whether to encode a node with a list of variable length arcs or with fixed length arcs. It only determines the effective encoding of a node that is already known to be encoded with fixed length arcs.
Default = 1.
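One way to picture the factor is as a size budget relative to the binary search encoding. The sketch below assumes a simplified rule: direct addressing is accepted when its encoded size is at most the binary search size multiplied by the factor. That rule, the class OversizingSketch, the method useDirectAddressing, and the byte sizes are all assumptions made for illustration; the actual compiler may apply additional conditions.

```java
class OversizingSketch {
    // Assumed, simplified rule (not taken from the Lucene source):
    // accept direct addressing when its size fits within the budget
    // sizeForBinarySearch * maxOversizingFactor.
    static boolean useDirectAddressing(
            int sizeForBinarySearch, int sizeForDirectAddressing, float maxOversizingFactor) {
        return sizeForDirectAddressing <= sizeForBinarySearch * maxOversizingFactor;
    }

    public static void main(String[] args) {
        // Factor 1 (the documented default): no oversizing is tolerated.
        System.out.println(useDirectAddressing(100, 90, 1f));    // true
        System.out.println(useDirectAddressing(100, 120, 1f));   // false
        // Factor 1.5: up to 50% larger than binary search is tolerated.
        System.out.println(useDirectAddressing(100, 120, 1.5f)); // true
        // A negative factor makes the budget negative, so direct
        // addressing is never chosen for any positive size.
        System.out.println(useDirectAddressing(100, 1, -1f));    // false
    }
}
```

Under this reading, raising the factor trades a larger FST for faster arc lookup, and a negative value leaves binary search as the only fixed length encoding.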
-
build
public FSTCompiler<T> build()
Creates a new FSTCompiler.
-
-