Package org.apache.lucene.util.fst
Class FSTCompiler.Builder<T>
java.lang.Object
org.apache.lucene.util.fst.FSTCompiler.Builder<T>
- Enclosing class:
- FSTCompiler<T>
Fluent-style constructor for FST FSTCompiler.
Creates an FST/FSA builder with all the possible tuning and construction tweaks. Read parameter documentation carefully.
-
Constructor Summary
-
Method Summary
Method and description:
- allowFixedLengthArcs(boolean allowFixedLengthArcs): Pass false to disable the fixed length arc optimization (binary search or direct addressing) while building the FST; this will make the resulting FST smaller but slower to traverse.
- build(): Creates a new FSTCompiler.
- bytesPageBits(int bytesPageBits): How many bits wide to make each byte[] block in the BytesStore; if you know the FST will be large then make this larger.
- directAddressingMaxOversizingFactor(float factor): Overrides the default maximum oversizing of the fixed array allowed to enable direct addressing of arcs instead of binary search.
- minSuffixCount1(int minSuffixCount1): If pruning the input graph during construction, this threshold is used for telling if a node is kept or pruned.
- minSuffixCount2(int minSuffixCount2): Better pruning: we prune a node (and all following nodes) if the prior node has fewer than this number of terms going through it.
- shareMaxTailLength(int shareMaxTailLength): Only used if shouldShareSuffix is true.
- shouldShareNonSingletonNodes(boolean shouldShareNonSingletonNodes): Only used if shouldShareSuffix is true.
- shouldShareSuffix(boolean shouldShareSuffix): If true, the shared suffixes will be compacted into unique paths.
-
Constructor Details
-
Builder
- Parameters:
inputType - The input type (transition labels). Can be anything from the FST.INPUT_TYPE enumeration. Shorter types will consume less memory. Strings (character sequences) are represented as FST.INPUT_TYPE.BYTE4 (full Unicode codepoints).
outputs - The output type for each input sequence. Applies only if building an FST. For FSA, use NoOutputs.getSingleton() and NoOutputs.getNoOutput() as the singleton output object.
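To illustrate what "full Unicode codepoints" means for BYTE4 input, the following minimal sketch uses only the JDK to turn a string into one int label per code point. The class and method names are hypothetical and not part of Lucene; how such labels are then passed to the compiler is outside the scope of this sketch.

```java
// Hypothetical helper, not part of Lucene: one int label per Unicode code point.
final class CodePointLabels {
  /** Returns the code points of s, one int per code point. */
  static int[] toLabels(String s) {
    return s.codePoints().toArray();
  }

  public static void main(String[] args) {
    // "a" followed by U+1F600, which takes two UTF-16 chars but is a single code point.
    String s = "a\uD83D\uDE00";
    int[] labels = toLabels(s);
    System.out.println(s.length() + " chars, " + labels.length + " labels"); // prints "3 chars, 2 labels"
  }
}
```

A label outside the Basic Multilingual Plane does not fit in one or two bytes, which is why a 4-byte input type is needed for arbitrary strings.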
-
-
Method Details
-
minSuffixCount1
If pruning the input graph during construction, this threshold is used for telling if a node is kept or pruned. If transition_count(node) >= minSuffixCount1, the node is kept. Default = 0.
-
minSuffixCount2
Better pruning: we prune a node (and all following nodes) if the prior node has fewer than this number of terms going through it. Default = 0.
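The two pruning rules described for minSuffixCount1 and minSuffixCount2 can be written as simple predicates. This is only a sketch of the rules as stated in this documentation; the class, method, and parameter names are hypothetical, and it does not reproduce how the compiler actually counts transitions or terms.

```java
// Hypothetical illustration of the two thresholds as documented; not Lucene's code.
final class PruningThresholds {
  /** minSuffixCount1 rule: a node is kept if its transition count reaches the threshold. */
  static boolean keptByCount1(long transitionCount, int minSuffixCount1) {
    return transitionCount >= minSuffixCount1;
  }

  /** minSuffixCount2 rule: a node is pruned if fewer than the threshold terms go through the prior node. */
  static boolean prunedByCount2(long priorNodeTermCount, int minSuffixCount2) {
    return priorNodeTermCount < minSuffixCount2;
  }

  public static void main(String[] args) {
    // With the default threshold of 0, nothing is pruned by either rule.
    System.out.println(keptByCount1(0, 0));   // prints true
    System.out.println(prunedByCount2(0, 0)); // prints false
  }
}
```

With both defaults at 0, every node satisfies the first rule and no node triggers the second, so no pruning takes place.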
-
allowFixedLengthArcs
Pass false to disable the fixed length arc optimization (binary search or direct addressing) while building the FST; this will make the resulting FST smaller but slower to traverse. Default = true.
-
bytesPageBits
How many bits wide to make each byte[] block in the BytesStore; if you know the FST will be large then make this larger. For example, 15 bits = 32768 byte pages. Default = 15.
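The relationship between the bit width and the page size is a power of two, as the 15 bits = 32768 bytes example indicates. A minimal sketch of that arithmetic follows; the class and method names are hypothetical, and the range check is an assumption of this sketch (it only keeps the result a positive int), not a statement of the limits Lucene enforces.

```java
// Hypothetical helper, not part of Lucene: page size in bytes for a given bit width.
final class PageBits {
  static int pageSizeBytes(int bytesPageBits) {
    // Assumed bound for this sketch only: keeps 1 << bits within a positive int.
    if (bytesPageBits < 0 || bytesPageBits > 30) {
      throw new IllegalArgumentException("bytesPageBits out of range for this sketch: " + bytesPageBits);
    }
    return 1 << bytesPageBits;
  }

  public static void main(String[] args) {
    System.out.println(pageSizeBytes(15)); // prints 32768
  }
}
```

Each additional bit doubles the block size, so 16 bits gives 65536 byte pages.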
-
directAddressingMaxOversizingFactor
Overrides the default maximum oversizing of the fixed array allowed to enable direct addressing of arcs instead of binary search. Setting this factor to a negative value (e.g. -1) effectively disables direct addressing, so only binary search nodes will be created.
This factor does not determine whether to encode a node with a list of variable length arcs or with fixed length arcs. It only determines the effective encoding of a node that is already known to be encoded with fixed length arcs.
Default = 1.
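One way to picture the role of this factor is the simplified decision below. It assumes the encoder knows the size a node would take with each fixed length encoding and allows direct addressing only when its size is at most the binary search size multiplied by the factor. That comparison is an assumption made for illustration; this documentation only states that the factor bounds the oversizing and that a negative value disables direct addressing, and the real logic may weigh further considerations. All names are hypothetical.

```java
// Hypothetical, simplified decision between the two fixed length encodings; not Lucene's code.
final class DirectAddressingChoice {
  /**
   * Returns true if direct addressing would be chosen under the assumed rule:
   * the direct addressing size must not exceed the binary search size times the factor.
   */
  static boolean useDirectAddressing(
      int sizeForBinarySearch, int sizeForDirectAddressing, float maxOversizingFactor) {
    float allowedSize = sizeForBinarySearch * maxOversizingFactor;
    return sizeForDirectAddressing <= allowedSize;
  }

  public static void main(String[] args) {
    System.out.println(useDirectAddressing(100, 150, 1f));  // prints false
    System.out.println(useDirectAddressing(100, 150, 2f));  // prints true
    System.out.println(useDirectAddressing(100, 50, -1f));  // prints false
  }
}
```

Under this assumed rule, a negative factor makes the allowed size negative, so no node of positive size can qualify, which matches the documented effect of disabling direct addressing.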
-
build
Creates a new FSTCompiler.
-