Package org.apache.lucene.util.fst
Class FSTCompiler<T>
java.lang.Object
org.apache.lucene.util.fst.FSTCompiler<T>
Builds a minimal FST (maps an IntsRef term to an arbitrary output) from pre-sorted terms with
outputs. The FST becomes an FSA if you use NoOutputs. The FST is written on-the-fly into a
compact serialized format byte array, which can be saved to / loaded from a Directory or used
directly for traversal. The FST is always finite (no cycles).
NOTE: The algorithm is described at http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.24.3698
The parameterized type T is the output type. See the subclasses of Outputs.
FSTs larger than 2.1GB are now possible (as of Lucene 4.2). FSTs containing more than 2.1B nodes are also now possible; however, they cannot be packed.
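A minimal usage sketch of the workflow described above: add pre-sorted terms with outputs, compile, then look a term up. It assumes the Lucene version this page documents, with lucene-core on the classpath. The class name FstCompilerSketch, the helper lookup, and the sample terms and values are hypothetical; PositiveIntOutputs, IntsRefBuilder, Util.toIntsRef and Util.get are existing Lucene utilities that this page does not itself describe.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import org.apache.lucene.util.BytesRef;
import org.apache.lucene.util.IntsRefBuilder;
import org.apache.lucene.util.fst.FST;
import org.apache.lucene.util.fst.FSTCompiler;
import org.apache.lucene.util.fst.PositiveIntOutputs;
import org.apache.lucene.util.fst.Util;

public class FstCompilerSketch {
  // Hypothetical sample data; the terms are already in sorted order.
  static final String[] TERMS = {"cat", "dog", "dogs"};
  static final long[] VALUES = {5L, 7L, 12L};

  /** Builds an FST mapping each sample term to its long value. */
  static FST<Long> build() throws IOException {
    PositiveIntOutputs outputs = PositiveIntOutputs.getSingleton();
    FSTCompiler<Long> compiler = new FSTCompiler<>(FST.INPUT_TYPE.BYTE1, outputs);
    // The scratch builder is reused: add() fully consumes the input before returning.
    IntsRefBuilder scratch = new IntsRefBuilder();
    for (int i = 0; i < TERMS.length; i++) {
      compiler.add(Util.toIntsRef(new BytesRef(TERMS[i]), scratch), VALUES[i]);
    }
    // compile() would return null only if nothing had been added.
    return compiler.compile();
  }

  /** Returns the output stored for the term, or null if the FST does not accept it. */
  static Long lookup(String term) {
    try {
      return Util.get(build(), new BytesRef(term));
    } catch (IOException e) {
      throw new UncheckedIOException(e);
    }
  }

  public static void main(String[] args) {
    System.out.println("dog -> " + lookup("dog"));
    System.out.println("do  -> " + lookup("do"));
  }
}
```

The sketch rebuilds the FST on every lookup only to keep the helper self-contained; real code would build once and reuse the result.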
WARNING: This API is experimental and might change in incompatible ways in the next release.
Nested Class Summary
Constructor Summary
Constructor / Description
FSTCompiler(FST.INPUT_TYPE inputType, Outputs<T> outputs): Instantiates an FST/FSA builder with default settings and pruning options turned off.
Method Summary
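Modifier and Type / Method / Description
void add(IntsRef input, T output): Add the next input/output pair.
FST<T> compile(): Returns final FST.
long fstRamBytesUsed()
long getArcCount()
float getDirectAddressingMaxOversizingFactor()
long getMappedStateCount()
long getNodeCount()
long getTermCount()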
Constructor Details
FSTCompiler
public FSTCompiler(FST.INPUT_TYPE inputType, Outputs<T> outputs)
Instantiates an FST/FSA builder with default settings and pruning options turned off. For more tuning and tweaking, see FSTCompiler.Builder.
Method Details
getDirectAddressingMaxOversizingFactor
public float getDirectAddressingMaxOversizingFactor()
getTermCount
public long getTermCount()
getNodeCount
public long getNodeCount()
getArcCount
public long getArcCount()
getMappedStateCount
public long getMappedStateCount()
add
public void add(IntsRef input, T output) throws IOException
Add the next input/output pair. The provided input must be sorted after the previous one according to IntsRef.compareTo(org.apache.lucene.util.IntsRef). It's also OK to add the same input twice in a row with different outputs, as long as Outputs implements the Outputs.merge(T, T) method. Note that the input is fully consumed once this method returns (so the caller is free to reuse it), but the output is not. So if your outputs are mutable (e.g. ByteSequenceOutputs or IntSequenceOutputs), you cannot reuse them across calls.
Throws:
IOException
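The ordering requirement on add can be checked up front. The following is a sketch that uses only the JDK and does not call Lucene. It assumes that lexicographic comparison of int sequences via java.util.Arrays.compare orders inputs the same way as IntsRef.compareTo. The class AddOrderCheck and its methods are hypothetical names introduced for illustration.

```java
import java.util.Arrays;
import java.util.List;

public class AddOrderCheck {
  /**
   * Returns true if every input compares greater than or equal to its predecessor.
   * Equal neighbours are allowed here; per the add() contract they are only valid
   * when the Outputs implementation supports merge.
   */
  static boolean isValidAddOrder(List<int[]> inputs) {
    for (int i = 1; i < inputs.size(); i++) {
      // Assumption: lexicographic int comparison mirrors IntsRef.compareTo.
      if (Arrays.compare(inputs.get(i - 1), inputs.get(i)) > 0) {
        return false;
      }
    }
    return true;
  }

  /** Converts a string to its sequence of Unicode code points. */
  static int[] codePoints(String s) {
    return s.codePoints().toArray();
  }

  public static void main(String[] args) {
    List<int[]> sorted = List.of(codePoints("cat"), codePoints("dog"), codePoints("dogs"));
    List<int[]> unsorted = List.of(codePoints("dog"), codePoints("cat"));
    System.out.println(isValidAddOrder(sorted));
    System.out.println(isValidAddOrder(unsorted));
  }
}
```

A check like this is useful mainly in tests or when the terms come from a source whose ordering is not guaranteed.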
compile
public FST<T> compile() throws IOException
Returns final FST. NOTE: this will return null if nothing is accepted by the FST.
Throws:
IOException
fstRamBytesUsed
public long fstRamBytesUsed()