Package org.apache.lucene.util.fst
Class FST<T>
- java.lang.Object
-
- org.apache.lucene.util.fst.FST<T>
-
- All Implemented Interfaces:
Accountable
public final class FST<T> extends Object implements Accountable
Represents an finite state machine (FST), using a compact byte[] format.The format is similar to what's used by Morfologik (https://github.com/morfologik/morfologik-stemming).
See the
package documentation
for some simple examples.- WARNING: This API is experimental and might change in incompatible ways in the next release.
-
-
Nested Class Summary
Nested Classes Modifier and Type Class Description static class
FST.Arc<T>
Represents a single arc.static class
FST.BytesReader
Reads bytes stored in an FST.static class
FST.FSTMetadata<T>
Represents the FST metadata.static class
FST.INPUT_TYPE
Specifies allowed range of each int input label for this FST.
-
Field Summary
Fields Modifier and Type Field Description static byte
ARCS_FOR_BINARY_SEARCH
Value of the arc flags to declare a node with fixed length (sparse) arcs designed for binary search.static int
BIT_ARC_HAS_OUTPUT
This flag is set if the arc has an output.static int
END_LABEL
If arc has this label then that arc is final/acceptedOutputs<T>
outputs
static int
VERSION_90
Version that was used when releasing Lucene 9.0.static int
VERSION_CONTINUOUS_ARCS
Version that started storing continuous arcs.static int
VERSION_CURRENT
Current version.static int
VERSION_START
First supported version, this is the version that was used when releasing Lucene 7.0.-
Fields inherited from interface org.apache.lucene.util.Accountable
NULL_ACCOUNTABLE
-
-
Constructor Summary
Constructors Constructor Description FST(FST.FSTMetadata<T> metadata, DataInput in)
Load a previously saved FST with a DataInput for metdata using anOnHeapFSTStore
with maxBlockBits set toDEFAULT_MAX_BLOCK_BITS
FST(FST.FSTMetadata<T> metadata, DataInput in, FSTStore fstStore)
Load a previously saved FST with a metdata object and a FSTStore.
-
Method Summary
All Methods Static Methods Instance Methods Concrete Methods Modifier and Type Method Description FST.Arc<T>
findTargetArc(int labelToMatch, FST.Arc<T> follow, FST.Arc<T> arc, FST.BytesReader in)
Finds an arc leaving the incoming arc, replacing the arc in place.static <T> FST<T>
fromFSTReader(FST.FSTMetadata<T> fstMetadata, FSTReader fstReader)
Create a FST from aFSTReader
.FST.BytesReader
getBytesReader()
Returns aFST.BytesReader
for this FST, positioned at position 0.T
getEmptyOutput()
FST.Arc<T>
getFirstArc(FST.Arc<T> arc)
Fills virtual 'start' arc, ie, an empty incoming arc to the FST's start nodeFST.FSTMetadata<T>
getMetadata()
long
numBytes()
long
ramBytesUsed()
Return the memory usage of this object in bytes.static <T> FST<T>
read(Path path, Outputs<T> outputs)
Reads an automaton from a file.FST.Arc<T>
readArcByContinuous(FST.Arc<T> arc, FST.BytesReader in, int rangeIndex)
Reads a Continuous node arc, with the provided index in the label range.FST.Arc<T>
readArcByDirectAddressing(FST.Arc<T> arc, FST.BytesReader in, int rangeIndex)
Reads a present direct addressing node arc, with the provided index in the label range.FST.Arc<T>
readArcByIndex(FST.Arc<T> arc, FST.BytesReader in, int idx)
FST.Arc<T>
readFirstRealTargetArc(long nodeAddress, FST.Arc<T> arc, FST.BytesReader in)
FST.Arc<T>
readFirstTargetArc(FST.Arc<T> follow, FST.Arc<T> arc, FST.BytesReader in)
Follow thefollow
arc and read the first arc of its target; this changes the providedarc
(2nd arg) in-place and returns it.int
readLabel(DataInput in)
Reads one BYTE1/2/4 label from the providedDataInput
.FST.Arc<T>
readLastArcByContinuous(FST.Arc<T> arc, FST.BytesReader in)
Reads the last arc of a continuous node.FST.Arc<T>
readLastArcByDirectAddressing(FST.Arc<T> arc, FST.BytesReader in)
Reads the last arc of a direct addressing node.static <T> FST.FSTMetadata<T>
readMetadata(DataInput metaIn, Outputs<T> outputs)
Read the FST metadata from DataInputFST.Arc<T>
readNextArc(FST.Arc<T> arc, FST.BytesReader in)
In-place read; returns the arc.FST.Arc<T>
readNextRealArc(FST.Arc<T> arc, FST.BytesReader in)
Never returns null, but you should never call this if arc.isLast() is true.void
save(Path path)
Writes an automaton to a file.void
save(DataOutput metaOut, DataOutput out)
Save the FST to DataOutput.void
saveMetadata(DataOutput metaOut)
Save the metadata to a DataOutputstatic <T> boolean
targetHasArcs(FST.Arc<T> arc)
returns true if the node at this address has any outgoing arcsString
toString()
-
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, wait, wait, wait
-
Methods inherited from interface org.apache.lucene.util.Accountable
getChildResources
-
-
-
-
Field Detail
-
BIT_ARC_HAS_OUTPUT
public static final int BIT_ARC_HAS_OUTPUT
This flag is set if the arc has an output.- See Also:
- Constant Field Values
-
ARCS_FOR_BINARY_SEARCH
public static final byte ARCS_FOR_BINARY_SEARCH
Value of the arc flags to declare a node with fixed length (sparse) arcs designed for binary search.- See Also:
- Constant Field Values
-
VERSION_START
public static final int VERSION_START
First supported version, this is the version that was used when releasing Lucene 7.0.- See Also:
- Constant Field Values
-
VERSION_CONTINUOUS_ARCS
public static final int VERSION_CONTINUOUS_ARCS
Version that started storing continuous arcs.- See Also:
- Constant Field Values
-
VERSION_CURRENT
public static final int VERSION_CURRENT
Current version.- See Also:
- Constant Field Values
-
VERSION_90
public static final int VERSION_90
Version that was used when releasing Lucene 9.0.- See Also:
- Constant Field Values
-
END_LABEL
public static final int END_LABEL
If arc has this label then that arc is final/accepted- See Also:
- Constant Field Values
-
-
Constructor Detail
-
FST
public FST(FST.FSTMetadata<T> metadata, DataInput in) throws IOException
Load a previously saved FST with a DataInput for metdata using anOnHeapFSTStore
with maxBlockBits set toDEFAULT_MAX_BLOCK_BITS
- Throws:
IOException
-
FST
public FST(FST.FSTMetadata<T> metadata, DataInput in, FSTStore fstStore) throws IOException
Load a previously saved FST with a metdata object and a FSTStore. If usingOnHeapFSTStore
, setting maxBlockBits allows you to control the size of the byte[] pages used to hold the FST bytes.- Throws:
IOException
-
-
Method Detail
-
fromFSTReader
public static <T> FST<T> fromFSTReader(FST.FSTMetadata<T> fstMetadata, FSTReader fstReader)
Create a FST from aFSTReader
. Return null if the metadata is null.- Parameters:
fstMetadata
- the metadatafstReader
- the FSTReader- Returns:
- the FST
-
readMetadata
public static <T> FST.FSTMetadata<T> readMetadata(DataInput metaIn, Outputs<T> outputs) throws IOException
Read the FST metadata from DataInput- Type Parameters:
T
- the output type- Parameters:
metaIn
- the DataInput of the metadataoutputs
- the FST outputs- Returns:
- the FST metadata
- Throws:
IOException
- if exception occurred during parsing
-
ramBytesUsed
public long ramBytesUsed()
Description copied from interface:Accountable
Return the memory usage of this object in bytes. Negative values are illegal.- Specified by:
ramBytesUsed
in interfaceAccountable
-
numBytes
public long numBytes()
-
getEmptyOutput
public T getEmptyOutput()
-
getMetadata
public FST.FSTMetadata<T> getMetadata()
-
save
public void save(DataOutput metaOut, DataOutput out) throws IOException
Save the FST to DataOutput.- Parameters:
metaOut
- the DataOutput to write the metadata toout
- the DataOutput to write the FST bytes to- Throws:
IOException
-
saveMetadata
public void saveMetadata(DataOutput metaOut) throws IOException
Save the metadata to a DataOutput- Parameters:
metaOut
- the DataOutput to write the metadata to- Throws:
IOException
-
save
public void save(Path path) throws IOException
Writes an automaton to a file.- Throws:
IOException
-
read
public static <T> FST<T> read(Path path, Outputs<T> outputs) throws IOException
Reads an automaton from a file.- Throws:
IOException
-
readLabel
public int readLabel(DataInput in) throws IOException
Reads one BYTE1/2/4 label from the providedDataInput
.- Throws:
IOException
-
targetHasArcs
public static <T> boolean targetHasArcs(FST.Arc<T> arc)
returns true if the node at this address has any outgoing arcs
-
getFirstArc
public FST.Arc<T> getFirstArc(FST.Arc<T> arc)
Fills virtual 'start' arc, ie, an empty incoming arc to the FST's start node
-
readFirstTargetArc
public FST.Arc<T> readFirstTargetArc(FST.Arc<T> follow, FST.Arc<T> arc, FST.BytesReader in) throws IOException
Follow thefollow
arc and read the first arc of its target; this changes the providedarc
(2nd arg) in-place and returns it.- Returns:
- Returns the second argument (
arc
). - Throws:
IOException
-
readFirstRealTargetArc
public FST.Arc<T> readFirstRealTargetArc(long nodeAddress, FST.Arc<T> arc, FST.BytesReader in) throws IOException
- Throws:
IOException
-
readNextArc
public FST.Arc<T> readNextArc(FST.Arc<T> arc, FST.BytesReader in) throws IOException
In-place read; returns the arc.- Throws:
IOException
-
readArcByIndex
public FST.Arc<T> readArcByIndex(FST.Arc<T> arc, FST.BytesReader in, int idx) throws IOException
- Throws:
IOException
-
readArcByContinuous
public FST.Arc<T> readArcByContinuous(FST.Arc<T> arc, FST.BytesReader in, int rangeIndex) throws IOException
Reads a Continuous node arc, with the provided index in the label range.- Parameters:
rangeIndex
- The index of the arc in the label range. It must be within the label range.- Throws:
IOException
-
readArcByDirectAddressing
public FST.Arc<T> readArcByDirectAddressing(FST.Arc<T> arc, FST.BytesReader in, int rangeIndex) throws IOException
Reads a present direct addressing node arc, with the provided index in the label range.- Parameters:
rangeIndex
- The index of the arc in the label range. It must be present. The real arc offset is computed based on the presence bits of the direct addressing node.- Throws:
IOException
-
readLastArcByDirectAddressing
public FST.Arc<T> readLastArcByDirectAddressing(FST.Arc<T> arc, FST.BytesReader in) throws IOException
Reads the last arc of a direct addressing node. This method is equivalent to callreadArcByDirectAddressing(Arc, BytesReader, int)
withrangeIndex
equal toarc.numArcs() - 1
, but it is faster.- Throws:
IOException
-
readLastArcByContinuous
public FST.Arc<T> readLastArcByContinuous(FST.Arc<T> arc, FST.BytesReader in) throws IOException
Reads the last arc of a continuous node.- Throws:
IOException
-
readNextRealArc
public FST.Arc<T> readNextRealArc(FST.Arc<T> arc, FST.BytesReader in) throws IOException
Never returns null, but you should never call this if arc.isLast() is true.- Throws:
IOException
-
findTargetArc
public FST.Arc<T> findTargetArc(int labelToMatch, FST.Arc<T> follow, FST.Arc<T> arc, FST.BytesReader in) throws IOException
Finds an arc leaving the incoming arc, replacing the arc in place. This returns null if the arc was not found, else the incoming arc.- Throws:
IOException
-
getBytesReader
public FST.BytesReader getBytesReader()
Returns aFST.BytesReader
for this FST, positioned at position 0.
-
-