org.apache.lucene.util.fst.FSTCompiler<T>

public class FSTCompiler<T> extends Object

Builds a minimal FST (maps an IntsRef term to an arbitrary output) from pre-sorted terms with outputs. The FST becomes an FSA if you use NoOutputs. The FST is written on-the-fly into a compact serialized format byte array, which can be saved to / loaded from a Directory or used directly for traversal. The FST is always finite (no cycles).

NOTE: The algorithm is described at http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.24.3698

The parameterized type T is the output type. See the subclasses of Outputs.

FSTs larger than 2.1GB are now possible (as of Lucene 4.2). FSTs containing more than 2.1B nodes are also now possible, however they cannot be packed.

It now supports 3 different workflows:

- Build FST and use it immediately entirely in RAM and then discard it

- Build FST and use it immediately entirely in RAM and also save it to other DataOutput, and load it later and use it

- Build FST but stream it immediately to disk (except the FSTMetaData, to be saved at the end). In order to use it, you need to construct the corresponding DataInput and use the FST constructor to read it.

WARNING: This API is experimental and might change in incompatible ways in the next release.

Nested Class Summary

Nested Classes

Modifier and Type

Class

Description

static class

FSTCompiler.Builder<T>

Fluent-style constructor for FST FSTCompiler.
Method Summary

Modifier and Type

Method

Description

void

add(IntsRef input, T output)

Add the next input/output pair.

FST.FSTMetadata<T>

compile()

Returns the metadata of the final FST.

long

fstRamBytesUsed()

long

fstSizeInBytes()

long

getArcCount()

float

getDirectAddressingMaxOversizingFactor()

FSTReader

getFSTReader()

Get the respective FSTReader of the DataOutput.

long

getNodeCount()

static DataOutput

getOnHeapReaderWriter(int blockBits)

Get an on-heap DataOutput that allows the FST to be read immediately after writing, and also optionally saved to an external DataOutput.

Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait

Method Details
- getOnHeapReaderWriter
  
  public static DataOutput getOnHeapReaderWriter(int blockBits)
  
  Get an on-heap DataOutput that allows the FST to be read immediately after writing, and also optionally saved to an external DataOutput.
  
  Parameters:
  
  blockBits - how many bits wide to make each block of the DataOutput
  
  Returns:
  
  the DataOutput
- getFSTReader
  
  public FSTReader getFSTReader()
  
  Get the respective FSTReader of the DataOutput. To call this method, you need to use the default DataOutput or getOnHeapReaderWriter(int), otherwise we will throw an exception.
  
  Returns:
  
  the DataOutput as FSTReader
  
  Throws:
  
  IllegalStateException - if the DataOutput does not implement FSTReader
- getDirectAddressingMaxOversizingFactor
  
  public float getDirectAddressingMaxOversizingFactor()
- getNodeCount
  
  public long getNodeCount()
- getArcCount
  
  public long getArcCount()
- add
  
  public void add(IntsRef input, T output) throws IOException
  
  Add the next input/output pair. The provided input must be sorted after the previous one according to IntsRef.compareTo(org.apache.lucene.util.IntsRef). It's also OK to add the same input twice in a row with different outputs, as long as Outputs implements the Outputs.merge(T, T) method. Note that input is fully consumed after this method is returned (so caller is free to reuse), but output is not. So if your outputs are changeable (eg ByteSequenceOutputs or IntSequenceOutputs) then you cannot reuse across calls.
  
  Throws:
  
  IOException
- compile
  
  public FST.FSTMetadata<T> compile() throws IOException
  Returns the metadata of the final FST. NOTE: this will return null if nothing is accepted by the FST themselves.
  To create the FST, you need to:
  - If a FSTReader DataOutput was used, such as the one returned by getOnHeapReaderWriter(int)
  fstMetadata = fstCompiler.compile(); fst = FST.fromFSTReader(fstMetadata, fstCompiler.getFSTReader());
  
  - If a non-FSTReader DataOutput was used, such as IndexOutput, you need to first create the corresponding DataInput, such as IndexInput then pass it to the FST construct
  fstMetadata = fstCompiler.compile(); fst = new FST<>(fstMetadata, dataInput, new OffHeapFSTStore());
  Throws:
  
  IOException
- fstRamBytesUsed
  
  public long fstRamBytesUsed()
- fstSizeInBytes
  
  public long fstSizeInBytes()

Class FSTCompiler<T>

Nested Class Summary

Method Summary

Methods inherited from class java.lang.Object

Method Details

getOnHeapReaderWriter

getFSTReader

getDirectAddressingMaxOversizingFactor

getNodeCount

getArcCount

add

compile

fstRamBytesUsed

fstSizeInBytes