org.apache.lucene.util.fst
Class Builder<T>

java.lang.Object
  extended by org.apache.lucene.util.fst.Builder<T>

public class Builder<T>
extends Object

Builds a compact FST (maps an IntsRef term to an arbitrary output) from pre-sorted terms with outputs (the FST becomes an FSA if you use NoOutputs). The FST is written on-the-fly into a compact serialized format byte array, which can be saved to / loaded from a Directory or used directly for traversal. The FST is always finite (no cycles).

NOTE: The algorithm is described at http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.24.3698

If your outputs are ByteSequenceOutput then the final FST will be minimal, but if you use PositiveIntOutput then it's only "near minimal". For example, aa/0, aab/1, bbb/2 will produce 6 states when a 5 state fst is also possible. The parameterized type T is the output type. See the subclasses of Outputs.

WARNING: This API is experimental and might change in incompatible ways in the next release.

Constructor Summary
Builder(FST.INPUT_TYPE inputType, int minSuffixCount1, int minSuffixCount2, boolean doMinSuffix, Outputs<T> outputs)
           
Builder(FST.INPUT_TYPE inputType, Outputs<T> outputs)
          Instantiates an FST/FSA builder without any pruning.
 
Method Summary
 void add(BytesRef input, T output)
           
 void add(char[] s, int offset, int length, T output)
          Sugar: adds the UTF32 codepoints from char[] slice.
 void add(CharSequence s, T output)
          Sugar: adds the UTF32 codepoints from CharSequence.
 void add(IntsRef input, T output)
          It's OK to add the same input twice in a row with different outputs, as long as outputs impls the merge method.
 FST<T> finish()
          Returns final FST.
 int getMappedStateCount()
           
 long getTermCount()
           
 int getTotStateCount()
           
 
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
 

Constructor Detail

Builder

public Builder(FST.INPUT_TYPE inputType,
               Outputs<T> outputs)
Instantiates an FST/FSA builder without any pruning. A shortcut to Builder(FST.INPUT_TYPE, int, int, boolean, Outputs) with pruning options turned off.


Builder

public Builder(FST.INPUT_TYPE inputType,
               int minSuffixCount1,
               int minSuffixCount2,
               boolean doMinSuffix,
               Outputs<T> outputs)
Method Detail

getTotStateCount

public int getTotStateCount()

getTermCount

public long getTermCount()

getMappedStateCount

public int getMappedStateCount()

add

public void add(BytesRef input,
                T output)
         throws IOException
Throws:
IOException

add

public void add(char[] s,
                int offset,
                int length,
                T output)
         throws IOException
Sugar: adds the UTF32 codepoints from char[] slice. FST must be FST.INPUT_TYPE.BYTE4!

Throws:
IOException

add

public void add(CharSequence s,
                T output)
         throws IOException
Sugar: adds the UTF32 codepoints from CharSequence. FST must be FST.INPUT_TYPE.BYTE4!

Throws:
IOException

add

public void add(IntsRef input,
                T output)
         throws IOException
It's OK to add the same input twice in a row with different outputs, as long as outputs impls the merge method.

Throws:
IOException

finish

public FST<T> finish()
              throws IOException
Returns final FST. NOTE: this will return null if nothing is accepted by the FST.

Throws:
IOException


Copyright © 2000-2011 Apache Software Foundation. All Rights Reserved.