java.lang.Object org.apache.lucene.util.fst.Builder<T>
public class Builder<T>
Builds a compact FST, which maps an IntsRef term to an arbitrary output, from pre-sorted terms with outputs. If you use NoOutputs, the FST becomes an FSA. The FST is written on the fly into a compact serialized byte array, which can be saved to or loaded from a Directory, or used directly for traversal. The FST is always finite (no cycles).
NOTE: The algorithm is described at http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.24.3698
If your outputs are ByteSequenceOutputs, the final FST will be minimal, but if you use PositiveIntOutputs it is only "near minimal". For example, aa/0, aab/1, bbb/2 will produce 6 states when a 5-state FST is also possible. The parameterized type T is the output type. See the subclasses of Outputs.
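Because the builder expects pre-sorted terms, callers typically validate the order before adding anything. The sketch below uses only the JDK and is not Lucene code: each term is modeled as an int[] of labels standing in for IntsRef, and the names SortedTermsCheck, compareLabels, isSorted, and labels are hypothetical. It assumes that labels compare as plain ints, position by position, and that a term sorts before any longer term it is a prefix of.

```java
import java.util.Arrays;
import java.util.List;

class SortedTermsCheck {
    /** Compares two label sequences position by position; a proper prefix sorts first. */
    static int compareLabels(int[] a, int[] b) {
        int n = Math.min(a.length, b.length);
        for (int i = 0; i < n; i++) {
            if (a[i] != b[i]) {
                return Integer.compare(a[i], b[i]);
            }
        }
        return Integer.compare(a.length, b.length);
    }

    /** Returns true if no term sorts before its predecessor (equal neighbors are allowed). */
    static boolean isSorted(List<int[]> terms) {
        for (int i = 1; i < terms.size(); i++) {
            if (compareLabels(terms.get(i - 1), terms.get(i)) > 0) {
                return false;
            }
        }
        return true;
    }

    /** Converts a string to one label per Unicode code point. */
    static int[] labels(String s) {
        return s.codePoints().toArray();
    }

    public static void main(String[] args) {
        // The three terms from the class description, already in sorted order.
        List<int[]> sorted = Arrays.asList(labels("aa"), labels("aab"), labels("bbb"));
        // The same kind of input in an order a sorted-input builder should not be given.
        List<int[]> unsorted = Arrays.asList(labels("bbb"), labels("aa"));
        System.out.println(isSorted(sorted));
        System.out.println(isSorted(unsorted));
    }
}
```

A check like this is cheap compared with building the FST, so it can run as a guard in front of the add calls.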
Constructor Summary | |
---|---|
Builder(FST.INPUT_TYPE inputType, int minSuffixCount1, int minSuffixCount2, boolean doMinSuffix, Outputs<T> outputs) | |
Builder(FST.INPUT_TYPE inputType, Outputs<T> outputs) | Instantiates an FST/FSA builder without any pruning. |
Method Summary | |
---|---|
void | add(BytesRef input, T output) |
void | add(char[] s, int offset, int length, T output) Sugar: adds the UTF-32 code points from a char[] slice. |
void | add(CharSequence s, T output) Sugar: adds the UTF-32 code points from a CharSequence. |
void | add(IntsRef input, T output) It's OK to add the same input twice in a row with different outputs, as long as the outputs implementation provides the merge method. |
FST<T> | finish() Returns the final FST. |
int | getMappedStateCount() |
long | getTermCount() |
int | getTotStateCount() |
Methods inherited from class java.lang.Object |
---|
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait |
Constructor Detail |
---|
public Builder(FST.INPUT_TYPE inputType, Outputs<T> outputs)
Instantiates an FST/FSA builder without any pruning. A shortcut to Builder(FST.INPUT_TYPE, int, int, boolean, Outputs) with pruning options turned off.
public Builder(FST.INPUT_TYPE inputType, int minSuffixCount1, int minSuffixCount2, boolean doMinSuffix, Outputs<T> outputs)
Method Detail |
---|
public int getTotStateCount()
public long getTermCount()
public int getMappedStateCount()
public void add(BytesRef input, T output) throws IOException
Throws: IOException
public void add(char[] s, int offset, int length, T output) throws IOException
Throws: IOException
public void add(CharSequence s, T output) throws IOException
Throws: IOException
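The two sugar overloads add UTF-32 code points rather than UTF-16 chars, so a supplementary character stored as a surrogate pair becomes a single label. The JDK-only sketch below illustrates that conversion. It is not Lucene's implementation, and the class and method names (CodePointLabels, toCodePoints) are hypothetical.

```java
class CodePointLabels {
    /** Returns the code points of a CharSequence, one int per code point. */
    static int[] toCodePoints(CharSequence s) {
        int[] out = new int[Character.codePointCount(s, 0, s.length())];
        int j = 0;
        for (int i = 0; i < s.length(); ) {
            int cp = Character.codePointAt(s, i);
            out[j++] = cp;
            i += Character.charCount(cp); // 2 for a surrogate pair, otherwise 1
        }
        return out;
    }

    /** Returns the code points of the slice s[offset, offset + length). */
    static int[] toCodePoints(char[] s, int offset, int length) {
        int end = offset + length;
        int[] out = new int[Character.codePointCount(s, offset, length)];
        int j = 0;
        for (int i = offset; i < end; ) {
            int cp = Character.codePointAt(s, i, end);
            out[j++] = cp;
            i += Character.charCount(cp);
        }
        return out;
    }

    public static void main(String[] args) {
        // U+1D11E is encoded as the surrogate pair \uD834\uDD1E in UTF-16.
        String text = "a\uD834\uDD1Eb";
        int[] fromSequence = toCodePoints(text);
        int[] fromSlice = toCodePoints(("x" + text).toCharArray(), 1, 4);
        System.out.println(text.length() + " chars, " + fromSequence.length + " code points");
        System.out.println(java.util.Arrays.equals(fromSequence, fromSlice));
    }
}
```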
public void add(IntsRef input, T output) throws IOException
Throws: IOException
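The rule for repeated inputs can be pictured as folding the outputs of consecutive equal terms through a merge function. The JDK-only sketch below models that idea. It neither uses nor reproduces Lucene's Outputs class; the names DuplicateMergeSketch and mergeAdjacent are hypothetical, and summing as the merge operation is purely an illustrative assumption.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.function.BinaryOperator;

class DuplicateMergeSketch {
    /**
     * Folds the outputs of consecutive equal terms with the given merge function.
     * Assumes the terms are already sorted, so equal terms are adjacent.
     */
    static <T> Map<String, T> mergeAdjacent(List<String> terms, List<T> outputs,
                                            BinaryOperator<T> merge) {
        Map<String, T> result = new LinkedHashMap<>();
        String prev = null;
        for (int i = 0; i < terms.size(); i++) {
            String term = terms.get(i);
            if (term.equals(prev)) {
                // Same input twice in a row: combine the outputs instead of failing.
                result.put(term, merge.apply(result.get(term), outputs.get(i)));
            } else {
                result.put(term, outputs.get(i));
            }
            prev = term;
        }
        return result;
    }

    public static void main(String[] args) {
        List<String> terms = Arrays.asList("aa", "aa", "aab");
        List<Long> outputs = Arrays.asList(1L, 2L, 5L);
        // Long::sum is an illustrative merge; a real output type defines its own.
        System.out.println(mergeAdjacent(terms, outputs, Long::sum));
    }
}
```

Without a merge function there is no defined way to combine the two outputs, which is why the duplicate-input allowance depends on the output type.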
public FST<T> finish() throws IOException
Throws: IOException