org.apache.lucene.util.fst
Class Util

java.lang.Object
  extended by org.apache.lucene.util.fst.Util

public final class Util
extends Object

Static helper methods.

WARNING: This API is experimental and might change in incompatible ways in the next release.

Nested Class Summary
static class Util.MinResult<T>
          Holds a single input (IntsRef) + output, returned by shortestPaths().
static class Util.TopNSearcher<T>
          Utility class to find top N shortest paths from start point(s).
 
Method Summary
static
<T> T
get(FST<T> fst, BytesRef input)
          Looks up the output for this input, or null if the input is not accepted
static
<T> T
get(FST<T> fst, IntsRef input)
          Looks up the output for this input, or null if the input is not accepted.
static IntsRef getByOutput(FST<Long> fst, long targetOutput)
          Reverse lookup (lookup by output instead of by input), in the special case when your FSTs outputs are strictly ascending.
static IntsRef getByOutput(FST<Long> fst, long targetOutput, FST.BytesReader in, FST.Arc<Long> arc, FST.Arc<Long> scratchArc, IntsRef result)
          Expert: like getByOutput(FST, long) except reusing BytesReader, initial and scratch Arc, and result.
static
<T> FST.Arc<T>
readCeilArc(int label, FST<T> fst, FST.Arc<T> follow, FST.Arc<T> arc, FST.BytesReader in)
          Reads the first arc greater or equal that the given label into the provided arc in place and returns it iff found, otherwise return null.
static
<T> Util.MinResult<T>[]
shortestPaths(FST<T> fst, FST.Arc<T> fromNode, T startOutput, Comparator<T> comparator, int topN, boolean allowEmptyString)
          Starting from node, find the top N min cost completions to a final node.
static BytesRef toBytesRef(IntsRef input, BytesRef scratch)
          Just converts IntsRef to BytesRef; you must ensure the int values fit into a byte.
static
<T> void
toDot(FST<T> fst, Writer out, boolean sameRank, boolean labelStates)
          Dumps an FST to a GraphViz's dot language description for visualization.
static IntsRef toIntsRef(BytesRef input, IntsRef scratch)
          Just takes unsigned byte values from the BytesRef and converts into an IntsRef.
static IntsRef toUTF16(CharSequence s, IntsRef scratch)
          Just maps each UTF16 unit (char) to the ints in an IntsRef.
static IntsRef toUTF32(char[] s, int offset, int length, IntsRef scratch)
          Decodes the Unicode codepoints from the provided char[] and places them in the provided scratch IntsRef, which must not be null, returning it.
static IntsRef toUTF32(CharSequence s, IntsRef scratch)
          Decodes the Unicode codepoints from the provided CharSequence and places them in the provided scratch IntsRef, which must not be null, returning it.
 
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
 

Method Detail

get

public static <T> T get(FST<T> fst,
                        IntsRef input)
             throws IOException
Looks up the output for this input, or null if the input is not accepted.

Throws:
IOException

get

public static <T> T get(FST<T> fst,
                        BytesRef input)
             throws IOException
Looks up the output for this input, or null if the input is not accepted

Throws:
IOException

getByOutput

public static IntsRef getByOutput(FST<Long> fst,
                                  long targetOutput)
                           throws IOException
Reverse lookup (lookup by output instead of by input), in the special case when your FSTs outputs are strictly ascending. This locates the input/output pair where the output is equal to the target, and will return null if that output does not exist.

NOTE: this only works with FST<Long>, only works when the outputs are ascending in order with the inputs and only works when you shared the outputs (pass doShare=true to PositiveIntOutputs.getSingleton()). For example, simple ordinals (0, 1, 2, ...), or file offets (when appending to a file) fit this.

Throws:
IOException

getByOutput

public static IntsRef getByOutput(FST<Long> fst,
                                  long targetOutput,
                                  FST.BytesReader in,
                                  FST.Arc<Long> arc,
                                  FST.Arc<Long> scratchArc,
                                  IntsRef result)
                           throws IOException
Expert: like getByOutput(FST, long) except reusing BytesReader, initial and scratch Arc, and result.

Throws:
IOException

shortestPaths

public static <T> Util.MinResult<T>[] shortestPaths(FST<T> fst,
                                                    FST.Arc<T> fromNode,
                                                    T startOutput,
                                                    Comparator<T> comparator,
                                                    int topN,
                                                    boolean allowEmptyString)
                                         throws IOException
Starting from node, find the top N min cost completions to a final node.

NOTE: you must share the outputs when you build the FST (pass doShare=true to PositiveIntOutputs.getSingleton()).

Throws:
IOException

toDot

public static <T> void toDot(FST<T> fst,
                             Writer out,
                             boolean sameRank,
                             boolean labelStates)
                  throws IOException
Dumps an FST to a GraphViz's dot language description for visualization. Example of use:
 PrintWriter pw = new PrintWriter("out.dot");
 Util.toDot(fst, pw, true, true);
 pw.close();
 
and then, from command line:
 dot -Tpng -o out.png out.dot
 

Note: larger FSTs (a few thousand nodes) won't even render, don't bother. If the FST is > 2.1 GB in size then this method will throw strange exceptions.

Parameters:
sameRank - If true, the resulting dot file will try to order states in layers of breadth-first traversal. This may mess up arcs, but makes the output FST's structure a bit clearer.
labelStates - If true states will have labels equal to their offsets in their binary format. Expands the graph considerably.
Throws:
IOException
See Also:
"http://www.graphviz.org/"

toUTF16

public static IntsRef toUTF16(CharSequence s,
                              IntsRef scratch)
Just maps each UTF16 unit (char) to the ints in an IntsRef.


toUTF32

public static IntsRef toUTF32(CharSequence s,
                              IntsRef scratch)
Decodes the Unicode codepoints from the provided CharSequence and places them in the provided scratch IntsRef, which must not be null, returning it.


toUTF32

public static IntsRef toUTF32(char[] s,
                              int offset,
                              int length,
                              IntsRef scratch)
Decodes the Unicode codepoints from the provided char[] and places them in the provided scratch IntsRef, which must not be null, returning it.


toIntsRef

public static IntsRef toIntsRef(BytesRef input,
                                IntsRef scratch)
Just takes unsigned byte values from the BytesRef and converts into an IntsRef.


toBytesRef

public static BytesRef toBytesRef(IntsRef input,
                                  BytesRef scratch)
Just converts IntsRef to BytesRef; you must ensure the int values fit into a byte.


readCeilArc

public static <T> FST.Arc<T> readCeilArc(int label,
                                         FST<T> fst,
                                         FST.Arc<T> follow,
                                         FST.Arc<T> arc,
                                         FST.BytesReader in)
                              throws IOException
Reads the first arc greater or equal that the given label into the provided arc in place and returns it iff found, otherwise return null.

Parameters:
label - the label to ceil on
fst - the fst to operate on
follow - the arc to follow reading the label from
arc - the arc to read into in place
in - the fst's FST.BytesReader
Throws:
IOException


Copyright © 2000-2013 Apache Software Foundation. All Rights Reserved.