org.apache.lucene.wordnet
Class Syns2Index
java.lang.Object
org.apache.lucene.wordnet.Syns2Index
public class Syns2Index
- extends Object
Convert the prolog file wn_s.pl from the WordNet prolog download
into a Lucene index suitable for looking up synonyms and performing query expansion (SynExpand.expand(...)).
This has been tested with WordNet 2.0.
The index has fields named "word" (F_WORD) and "syn" (F_SYN).
The source word (such as 'big') can be looked up in the
"word" field; if it is present, the matching document has one field named "syn"
for every synonym. A word with multiple synonyms therefore has multiple
fields with the same name.
That is not a problem in Lucene: use Document.getValues(java.lang.String) to retrieve all of the values.
While the WordNet file distinguishes groups of synonyms with
related meanings, this index does not.
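The grouping described above can be sketched in plain Java, without Lucene. This is only an illustration, not the actual Syns2Index implementation: the class name SynonymSketch is hypothetical, and the line shape s(synset_id,w_num,'word',ss_type,sense_number,tag_count). and the doubled-quote escaping are assumptions based on the prologdb format. The sketch collects the words of each synset, then flattens all synsets into a single set of synonyms per word, mirroring the fact that the index does not keep the meaning groups apart.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper, not part of Lucene: groups words from wn_s.pl-style
// lines by synset and flattens the groups into one synonym set per word.
public class SynonymSketch {
    // Assumed line shape: s(synset_id,w_num,'word',ss_type,sense_number,tag_count).
    // A quote inside a word is assumed to be escaped by doubling it ('').
    private static final Pattern LINE =
        Pattern.compile("^s\\((\\d+),\\d+,'((?:[^']|'')*)',.*\\)\\.\\s*$");

    public static Map<String, Set<String>> synonyms(List<String> lines) {
        // Pass 1: synset id -> words that belong to that synset.
        Map<String, List<String>> groups = new LinkedHashMap<>();
        for (String line : lines) {
            Matcher m = LINE.matcher(line);
            if (!m.matches()) {
                continue; // skip anything that is not an s(...) fact
            }
            String word = m.group(2).replace("''", "'");
            groups.computeIfAbsent(m.group(1), k -> new ArrayList<>()).add(word);
        }
        // Pass 2: word -> every other word that shares any synset with it.
        Map<String, Set<String>> result = new LinkedHashMap<>();
        for (List<String> group : groups.values()) {
            for (String word : group) {
                Set<String> syns =
                    result.computeIfAbsent(word, k -> new LinkedHashSet<>());
                for (String other : group) {
                    if (!other.equals(word)) {
                        syns.add(other);
                    }
                }
            }
        }
        return result;
    }
}
```

Given two illustrative synsets, one containing 'big' and 'large' and another containing 'big' and 'great', the map entry for 'big' holds both 'large' and 'great'. That is the in-memory analogue of an index document whose "word" field is 'big' and which carries two "syn" fields.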
Building the index can take 4 minutes on a "fast" system, and the resulting index takes up almost 3 MB.
- See Also:
- WordNet home page,
prologdb man page,
sample site that uses it
Method Summary
static void main(String[] args)
Takes two arguments: the prolog file name and the index directory.
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
F_SYN
public static final String F_SYN
- See Also:
- Constant Field Values
F_WORD
public static final String F_WORD
- See Also:
- Constant Field Values
Syns2Index
public Syns2Index()
main
public static void main(String[] args)
throws Throwable
- Takes two arguments: the prolog file name and the index directory.
- Throws:
Throwable
Copyright © 2000-2011 Apache Software Foundation. All Rights Reserved.