org.apache.lucene.wordnet
Class Syns2Index

java.lang.Object
  extended by org.apache.lucene.wordnet.Syns2Index

public class Syns2Index
extends Object

Convert the prolog file wn_s.pl from the WordNet prolog download into a Lucene index suitable for looking up synonyms and performing query expansion (SynExpand.expand(...)). This has been tested with WordNet 2.0. The index has fields named "word" (F_WORD) and "syn" (F_SYN).
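To make "query expansion" concrete, here is a minimal, dependency-free sketch. It is not the SynExpand.expand(...) implementation. It assumes a hypothetical in-memory map from a word to its synonyms, standing in for the "syn" values read from the index. It builds a classic Lucene query-parser string in which the original term keeps its default weight and each synonym gets a lower, assumed boost. The class name ExpandSketch, the method expand, and the boost value are illustrative only.

```java
import java.util.List;
import java.util.Map;
import java.util.StringJoiner;

/**
 * Hypothetical sketch of synonym-based query expansion. This is NOT the
 * SynExpand.expand(...) implementation; it only shows the idea of OR-ing a
 * term with its synonyms while giving each synonym a lower boost.
 */
public class ExpandSketch {

    /**
     * Builds a query string in classic Lucene query-parser syntax, where
     * space-separated clauses are OR-ed under the default operator.
     * Multi-word synonyms are quoted so they are parsed as phrases.
     */
    static String expand(String term, Map<String, List<String>> synonyms, float boost) {
        StringJoiner clauses = new StringJoiner(" ");
        clauses.add(term); // the original term keeps its default weight
        for (String syn : synonyms.getOrDefault(term, List.of())) {
            String clause = syn.contains(" ") ? "\"" + syn + "\"" : syn;
            clauses.add(clause + "^" + boost); // down-weight each synonym
        }
        return clauses.toString();
    }

    public static void main(String[] args) {
        // Assumed stand-in for synonym lists read from the index's "syn" fields.
        Map<String, List<String>> syns = Map.of("big", List.of("large", "great"));
        System.out.println(expand("big", syns, 0.9f)); // big large^0.9 great^0.9
    }
}
```

Boosting synonyms below 1.0 lets documents that contain the user's exact term rank ahead of those that only match a synonym.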

The source word (such as 'big') can be looked up in the "word" field. If it is present, the matching document contains one "syn" field for every synonym. The tricky part is that a word with several synonyms yields several fields with the same name in one document. Lucene handles this without difficulty: use Document.getValues(java.lang.String) to retrieve all of them.
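The repeated-field layout can be illustrated without Lucene on the classpath. The sketch below is a hypothetical model, not Lucene's Document class: it stores (name, value) pairs, and its getValues collects every value stored under a given name, which is the behaviour the paragraph above relies on. In a real index you would first locate the document, for example with a TermQuery on the "word" field, and then call getValues("syn") on the stored document. The class name SynDocSketch is illustrative only.

```java
import java.util.ArrayList;
import java.util.List;

/**
 * Hypothetical, dependency-free model of one document in the synonym index:
 * a single "word" field plus one "syn" field per synonym. getValues imitates
 * Lucene's Document.getValues(String) by returning every value stored under
 * the given name; in this model the values come back in insertion order.
 */
public class SynDocSketch {
    private final List<String[]> fields = new ArrayList<>(); // {name, value} pairs

    void add(String name, String value) {
        fields.add(new String[] {name, value});
    }

    // Collects all values whose field name matches; empty array if none do.
    String[] getValues(String name) {
        List<String> values = new ArrayList<>();
        for (String[] field : fields) {
            if (field[0].equals(name)) {
                values.add(field[1]);
            }
        }
        return values.toArray(new String[0]);
    }

    public static void main(String[] args) {
        SynDocSketch doc = new SynDocSketch();
        doc.add("word", "big");  // F_WORD
        doc.add("syn", "large"); // F_SYN, repeated once per synonym
        doc.add("syn", "great");
        System.out.println(String.join(", ", doc.getValues("syn"))); // large, great
    }
}
```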

Although the WordNet file groups synonyms into sets by meaning (synsets), we do not preserve that distinction here: all synonyms of a word, across all of its meanings, are stored together.
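The merging described above can be sketched in plain Java. This is a hypothetical illustration, not the actual Syns2Index code. It assumes each line of wn_s.pl has the prologdb shape `s(synset_id,w_num,'word',ss_type,sense_number,tag_count).`, that an apostrophe inside a word is escaped as two single quotes, and (as an extra assumption) that words are lowercased. Every synset a word belongs to contributes its members, so meanings are flattened into one synonym set. The sample synset ids are made up and much shorter than real WordNet ids.

```java
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;
import java.util.TreeSet;

/**
 * Hypothetical sketch: turn wn_s.pl facts into flat synonym sets, merging
 * all synsets (meanings) of a word. Not the actual Syns2Index implementation.
 */
public class WnSynonymSketch {

    static Map<String, Set<String>> buildSynonyms(List<String> lines) {
        Map<String, Set<String>> synsetToWords = new TreeMap<>();
        Map<String, Set<String>> wordToSynsets = new TreeMap<>();

        for (String line : lines) {
            if (!line.startsWith("s(")) {
                continue; // skip blank or unexpected lines
            }
            String synsetId = line.substring(2, line.indexOf(','));
            // The word is the single-quoted argument; '' escapes an apostrophe.
            int start = line.indexOf('\'') + 1;
            int end = line.lastIndexOf('\'');
            if (start <= 0 || end < start) {
                continue; // no properly quoted word on this line
            }
            String word = line.substring(start, end).replace("''", "'").toLowerCase();

            synsetToWords.computeIfAbsent(synsetId, k -> new TreeSet<>()).add(word);
            wordToSynsets.computeIfAbsent(word, k -> new TreeSet<>()).add(synsetId);
        }

        // For each word, union the members of all its synsets, minus the word itself.
        Map<String, Set<String>> synonyms = new TreeMap<>();
        for (Map.Entry<String, Set<String>> e : wordToSynsets.entrySet()) {
            Set<String> syns = new TreeSet<>();
            for (String id : e.getValue()) {
                syns.addAll(synsetToWords.get(id));
            }
            syns.remove(e.getKey());
            synonyms.put(e.getKey(), syns);
        }
        return synonyms;
    }

    public static void main(String[] args) {
        // Made-up facts in the assumed format; ids are illustrative only.
        List<String> lines = List.of(
            "s(1,1,'big',a,1,0).",
            "s(1,2,'large',a,1,0).",
            "s(2,1,'big',a,2,0).",
            "s(2,2,'important',a,1,0).");
        // {big=[important, large], important=[big], large=[big]}
        System.out.println(buildSynonyms(lines));
    }
}
```

Because "big" belongs to both sample synsets, its set mixes the "size" sense (large) with the "significance" sense (important). That is exactly the loss of meaning groups the paragraph above describes.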

Building the index can take about 4 minutes on a "fast" system, and the resulting index takes up almost 3 MB.

See Also:
WordNet home page, prologdb man page, sample site that uses it

Field Summary
static String F_SYN
           
static String F_WORD
           
 
Constructor Summary
Syns2Index()
           
 
Method Summary
static void main(String[] args)
          Takes two arguments: the prolog file name and the index directory.
 
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
 

Field Detail

F_SYN

public static final String F_SYN
See Also:
Constant Field Values

F_WORD

public static final String F_WORD
See Also:
Constant Field Values
Constructor Detail

Syns2Index

public Syns2Index()
Method Detail

main

public static void main(String[] args)
                 throws Throwable
Takes two arguments: the name of the prolog file (such as wn_s.pl) and the directory in which to build the index.

Throws:
Throwable


Copyright © 2000-2011 Apache Software Foundation. All Rights Reserved.