Class SynonymMap

  extended by org.apache.lucene.wordnet.SynonymMap

public class SynonymMap
extends Object

Loads the WordNet prolog file into a thread-safe main-memory hash map that can be used for fast high-frequency lookups of synonyms for any given (lowercase) word string.

The synonym relation is symmetric: if B is a synonym for A (A -> B), then A is also a synonym for B (B -> A). It is not necessarily transitive: A -> B and B -> C do not imply A -> C.
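The symmetric but non-transitive behaviour can be illustrated with a small stand-in table. This is only a sketch: SymmetricRelationSketch and its methods are hypothetical names, not part of SynonymMap, and the word pairs are taken from the example output on this page.

```java
import java.util.Map;
import java.util.SortedSet;
import java.util.TreeMap;
import java.util.TreeSet;

// Hypothetical stand-in for illustration only; not the SynonymMap implementation.
final class SymmetricRelationSketch {
    private final Map<String, SortedSet<String>> table = new TreeMap<>();

    // Records the pair in both directions, so A -> B always implies B -> A.
    void addPair(String a, String b) {
        table.computeIfAbsent(a, k -> new TreeSet<>()).add(b);
        table.computeIfAbsent(b, k -> new TreeSet<>()).add(a);
    }

    // Returns true if 'to' is listed as a synonym of 'from'.
    boolean related(String from, String to) {
        SortedSet<String> set = table.get(from);
        return set != null && set.contains(to);
    }

    // Builds a tiny table: woods <-> forest and forest <-> timber.
    static SymmetricRelationSketch example() {
        SymmetricRelationSketch r = new SymmetricRelationSketch();
        r.addPair("woods", "forest");
        r.addPair("forest", "timber");
        return r;
    }
}
```

In this table "woods" and "forest" are related in both directions, but "woods" is not related to "timber", because no pair is ever derived by chaining two existing pairs.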

Loading typically takes about 1.5 seconds, so it should be done only once per (server) program execution, using a singleton pattern. Once loaded, a synonym lookup via getSynonyms(String) takes constant time O(1). A loaded default synonym map consumes about 10 MB of main memory. An instance is immutable, hence thread-safe.
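One way to load only once is the initialization-on-demand holder idiom, sketched below. To keep the sketch self-contained, a plain Map stands in for the loaded SynonymMap; SynonymsHolder, Lazy, LOADS and load() are hypothetical names, and in real code load() would construct the SynonymMap from an input stream.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical holder class; the Map stands in for a loaded SynonymMap.
final class SynonymsHolder {
    // Counts how often the expensive load ran (for demonstration only).
    static final AtomicInteger LOADS = new AtomicInteger();

    private SynonymsHolder() {}

    // The JVM initializes this nested class once, on first access, in a thread-safe way.
    private static final class Lazy {
        static final Map<String, String[]> INSTANCE = load();
    }

    // Returns the shared instance, triggering the load on the first call only.
    static Map<String, String[]> get() {
        return Lazy.INSTANCE;
    }

    // Placeholder for the slow step; real code would build a SynonymMap here.
    private static Map<String, String[]> load() {
        LOADS.incrementAndGet();
        Map<String, String[]> m = new HashMap<>();
        m.put("woods", new String[] { "forest", "wood" });
        return Collections.unmodifiableMap(m);
    }
}
```

Every call to SynonymsHolder.get() returns the same object, and the load step runs at most once no matter how many threads call it. Because the loaded instance is immutable, it can then be shared without further synchronization.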

This implementation borrows some ideas from the Lucene Syns2Index demo that Dave Spencer originally contributed to Lucene. Dave's approach uses a persistent Lucene index, which is suitable for occasional lookups or very large synonym tables but is considered unsuitable for high-frequency lookups of medium-sized synonym tables.

Example Usage:

 String[] words = new String[] { "hard", "woods", "forest", "wolfish", "xxxx"};
 // Load the synonym data once; the constructor closes the stream when done.
 SynonymMap map = new SynonymMap(new FileInputStream("samples/fulltext/"));
 for (int i = 0; i < words.length; i++) {
     // Look up each (lowercase) word and print its sorted synonym set.
     String[] synonyms = map.getSynonyms(words[i]);
     System.out.println(words[i] + ":" + java.util.Arrays.asList(synonyms).toString());
 }

Example output:
 hard:[arduous, backbreaking, difficult, fermented, firmly, grueling, gruelling, heavily, heavy, intemperately, knockout, laborious, punishing, severe, severely, strong, toilsome, tough]
 woods:[forest, wood]
 forest:[afforest, timber, timberland, wood, woodland, woods]
 wolfish:[edacious, esurient, rapacious, ravening, ravenous, voracious, wolflike]

See also:
prologdb man page
Dave's synonym demo site

Constructor Summary
SynonymMap(InputStream input)
          Constructs an instance, loading WordNet synonym data from the given input stream.
Method Summary
protected  String analyze(String word)
          Analyzes/transforms the given word on input stream loading.
 String[] getSynonyms(String word)
          Returns the synonym set for the given word, sorted ascending.
protected  boolean isValid(String str)
 String toString()
          Returns a String representation of the index data for debugging purposes.
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, wait, wait, wait

Constructor Detail


public SynonymMap(InputStream input)
           throws IOException
Constructs an instance, loading WordNet synonym data from the given input stream. The stream is closed once loading has finished. The words in the stream must be encoded in UTF-8 or a compatible subset (for example ASCII).

Parameters:
input - the stream to read from (null indicates an empty synonym map)
Throws:
IOException - if an error occurred while reading the stream.
Method Detail


public String[] getSynonyms(String word)
Returns the synonym set for the given word, sorted ascending.

Parameters:
word - the word to look up (must be in lowercase).
Returns:
the synonyms; a set of zero or more words, sorted ascending, each word containing lowercase characters that satisfy Character.isLetter().
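Because the lookup expects a lowercase word, callers may want to normalize user input first. A minimal sketch, assuming a hypothetical helper class QueryWords that is not part of this API; trimming whitespace and using Locale.ROOT are choices made here, not behaviour documented for SynonymMap.

```java
import java.util.Locale;

// Hypothetical helper for preparing words before calling getSynonyms(String).
final class QueryWords {
    private QueryWords() {}

    // Trims surrounding whitespace and lowercases with a fixed locale,
    // so the result does not depend on the JVM's default locale.
    static String normalize(String word) {
        return word.trim().toLowerCase(Locale.ROOT);
    }
}
```

The normalized string can then be passed on, for example map.getSynonyms(QueryWords.normalize(userInput)).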


public String toString()
Returns a String representation of the index data for debugging purposes.

Overrides:
toString in class Object
Returns:
a String representation


protected String analyze(String word)
Analyzes/transforms the given word on input stream loading. This default implementation simply lowercases the word. Override this method with a custom stemming algorithm or similar, if desired.

Parameters:
word - the word to analyze
Returns:
the same word, or a different word (or null to indicate that the word should be ignored)
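As a rough illustration of what a custom override might do, the sketch below lowercases the word, drops very short words by returning null, and strips a trailing plural "s". PluralStrippingAnalyzer is a hypothetical stand-alone class so that the example runs without Lucene; its method body could serve as the body of an analyze(String) override in a subclass. The plural rule is a naive placeholder, not a real stemming algorithm, and the minimum length of 3 and the use of Locale.ROOT are assumptions.

```java
import java.util.Locale;

// Hypothetical helper; the method body could be used inside an analyze(String) override.
final class PluralStrippingAnalyzer {
    private PluralStrippingAnalyzer() {}

    // Returns the transformed word, or null if the word should be ignored.
    static String analyze(String word) {
        String w = word.toLowerCase(Locale.ROOT);
        if (w.length() < 3) {
            return null; // assumption: skip very short words entirely
        }
        // Naive plural stripping, a placeholder for a real stemming algorithm.
        if (w.endsWith("s") && !w.endsWith("ss")) {
            return w.substring(0, w.length() - 1);
        }
        return w;
    }
}
```

With this rule "Woods" becomes "wood", "grass" is left unchanged, and a one-letter word is ignored. Any transformation applied at load time should also be applied to the words passed to getSynonyms(String), otherwise lookups may miss.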


protected boolean isValid(String str)

Copyright © 2000-2011 Apache Software Foundation. All Rights Reserved.