java.lang.Object
    org.apache.lucene.wordnet.SynonymMap
public class SynonymMap
Loads the WordNet prolog file wn_s.pl into a thread-safe main-memory hash map that can be used for fast high-frequency lookups of synonyms for any given (lowercase) word string.
The relation is symmetric: if B is a synonym for A (A -> B), then A is also a synonym for B (B -> A). It is not necessarily transitive: A -> B and B -> C do not imply A -> C.
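To make the symmetry and non-transitivity concrete, here is a minimal sketch of how a table with these properties can arise from wn_s.pl. It is not SynonymMap's actual implementation. It assumes the WordNet prolog layout `s(synset_id, w_num, 'word', ss_type, sense_number, tag_count).`, treats words that share a synset_id as mutual synonyms, and drops entries that are not purely letters. The class name `WordNetSketch` is hypothetical.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.Set;
import java.util.SortedSet;
import java.util.TreeSet;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical sketch: builds a word -> sorted synonyms table from wn_s.pl-style lines. */
final class WordNetSketch {
    // Captures synset_id and the quoted word; '' inside the quotes is an escaped apostrophe.
    private static final Pattern S_LINE =
            Pattern.compile("^s\\((\\d+),\\d+,'((?:[^']|'')*)',");

    static Map<String, SortedSet<String>> build(List<String> lines) {
        // 1. Group words by synset id.
        Map<String, Set<String>> synsets = new HashMap<>();
        for (String line : lines) {
            Matcher m = S_LINE.matcher(line);
            if (!m.find()) continue;                  // not an s(...) fact
            String word = m.group(2).replace("''", "'").toLowerCase(Locale.ROOT);
            if (!isLettersOnly(word)) continue;       // e.g. skip 'physical entity'
            synsets.computeIfAbsent(m.group(1), k -> new HashSet<>()).add(word);
        }
        // 2. Each word in a synset becomes a synonym of every other word in it,
        //    so A -> B always implies B -> A.
        Map<String, SortedSet<String>> table = new HashMap<>();
        for (Set<String> group : synsets.values()) {
            for (String w : group) {
                SortedSet<String> syns = table.computeIfAbsent(w, k -> new TreeSet<>());
                for (String other : group) {
                    if (!other.equals(w)) syns.add(other);
                }
            }
        }
        return table;
    }

    private static boolean isLettersOnly(String s) {
        if (s.isEmpty()) return false;
        for (int i = 0; i < s.length(); i++) {
            if (!Character.isLetter(s.charAt(i))) return false;
        }
        return true;
    }
}
```

If one synset holds {hard, difficult} and another holds {hard, firmly}, then "hard" maps to [difficult, firmly] and both "difficult" and "firmly" map back to [hard], yet "difficult" does not map to "firmly". A word shared by two synsets links words that never share a synset themselves, which is why transitivity does not hold.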
Loading typically takes about 1.5 seconds, so it should be done only once per
(server) program execution, using a singleton pattern. Once loaded, a
synonym lookup via getSynonyms(String)
takes constant time, O(1).
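The singleton advice can be followed with any lazy, thread-safe initializer. The minimal sketch below assumes Java 8+ and uses a hypothetical helper class, `OnceLoader`, which is not part of SynonymMap or Lucene. It applies double-checked locking on a volatile field so the expensive load runs at most once, even under concurrent first access.

```java
import java.util.function.Supplier;

/**
 * Hypothetical helper, not part of SynonymMap or Lucene: runs an expensive
 * load at most once, on first use, even when several threads call get().
 */
final class OnceLoader<T> {
    private final Supplier<T> loader;
    private volatile T value; // volatile publishes the loaded value to all threads

    OnceLoader(Supplier<T> loader) {
        this.loader = loader;
    }

    T get() {
        T v = value;                  // fast path: no locking once loaded
        if (v == null) {
            synchronized (this) {
                v = value;            // re-check: another thread may have loaded it
                if (v == null) {
                    v = loader.get(); // assumed to return non-null
                    value = v;
                }
            }
        }
        return v;
    }
}

// Assumed usage with the real class (the constructor throws IOException,
// which a Supplier cannot, so it is wrapped):
//
// static final OnceLoader<SynonymMap> SYNONYMS = new OnceLoader<>(() -> {
//     try {
//         return new SynonymMap(new FileInputStream("samples/fulltext/wn_s.pl"));
//     } catch (IOException e) {
//         throw new UncheckedIOException(e);
//     }
// });
```

A static final field initialized eagerly would also work. The lazy variant only defers the roughly 1.5-second load until the first lookup.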
A loaded default synonym map consumes about 10 MB of main memory.
An instance is immutable, hence thread-safe.
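The immutability claim can be illustrated with a minimal sketch. It does not reproduce SynonymMap's source. It assumes a word-to-array hash table and shows two habits that make such a class safe to share across threads without locking. First, all state is built in the constructor and held in a final field. Second, arrays are copied on the way in and on the way out. The class name `ImmutableSynonymTable` is hypothetical.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.Map;

/**
 * Hypothetical sketch (not the actual SynonymMap source): an immutable
 * word -> synonyms table that can be shared freely between threads.
 */
final class ImmutableSynonymTable {
    private static final String[] EMPTY = new String[0]; // zero-length, so safe to share
    private final Map<String, String[]> table;           // never mutated after construction

    ImmutableSynonymTable(Map<String, String[]> source) {
        Map<String, String[]> copy = new HashMap<>();
        for (Map.Entry<String, String[]> e : source.entrySet()) {
            String[] syns = e.getValue().clone(); // private copy: caller edits cannot leak in
            Arrays.sort(syns);                    // results are sorted ascending
            copy.put(e.getKey(), syns);
        }
        this.table = copy; // final field: safely published when the constructor returns
    }

    /** Expected O(1) hash lookup; returns a copy so callers cannot alter shared state. */
    String[] getSynonyms(String word) {
        String[] syns = table.get(word);
        return syns == null ? EMPTY : syns.clone();
    }
}
```

Because no method ever writes to `table` after construction, concurrent readers need no synchronization.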
This implementation borrows some ideas from the Lucene Syns2Index demo that Dave Spencer originally contributed to Lucene. Dave's approach used a persistent Lucene index, which is suitable for occasional lookups or very large synonym tables but was considered unsuitable for high-frequency lookups in medium-sized synonym tables.
Example usage:

    String[] words = new String[] { "hard", "woods", "forest", "wolfish", "xxxx" };
    SynonymMap map = new SynonymMap(new FileInputStream("samples/fulltext/wn_s.pl"));
    for (int i = 0; i < words.length; i++) {
        String[] synonyms = map.getSynonyms(words[i]);
        System.out.println(words[i] + ":" + java.util.Arrays.asList(synonyms).toString());
    }

Example output:

    hard:[arduous, backbreaking, difficult, fermented, firmly, grueling, gruelling, heavily, heavy, intemperately, knockout, laborious, punishing, severe, severely, strong, toilsome, tough]
    woods:[forest, wood]
    forest:[afforest, timber, timberland, wood, woodland, woods]
    wolfish:[edacious, esurient, rapacious, ravening, ravenous, voracious, wolflike]
    xxxx:[]
See also:
prologdb
man page
Dave's synonym demo site
Constructor Summary

| Constructor | Description |
|---|---|
| SynonymMap(InputStream input) | Constructs an instance, loading WordNet synonym data from the given input stream. |
Method Summary

| Modifier and Type | Method | Description |
|---|---|---|
| protected String | analyze(String word) | Analyzes/transforms the given word on input stream loading. |
| String[] | getSynonyms(String word) | Returns the synonym set for the given word, sorted ascending. |
| protected boolean | isValid(String str) | |
| String | toString() | Returns a String representation of the index data for debugging purposes. |
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, wait, wait, wait
Constructor Detail

public SynonymMap(InputStream input) throws IOException
Constructs an instance, loading WordNet synonym data from the given input stream.
Parameters:
input - the stream to read from (null indicates an empty synonym map)
Throws:
IOException - if an error occurred while reading the stream.

Method Detail
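public String[] getSynonyms(String word)
Returns the synonym set for the given word, sorted ascending.
Parameters:
word - the word to look up (must be in lowercase).
Returns:
the synonyms: a set of zero or more words, sorted ascending, each containing lowercase characters that satisfy Character.isLetter().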
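Because getSynonyms expects lowercase input, callers that receive arbitrary user text should normalize it first. The minimal sketch below shows one way to do that while expanding a term into itself plus its synonyms. `SynonymLookup` and `SynonymExpansion` are hypothetical names. The interface only mirrors the documented signature `String[] getSynonyms(String word)`, so a real instance can be adapted with the method reference `synonymMap::getSynonyms`.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Locale;

/** Hypothetical stand-in mirroring the documented String[] getSynonyms(String word). */
interface SynonymLookup {
    String[] getSynonyms(String word);
}

final class SynonymExpansion {
    /**
     * Returns the lowercased term followed by its synonyms. The term is
     * lowercased first because getSynonyms requires lowercase input.
     */
    static List<String> expand(SynonymLookup map, String term) {
        String word = term.toLowerCase(Locale.ROOT); // ROOT avoids locale-specific case rules
        List<String> result = new ArrayList<>();
        result.add(word);
        result.addAll(Arrays.asList(map.getSynonyms(word)));
        return result;
    }
}
```

Passing "Woods" without the lowercasing step would miss the entry for "woods" entirely.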
public String toString()
Returns a String representation of the index data for debugging purposes.
Overrides:
toString in class Object
protected String analyze(String word)
Analyzes/transforms the given word on input stream loading.
Parameters:
word - the word to analyze
protected boolean isValid(String str)
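The two protected methods act as load-time hooks that a subclass can override. Their exact default behavior and call order are not documented here. The sketch below is therefore a hypothetical stand-in, not SynonymMap itself. It assumes that isValid filters each raw word before analyze runs, that analyze lowercases by default, and that a null result from analyze means "skip this word". The all-letters check in isValid is likewise an assumption, chosen to match the Character.isLetter() note on getSynonyms. `HookedLoader` and `SingularizingLoader` are hypothetical names.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

/** Hypothetical loader with hooks shaped like analyze(String) and isValid(String). */
class HookedLoader {
    /** Applies the hooks to each raw word read during loading. */
    final List<String> load(List<String> rawWords) {
        List<String> kept = new ArrayList<>();
        for (String raw : rawWords) {
            if (!isValid(raw)) continue;      // assumed order: filter, then transform
            String word = analyze(raw);
            if (word != null) kept.add(word); // assumed: null means "ignore this word"
        }
        return kept;
    }

    /** Assumed default transform: lowercase the word. */
    protected String analyze(String word) {
        return word.toLowerCase(Locale.ROOT);
    }

    /** Assumed default filter: accept only non-empty, all-letter words. */
    protected boolean isValid(String str) {
        if (str.isEmpty()) return false;
        for (int i = 0; i < str.length(); i++) {
            if (!Character.isLetter(str.charAt(i))) return false;
        }
        return true;
    }
}

/** Example override: a deliberately naive plural stripper, for illustration only. */
class SingularizingLoader extends HookedLoader {
    @Override
    protected String analyze(String word) {
        String w = super.analyze(word);
        return w.length() > 3 && w.endsWith("s") ? w.substring(0, w.length() - 1) : w;
    }
}
```

The plural stripper is only a placeholder. A real stemming algorithm would go in the overridden analyze method.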