Class BinaryDictionary
- java.lang.Object
  - org.apache.lucene.analysis.ja.dict.BinaryDictionary
- All Implemented Interfaces:
Dictionary
- Direct Known Subclasses:
TokenInfoDictionary, UnknownDictionary
public abstract class BinaryDictionary extends Object implements Dictionary
Base class for a binary-encoded in-memory dictionary.
-
-
Nested Class Summary
Nested Classes (Modifier and Type, Class: Description)
static class BinaryDictionary.ResourceScheme: Used to specify where (dictionary) resources get loaded from.
-
Field Summary
Fields (Modifier and Type, Field: Description)
static String DICT_FILENAME_SUFFIX
static String DICT_HEADER
static int HAS_BASEFORM: flag that the entry has baseform data.
static int HAS_PRONUNCIATION: flag that the entry has pronunciation data.
static int HAS_READING: flag that the entry has reading data.
static String POSDICT_FILENAME_SUFFIX
static String POSDICT_HEADER
static String TARGETMAP_FILENAME_SUFFIX
static String TARGETMAP_HEADER
static int VERSION
-
Fields inherited from interface org.apache.lucene.analysis.ja.dict.Dictionary
INTERNAL_SEPARATOR
-
-
Constructor Summary
Constructors (Modifier, Constructor: Description)
protected BinaryDictionary()
protected BinaryDictionary(BinaryDictionary.ResourceScheme resourceScheme, String resourcePath)
-
Method Summary
Methods (Modifier and Type, Method: Description)
String getBaseForm(int wordId, char[] surfaceForm, int off, int len): Get base form of word
static InputStream getClassResource(Class<?> clazz, String suffix)
String getInflectionForm(int wordId): Get inflection form of tokens
String getInflectionType(int wordId): Get inflection type of tokens
int getLeftId(int wordId): Get left id of specified word
String getPartOfSpeech(int wordId): Get Part-Of-Speech of tokens
String getPronunciation(int wordId, char[] surface, int off, int len): Get pronunciation of tokens
String getReading(int wordId, char[] surface, int off, int len): Get reading of tokens
protected InputStream getResource(String suffix)
static InputStream getResource(BinaryDictionary.ResourceScheme scheme, String path)
int getRightId(int wordId): Get right id of specified word
int getWordCost(int wordId): Get word cost of specified word
void lookupWordIds(int sourceId, IntsRef ref)
-
-
-
Field Detail
-
DICT_FILENAME_SUFFIX
public static final String DICT_FILENAME_SUFFIX
- See Also:
- Constant Field Values
-
TARGETMAP_FILENAME_SUFFIX
public static final String TARGETMAP_FILENAME_SUFFIX
- See Also:
- Constant Field Values
-
POSDICT_FILENAME_SUFFIX
public static final String POSDICT_FILENAME_SUFFIX
- See Also:
- Constant Field Values
-
DICT_HEADER
public static final String DICT_HEADER
- See Also:
- Constant Field Values
-
TARGETMAP_HEADER
public static final String TARGETMAP_HEADER
- See Also:
- Constant Field Values
-
POSDICT_HEADER
public static final String POSDICT_HEADER
- See Also:
- Constant Field Values
-
VERSION
public static final int VERSION
- See Also:
- Constant Field Values
-
HAS_BASEFORM
public static final int HAS_BASEFORM
Flag that the entry has baseform data. Otherwise the entry is not inflected (its base form is the same as the surface form).
- See Also:
- Constant Field Values
-
HAS_READING
public static final int HAS_READING
Flag that the entry has reading data. Otherwise the reading is the surface form converted to katakana.
- See Also:
- Constant Field Values
-
HAS_PRONUNCIATION
public static final int HAS_PRONUNCIATION
Flag that the entry has pronunciation data. Otherwise the pronunciation is the reading.
- See Also:
- Constant Field Values
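The three HAS_* flags describe which optional data an entry carries and what is used in its place when the data is absent. The following is a minimal sketch of that decision, not the class's actual code. The class name `EntryFlagsSketch`, the helpers `has` and `describe`, and the numeric flag values are all assumptions made for illustration; this page only links to the constant values and does not list them.

```java
public class EntryFlagsSketch {
    // Assumed placeholder values: distinct single-bit constants chosen for
    // illustration only. The real values are on the "Constant Field Values" page.
    static final int HAS_BASEFORM = 1;
    static final int HAS_READING = 2;
    static final int HAS_PRONUNCIATION = 4;

    /** Returns true when the given flag bit is set in the entry's flags. */
    static boolean has(int flags, int flag) {
        return (flags & flag) != 0;
    }

    /** Summarizes where each attribute would come from for the given flags. */
    static String describe(int flags) {
        String base = has(flags, HAS_BASEFORM) ? "stored" : "surface form";
        String reading = has(flags, HAS_READING) ? "stored" : "surface form as katakana";
        String pron = has(flags, HAS_PRONUNCIATION) ? "stored" : "same as reading";
        return "baseform=" + base + ", reading=" + reading + ", pronunciation=" + pron;
    }

    public static void main(String[] args) {
        // An entry that stores only a reading.
        System.out.println(describe(HAS_READING));
    }
}
```

With only the reading flag set, the base form falls back to the surface form and the pronunciation falls back to the stored reading.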
-
-
Constructor Detail
-
BinaryDictionary
protected BinaryDictionary() throws IOException
- Throws:
IOException
-
BinaryDictionary
protected BinaryDictionary(BinaryDictionary.ResourceScheme resourceScheme, String resourcePath) throws IOException
- Parameters:
resourceScheme - scheme for loading resources (FILE or CLASSPATH).
resourcePath - where to load resources (dictionaries) from. If null (CLASSPATH scheme only), this class's name is used as the path.
- Throws:
IOException
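The resourcePath rule above can be sketched as a small helper. This is an illustration under stated assumptions, not the constructor's actual code: `ResourcePathSketch`, its local `Scheme` enum (a stand-in for BinaryDictionary.ResourceScheme) and `resolvePath` are hypothetical names. Rejecting a null path under the FILE scheme and converting the class name to a slash-separated path are both assumptions, since this page does not specify either.

```java
public class ResourcePathSketch {
    /** Local stand-in for BinaryDictionary.ResourceScheme. */
    enum Scheme { FILE, CLASSPATH }

    /**
     * Picks the path to load from: the explicit path when given, otherwise a
     * path derived from the owning class's name (CLASSPATH scheme only).
     */
    static String resolvePath(Scheme scheme, String resourcePath, Class<?> owner) {
        if (resourcePath != null) {
            return resourcePath;
        }
        if (scheme != Scheme.CLASSPATH) {
            // Assumption: a null path is rejected for any scheme other than CLASSPATH.
            throw new IllegalArgumentException("resourcePath is required for scheme " + scheme);
        }
        // Assumption: the class name is used in slash-separated resource form.
        return owner.getName().replace('.', '/');
    }

    public static void main(String[] args) {
        System.out.println(resolvePath(Scheme.CLASSPATH, null, ResourcePathSketch.class));
        System.out.println(resolvePath(Scheme.FILE, "/tmp/dict/custom", ResourcePathSketch.class));
    }
}
```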
-
-
Method Detail
-
getResource
protected final InputStream getResource(String suffix) throws IOException
- Throws:
IOException
-
getResource
public static final InputStream getResource(BinaryDictionary.ResourceScheme scheme, String path) throws IOException
- Throws:
IOException
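The static getResource method takes a scheme and a path and returns an InputStream. One plausible way to dispatch on the two schemes is sketched below. This is not the library's implementation: `ResourceLoadSketch`, its local `Scheme` enum and `open` are hypothetical names, and the choice to signal a missing classpath resource with FileNotFoundException is an assumption.

```java
import java.io.FileNotFoundException;
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Paths;

public class ResourceLoadSketch {
    /** Local stand-in for BinaryDictionary.ResourceScheme. */
    enum Scheme { FILE, CLASSPATH }

    /** Opens the resource at the given path using the given scheme. */
    static InputStream open(Scheme scheme, String path) throws IOException {
        switch (scheme) {
            case FILE:
                // Read directly from the file system.
                return Files.newInputStream(Paths.get(path));
            case CLASSPATH:
                // Look the resource up through this class's class loader.
                InputStream in = ResourceLoadSketch.class.getClassLoader().getResourceAsStream(path);
                if (in == null) {
                    // Assumption: a missing resource is reported as an IOException subtype.
                    throw new FileNotFoundException("Not found on classpath: " + path);
                }
                return in;
            default:
                throw new IllegalStateException("Unknown scheme: " + scheme);
        }
    }
}
```

Callers would typically wrap the returned stream in try-with-resources so it is closed after the dictionary data has been read.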
-
getClassResource
public static final InputStream getClassResource(Class<?> clazz, String suffix) throws IOException
- Throws:
IOException
-
lookupWordIds
public void lookupWordIds(int sourceId, IntsRef ref)
-
getLeftId
public int getLeftId(int wordId)
Description copied from interface: Dictionary
Get left id of specified word
- Specified by:
getLeftId in interface Dictionary
- Returns:
left id
-
getRightId
public int getRightId(int wordId)
Description copied from interface: Dictionary
Get right id of specified word
- Specified by:
getRightId in interface Dictionary
- Returns:
right id
-
getWordCost
public int getWordCost(int wordId)
Description copied from interface: Dictionary
Get word cost of specified word
- Specified by:
getWordCost in interface Dictionary
- Returns:
word's cost
-
getBaseForm
public String getBaseForm(int wordId, char[] surfaceForm, int off, int len)
Description copied from interface: Dictionary
Get base form of word
- Specified by:
getBaseForm in interface Dictionary
- Parameters:
wordId - word ID of token
- Returns:
Base form (only different for inflected words, otherwise null)
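Because getBaseForm returns null for words that are not inflected, callers that always want a usable string need a fallback to the surface form. A minimal sketch of that caller-side pattern follows. `BaseFormFallbackSketch` and `effectiveBaseForm` are hypothetical names, and the `baseForm` argument stands for whatever getBaseForm returned.

```java
public class BaseFormFallbackSketch {
    /**
     * Returns the dictionary's base form when one was returned, otherwise
     * the surface form taken from surfaceForm[off .. off + len).
     */
    static String effectiveBaseForm(String baseForm, char[] surfaceForm, int off, int len) {
        return baseForm != null ? baseForm : new String(surfaceForm, off, len);
    }

    public static void main(String[] args) {
        char[] text = "running fast".toCharArray();
        // Inflected word: a base form was returned.
        System.out.println(effectiveBaseForm("run", text, 0, 7));
        // Uninflected word: null was returned, so the surface form is used.
        System.out.println(effectiveBaseForm(null, text, 8, 4));
    }
}
```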
-
getReading
public String getReading(int wordId, char[] surface, int off, int len)
Description copied from interface: Dictionary
Get reading of tokens
- Specified by:
getReading in interface Dictionary
- Parameters:
wordId - word ID of token
- Returns:
Reading of the token
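The HAS_READING field description says that when an entry has no reading data, the reading is the surface form converted to katakana. One way such a conversion can work is sketched below, relying on the fact that the hiragana letters U+3041 to U+3096 sit exactly 0x60 code points below their katakana counterparts. `KatakanaReadingSketch` and `toKatakana` are hypothetical names, and whether the class performs the conversion exactly this way is an assumption.

```java
public class KatakanaReadingSketch {
    /**
     * Copies surface[off .. off + len) and shifts each hiragana letter to the
     * matching katakana letter. All other characters are left unchanged.
     */
    static String toKatakana(char[] surface, int off, int len) {
        char[] out = new char[len];
        for (int i = 0; i < len; i++) {
            char ch = surface[off + i];
            // Hiragana U+3041..U+3096 maps to katakana U+30A1..U+30F6 by adding 0x60.
            out[i] = (ch >= 0x3041 && ch <= 0x3096) ? (char) (ch + 0x60) : ch;
        }
        return new String(out);
    }

    public static void main(String[] args) {
        // Hiragana "su" (U+3059) and "shi" (U+3057), written as escapes to keep the source ASCII.
        char[] surface = "\u3059\u3057".toCharArray();
        System.out.println(toKatakana(surface, 0, surface.length));
    }
}
```

Characters outside the hiragana range, such as kanji or Latin letters, pass through unchanged in this sketch.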
-
getPartOfSpeech
public String getPartOfSpeech(int wordId)
Description copied from interface: Dictionary
Get Part-Of-Speech of tokens
- Specified by:
getPartOfSpeech in interface Dictionary
- Parameters:
wordId - word ID of token
- Returns:
Part-Of-Speech of the token
-
getPronunciation
public String getPronunciation(int wordId, char[] surface, int off, int len)
Description copied from interface: Dictionary
Get pronunciation of tokens
- Specified by:
getPronunciation in interface Dictionary
- Parameters:
wordId - word ID of token
- Returns:
Pronunciation of the token
-
getInflectionType
public String getInflectionType(int wordId)
Description copied from interface: Dictionary
Get inflection type of tokens
- Specified by:
getInflectionType in interface Dictionary
- Parameters:
wordId - word ID of token
- Returns:
inflection type, or null
-
getInflectionForm
public String getInflectionForm(int wordId)
Description copied from interface: Dictionary
Get inflection form of tokens
- Specified by:
getInflectionForm in interface Dictionary
- Parameters:
wordId - word ID of token
- Returns:
inflection form, or null
-
-