Class UserDictionary

java.lang.Object
org.apache.lucene.analysis.ja.dict.UserDictionary
All Implemented Interfaces:
Dictionary

public final class UserDictionary extends Object implements Dictionary
Builds a user dictionary. A user dictionary lets callers define custom segmentation for phrases.
  • Field Details

  • Method Details

    • open

      public static UserDictionary open(Reader reader) throws IOException
      Opens a user dictionary, reading its entries from the given reader.
      Throws:
      IOException
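This page does not describe the input that open expects. Kuromoji user dictionaries are conventionally written as CSV lines holding a surface form, a space-separated segmentation, space-separated readings, and a part-of-speech tag, with # starting a comment; treat that layout as an assumption here. The sketch below is a hypothetical stand-alone parser (the class UserDictionaryEntrySketch and its Entry type are not part of Lucene) that only illustrates what one such line carries. In real code the same Reader would be handed to UserDictionary.open(Reader) instead.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.List;

public class UserDictionaryEntrySketch {

  /** One parsed line; a hypothetical holder type, not part of Lucene. */
  public static final class Entry {
    public final String surface;
    public final String[] segments;
    public final String[] readings;
    public final String partOfSpeech;

    Entry(String surface, String[] segments, String[] readings, String partOfSpeech) {
      this.surface = surface;
      this.segments = segments;
      this.readings = readings;
      this.partOfSpeech = partOfSpeech;
    }
  }

  /** Reads entries line by line, skipping blank lines and '#' comments. */
  public static List<Entry> parse(Reader reader) {
    List<Entry> entries = new ArrayList<>();
    try (BufferedReader lines = new BufferedReader(reader)) {
      String line;
      while ((line = lines.readLine()) != null) {
        // Drop a trailing comment, then surrounding whitespace.
        line = line.replaceAll("#.*$", "").trim();
        if (line.isEmpty()) {
          continue;
        }
        // Naive split: this sketch does not handle quoted fields containing commas.
        String[] fields = line.split(",");
        if (fields.length != 4) {
          throw new IllegalArgumentException("Expected 4 fields but got: " + line);
        }
        entries.add(
            new Entry(fields[0], fields[1].split(" +"), fields[2].split(" +"), fields[3]));
      }
    } catch (IOException e) {
      throw new UncheckedIOException(e);
    }
    return entries;
  }

  public static void main(String[] args) {
    // Assumed format: surface,segmentation,readings,part-of-speech
    String csv =
        "# custom segmentation for a compound noun\n"
            + "関西国際空港,関西 国際 空港,カンサイ コクサイ クウコウ,カスタム名詞\n";
    for (Entry e : parse(new StringReader(csv))) {
      System.out.println(e.surface + " -> " + String.join("|", e.segments));
    }
  }
}
```

The segmentation and readings columns are split on spaces so that each segment lines up with its reading, which is what makes the custom segmentation usable at lookup time.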
    • lookup

      public int[][] lookup(char[] chars, int off, int len) throws IOException
      Looks up words in text.
      Parameters:
      chars - text
      off - offset into text
      len - length of text
      Returns:
      array of {wordId, position, length}
      Throws:
      IOException
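To make the return shape of lookup concrete, here is a minimal sketch that decodes rows of {wordId, position, length} back into the matched substrings. It does not call Lucene: the rows are hand-written stand-ins for a real result, the word IDs are made up, and the helper name matchedWords is hypothetical. It also assumes that position is counted from off rather than from the start of the array, which this page does not state.

```java
import java.util.ArrayList;
import java.util.List;

public class LookupResultSketch {

  /** Turns rows of {wordId, position, length} into the substrings they cover. */
  public static List<String> matchedWords(char[] chars, int off, int[][] matches) {
    List<String> words = new ArrayList<>();
    for (int[] match : matches) {
      int position = match[1]; // assumption: relative to off
      int length = match[2];
      words.add(new String(chars, off + position, length));
    }
    return words;
  }

  public static void main(String[] args) {
    char[] text = "関西国際空港に行った".toCharArray();
    // Hand-written rows standing in for a lookup() result; the word IDs are made up.
    int[][] matches = {{1000, 0, 2}, {1001, 2, 2}, {1002, 4, 2}};
    System.out.println(matchedWords(text, 0, matches));
  }
}
```

With a real dictionary, the wordId in each row is the value to pass to accessors such as getReading or getPartOfSpeech.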
    • getFST

      public TokenInfoFST getFST()
    • lookupSegmentation

      public int[] lookupSegmentation(int phraseID)
    • getLeftId

      public int getLeftId(int wordId)
      Description copied from interface: Dictionary
      Get the left id of the specified word
      Specified by:
      getLeftId in interface Dictionary
      Returns:
      left id
    • getRightId

      public int getRightId(int wordId)
      Description copied from interface: Dictionary
      Get the right id of the specified word
      Specified by:
      getRightId in interface Dictionary
      Returns:
      right id
    • getWordCost

      public int getWordCost(int wordId)
      Description copied from interface: Dictionary
      Get the word cost of the specified word
      Specified by:
      getWordCost in interface Dictionary
      Returns:
      word's cost
    • getReading

      public String getReading(int wordId, char[] surface, int off, int len)
      Description copied from interface: Dictionary
      Get the reading of the token
      Specified by:
      getReading in interface Dictionary
      Parameters:
      wordId - word ID of token
      Returns:
      Reading of the token
    • getPartOfSpeech

      public String getPartOfSpeech(int wordId)
      Description copied from interface: Dictionary
      Get the part of speech of the token
      Specified by:
      getPartOfSpeech in interface Dictionary
      Parameters:
      wordId - word ID of token
      Returns:
      Part-Of-Speech of the token
    • getBaseForm

      public String getBaseForm(int wordId, char[] surface, int off, int len)
      Description copied from interface: Dictionary
      Get the base form of the word
      Specified by:
      getBaseForm in interface Dictionary
      Parameters:
      wordId - word ID of token
      Returns:
      Base form (only different for inflected words, otherwise null)
    • getPronunciation

      public String getPronunciation(int wordId, char[] surface, int off, int len)
      Description copied from interface: Dictionary
      Get the pronunciation of the token
      Specified by:
      getPronunciation in interface Dictionary
      Parameters:
      wordId - word ID of token
      Returns:
      Pronunciation of the token
    • getInflectionType

      public String getInflectionType(int wordId)
      Description copied from interface: Dictionary
      Get the inflection type of the token
      Specified by:
      getInflectionType in interface Dictionary
      Parameters:
      wordId - word ID of token
      Returns:
      inflection type, or null
    • getInflectionForm

      public String getInflectionForm(int wordId)
      Description copied from interface: Dictionary
      Get the inflection form of the token
      Specified by:
      getInflectionForm in interface Dictionary
      Parameters:
      wordId - word ID of token
      Returns:
      inflection form, or null