org.apache.lucene.analysis.ja.util.DictionaryBuilder

public class DictionaryBuilder extends Object

Tool to build dictionaries. Usage:

    java -cp [lucene classpath] org.apache.lucene.analysis.ja.util.DictionaryBuilder \
          ${inputDir} ${outputDir} ${encoding} ${normalizeEntry}

The input directory is expected to include unk.def, matrix.def, plus any number of .csv files, roughly following the conventions of IPADIC. JapaneseTokenizer uses dictionaries built with this tool. Note that the input files required by this build generally must be generated from a corpus of real text using tools that are not part of Lucene.
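Before invoking the tool, it can be useful to confirm that the input directory has the layout described above. The following is a minimal sketch using only the Java standard library. The class and method names (`InputDirCheck`, `missingRequiredFiles`, `countCsvFiles`) are hypothetical helpers for illustration and are not part of Lucene; the check covers only the files named in this description.

```java
import java.io.IOException;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.List;

// Hypothetical pre-flight check; not part of Lucene.
public class InputDirCheck {

    /** Returns the names of the required definition files that are absent from inputDir. */
    public static List<String> missingRequiredFiles(Path inputDir) {
        List<String> missing = new ArrayList<>();
        for (String required : new String[] {"unk.def", "matrix.def"}) {
            if (!Files.isRegularFile(inputDir.resolve(required))) {
                missing.add(required);
            }
        }
        return missing;
    }

    /** Counts the .csv files located directly inside inputDir. */
    public static int countCsvFiles(Path inputDir) throws IOException {
        int count = 0;
        try (DirectoryStream<Path> csvFiles = Files.newDirectoryStream(inputDir, "*.csv")) {
            for (Path ignored : csvFiles) {
                count++;
            }
        }
        return count;
    }

    public static void main(String[] args) throws IOException {
        // Assumption: the directory to check is passed as the first argument.
        Path inputDir = Paths.get(args.length > 0 ? args[0] : ".");
        System.out.println("Missing required files: " + missingRequiredFiles(inputDir));
        System.out.println("CSV files found: " + countCsvFiles(inputDir));
    }
}
```

Running this against the intended `${inputDir}` reports which of the required files are absent and how many .csv files are present.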

The normalizeEntry option is a boolean value.
If true, the builder checks whether each surface form (the first column of the CSV) is NFC-normalized. If it is not, the NFC-normalized entry is added to the TokenInfoDictionary in addition to the original form.
This option is false for the pre-built dictionary shipped with Lucene.
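The effect of normalizeEntry can be illustrated with `java.text.Normalizer` from the Java standard library. This is a sketch of the behavior described above, not Lucene's actual implementation; the class `NfcEntrySketch` and the method `surfaceForms` are hypothetical names introduced for illustration.

```java
import java.text.Normalizer;
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration of the normalizeEntry behavior; not Lucene code.
public class NfcEntrySketch {

    /**
     * Returns the surface forms that would be registered for one CSV entry:
     * always the original, plus the NFC form when normalizeEntry is true
     * and the original is not already NFC-normalized.
     */
    public static List<String> surfaceForms(String surface, boolean normalizeEntry) {
        List<String> forms = new ArrayList<>();
        forms.add(surface);
        if (normalizeEntry && !Normalizer.isNormalized(surface, Normalizer.Form.NFC)) {
            forms.add(Normalizer.normalize(surface, Normalizer.Form.NFC));
        }
        return forms;
    }

    public static void main(String[] args) {
        // U+304B (HIRAGANA KA) followed by U+3099 (combining voiced sound mark)
        // is the decomposed spelling of U+304C (HIRAGANA GA).
        String decomposed = "\u304B\u3099";
        System.out.println(surfaceForms(decomposed, true).size());  // prints 2
        System.out.println(surfaceForms(decomposed, false).size()); // prints 1
    }
}
```

A surface form that is already in NFC yields a single entry regardless of the option, so enabling it only grows the dictionary for entries whose spelling changes under normalization.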

WARNING: This API is experimental and might change in incompatible ways in the next release.

Nested Class Summary

| Modifier and Type | Class | Description |
| --- | --- | --- |
| static enum | DictionaryBuilder.DictionaryFormat | Format of the dictionary. |

Method Summary

| Modifier and Type | Method | Description |
| --- | --- | --- |
| static void | build(DictionaryBuilder.DictionaryFormat format, Path inputDir, Path outputDir, String encoding, boolean normalizeEntry) | |
| static void | main(String[] args) | |

Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait

Method Details
- build
  
  public static void build(DictionaryBuilder.DictionaryFormat format, Path inputDir, Path outputDir, String encoding, boolean normalizeEntry) throws IOException
  
  Throws:
  
  IOException
- main
  
  public static void main(String[] args) throws IOException
  
  Throws:
  
  IOException
