Post Tool

Solr includes a simple command line tool for POSTing various types of content to a Solr server.

The tool is bin/post, a Unix shell script; for Windows (non-Cygwin) usage, see the section Post Tool Windows Support below.

To run it, open a terminal window and enter:

bin/post -c gettingstarted example/films/films.json

This contacts the server at localhost:8983 and posts the file to the gettingstarted collection. Specifying the collection/core name (or the full update URL) is mandatory. The -help (or simply -h) option prints usage information (i.e., bin/post -help).

Using the bin/post Tool

Specifying either the collection/core name or the full update URL is mandatory when using bin/post.

The basic usage of bin/post is:

$ bin/post -h
Usage: post -c <collection> [OPTIONS] <files|directories|urls|-d ["...",...]>
    or post -help

   collection name defaults to DEFAULT_SOLR_COLLECTION if not specified

OPTIONS
=======
  Solr options:
    -url <base Solr update URL> (overrides collection, host, and port)
    -host <host> (default: localhost)
    -p or -port <port> (default: 8983)
    -commit yes|no (default: yes)
    -u or -user <user:pass> (sets BasicAuth credentials)

  Web crawl options:
    -recursive <depth> (default: 1)
    -delay <seconds> (default: 10)


  Directory crawl options:
    -delay <seconds> (default: 0)

  stdin/args options:
    -type <content/type> (default: application/xml)


  Other options:
    -filetypes <type>[,<type>,...] (default: xml,json,csv,pdf,doc,docx,ppt,pptx,xls,xlsx,odt,odp,ods,ott,otp,ots,rtf,htm,html,txt,log)
    -params "<key>=<value>[&<key>=<value>...]" (values must be URL-encoded; these pass through to Solr update request)
    -out yes|no (default: no; yes outputs Solr response to console)
...
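
The -url option replaces the collection, host, and port settings with a single update URL. As a minimal sketch, the hypothetical helper below (solr_update_url is not part of bin/post) composes such a URL from the same defaults shown in the help text. It assumes the update handler lives at /solr/<collection>/update, which is the conventional layout but should be checked against your installation.

```shell
# Hypothetical helper, not part of Solr: build an update URL for -url.
# Assumption: the update handler is served at /solr/<collection>/update.
solr_update_url() (
  collection=$1
  host=${2:-localhost}   # same default as the -host option
  port=${3:-8983}        # same default as the -port option
  printf 'http://%s:%s/solr/%s/update\n' "$host" "$port" "$collection"
)

# Print the URL for gettingstarted on port 8984.
solr_update_url gettingstarted localhost 8984
```

The result could then be passed along, for example: bin/post -url "$(solr_update_url gettingstarted localhost 8984)" *.xml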

Examples Using bin/post

This section presents examples of the different ways to use bin/post.

Indexing XML

Add all documents with file extension .xml to the collection or core named gettingstarted.

bin/post -c gettingstarted *.xml

Add all documents with file extension .xml to the gettingstarted collection/core on Solr running on port 8984.

bin/post -c gettingstarted -p 8984 *.xml

Send XML arguments to delete a document from gettingstarted.

bin/post -c gettingstarted -d '<delete><id>42</id></delete>'
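
When several documents need deleting, the XML passed to -d can be assembled in the shell first. The sketch below uses a hypothetical helper (delete_by_ids is not part of bin/post) that wraps each argument in an <id> element inside one <delete> command. It assumes the IDs contain no characters that need XML escaping.

```shell
# Hypothetical helper, not part of Solr: build a <delete> command
# containing one <id> element per argument.
# Assumption: IDs contain no characters that need XML escaping (&, <, >).
delete_by_ids() (
  xml='<delete>'
  for id in "$@"; do
    xml="$xml<id>$id</id>"
  done
  printf '%s</delete>\n' "$xml"
)

# Print the command that would delete documents 42 and 43.
delete_by_ids 42 43
```

The output could then be sent with: bin/post -c gettingstarted -d "$(delete_by_ids 42 43)"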

Indexing CSV

Index all CSV files into gettingstarted:

bin/post -c gettingstarted *.csv

Index a tab-separated file into gettingstarted:

bin/post -c gettingstarted -params "separator=%09" -type text/csv data.tsv

The content type (-type) parameter is required here: bin/post does not know what type of content a .tsv file holds, so without it the file is ignored and a WARNING is logged. The CSV handler supports the separator parameter, which is passed through to Solr using the -params setting; %09 is the URL-encoded tab character.
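
Because values given to -params must be URL-encoded, it can help to encode them in the shell rather than by hand. The following is a minimal sketch of a percent-encoding helper (urlencode is a hypothetical name, not something bin/post provides). It assumes ASCII input only; multi-byte characters would need a byte-wise encoder.

```shell
# Hypothetical helper, not part of Solr: percent-encode a value for -params.
# Assumption: input is ASCII only.
urlencode() (
  s=$1
  out=
  while [ -n "$s" ]; do
    rest=${s#?}          # everything after the first character
    c=${s%"$rest"}       # the first character itself
    case $c in
      [a-zA-Z0-9.~_-]) out=$out$c ;;                # unreserved: keep as-is
      *) out=$out$(printf '%%%02X' "'$c") ;;        # others: %XX hex code
    esac
    s=$rest
  done
  printf '%s\n' "$out"
)

# Encode a literal tab for use as the CSV separator.
urlencode "$(printf '\t')"
```

For the tab-separated example, the encoded value could be substituted directly: bin/post -c gettingstarted -params "separator=$(urlencode "$(printf '\t')")" -type text/csv data.tsv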

Indexing JSON

Index all JSON files into gettingstarted.

bin/post -c gettingstarted *.json

Indexing Rich Documents (PDF, Word, HTML, etc.)

Index a PDF file into gettingstarted.

bin/post -c gettingstarted a.pdf

Recursively scan a folder for documents, automatically detecting their content types, and index them into gettingstarted.

bin/post -c gettingstarted afolder/

Automatically detect content types in a folder, but index only PPT and HTML files into gettingstarted.

bin/post -c gettingstarted -filetypes ppt,html afolder/
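
Before posting a large folder, it may be useful to preview which files a given -filetypes list is likely to select. The sketch below is an approximation built on find, using a hypothetical helper (list_by_filetypes is not part of bin/post). It assumes plain case-sensitive extension matching, which may not mirror the tool's own matching exactly.

```shell
# Hypothetical helper, not part of Solr: list files under a directory whose
# extension appears in a comma-separated list such as "ppt,html".
# Assumption: case-sensitive extension matching; bin/post may differ.
list_by_filetypes() (
  dir=$1
  types=$2
  set --                       # reuse "$@" to collect find arguments
  IFS=,
  for t in $types; do
    [ $# -gt 0 ] && set -- "$@" -o
    set -- "$@" -name "*.$t"
  done
  find "$dir" -type f \( "$@" \) | sort
)

# Preview the PPT and HTML files under afolder/, if that folder exists.
if [ -d afolder ]; then
  list_by_filetypes afolder ppt,html
fi
```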

Indexing to a Password Protected Solr (Basic Auth)

Index a PDF as the user "solr" with password "SolrRocks":

bin/post -u solr:SolrRocks -c gettingstarted a.pdf

Post Tool Windows Support

bin/post is a Unix shell script and as such cannot be used directly on Windows. However, it delegates its work to a cross-platform Java program called SimplePostTool (packaged as post.jar), which can be used in Windows environments.

The argument syntax differs significantly from bin/post, so your first step should be to print the SimplePostTool help text.

$ java -jar example\exampledocs\post.jar -h

This command prints information about all the arguments and System properties available to SimplePostTool users. There are also examples showing how to post files, crawl a website or file system folder, and send update commands (deletes, etc.) directly to Solr.

Most usage involves passing both Java System properties and program arguments on the command line. Consider the example below:

$ java -jar -Dc=gettingstarted -Dauto example\exampledocs\post.jar example\exampledocs\*

This indexes the contents of the exampledocs directory into a collection called gettingstarted. The -Dauto System property governs whether or not the tool sends the document type to Solr during extraction.