org.apache.solr.util
Class SimplePostTool

java.lang.Object
  extended by org.apache.solr.util.SimplePostTool

public class SimplePostTool
extends Object

A simple utility class for posting raw updates to a Solr server. It has a main method so it can be run on the command line. View this not as a best-practice code example, but as a standalone example built with the explicit purpose of having no external jar dependencies.


Nested Class Summary
 class SimplePostTool.PageFetcherResult
          Utility class to hold the result from a page fetch
 
Constructor Summary
SimplePostTool()
           
SimplePostTool(String mode, URL url, boolean auto, String type, int recursive, int delay, String fileTypes, OutputStream out, boolean commit, boolean optimize, String[] args)
          Constructor which takes in all mandatory input for the tool to work.
 
Method Summary
static String appendParam(String url, String param)
          Appends a URL query parameter to a URL
protected static URL appendUrlPath(URL url, String append)
          Appends to the path of the URL
 void commit()
          Does a simple commit operation
protected  String computeFullUrl(URL baseUrl, String link)
          Computes the full URL based on a base URL and a possibly relative link found in the href attribute of an HTML anchor.
static void doGet(String url)
          Performs a simple get on the given URL
static void doGet(URL url)
          Performs a simple get on the given URL
 void execute()
          After initialization, call execute to start the post job.
 org.apache.solr.util.SimplePostTool.GlobFileFilter getFileFilterFromFileTypes(String fileTypes)
           
static NodeList getNodesFromXP(Node n, String xpath)
          Gets all nodes matching an XPath
static String getXP(Node n, String xpath, boolean concatAll)
          Gets the string content of the node(s) matching an XPath
protected static String guessType(File file)
          Guesses the type of a file, based on file name suffix
protected  byte[] inputStreamToByteArray(InputStream is)
          Reads an input stream into a byte array
protected static boolean isOn(String property)
          Tests if a string is either "true", "on", "yes" or "1"
static void main(String[] args)
          See usage() for valid command line usage
static Document makeDom(String in, String inputEncoding)
          Takes a string as input and returns a DOM
protected static String normalizeUrlEnding(String link)
          Normalizes a URL string by removing anchor part and trailing slash
 void optimize()
          Does a simple optimize operation
protected static SimplePostTool parseArgsAndInit(String[] args)
          Parses incoming arguments and system params and initializes the tool
 boolean postData(InputStream data, Integer length, OutputStream output, String type, URL url)
          Reads data from the data stream and posts it to Solr, writing the response to output
 void postFile(File file, OutputStream output, String type)
          Opens the file and posts its contents to the solrUrl, writing the response to output.
 int postFiles(File[] files, int startIndexInArgs, OutputStream out, String type)
          Posts all files in the given array
 int postFiles(String[] args, int startIndexInArgs, OutputStream out, String type)
          Posts all files named in args
 int postWebPages(String[] args, int startIndexInArgs, OutputStream out)
          This method takes as input a list of start URL strings for crawling, adds each one to a backlog and then starts crawling
static InputStream stringToStream(String s)
          Converts a string to an input stream
protected  boolean typeSupported(String type)
          Uses the mime-type map to reverse lookup whether the file ending for our type is supported by the fileTypes option
protected  int webCrawl(int level, OutputStream out)
          A very simple crawler that pulls URLs to fetch from a backlog and then recurses N levels deep if recursive>0.
 
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
 

Constructor Detail

SimplePostTool

public SimplePostTool(String mode,
                      URL url,
                      boolean auto,
                      String type,
                      int recursive,
                      int delay,
                      String fileTypes,
                      OutputStream out,
                      boolean commit,
                      boolean optimize,
                      String[] args)
Constructor which takes in all mandatory input for the tool to work. Also see usage() for further explanation of the params.

Parameters:
mode - whether to post files, web pages, params or stdin
url - the Solr base URL to post to; should end with /update
auto - if true, we'll guess type and add resourcename/url
type - content-type of the data you are posting
recursive - number of levels for file/web mode, or 0 if one file only
delay - if recursive then delay will be the wait time between posts
fileTypes - a comma separated list of file-name endings to accept for file/web
out - an OutputStream to write output to, e.g. stdout to print to console
commit - if true, will commit at end of posting
optimize - if true, will optimize at end of posting
args - a String[] of arguments, varies between modes

SimplePostTool

public SimplePostTool()

Method Detail

main

public static void main(String[] args)
See usage() for valid command line usage

Parameters:
args - the params on the command line

execute

public void execute()
After initialization, call execute to start the post job. This method delegates to the correct mode method.


parseArgsAndInit

protected static SimplePostTool parseArgsAndInit(String[] args)
Parses incoming arguments and system params and initializes the tool

Parameters:
args - the incoming cmd line args
Returns:
an instance of SimplePostTool

postFiles

public int postFiles(String[] args,
                     int startIndexInArgs,
                     OutputStream out,
                     String type)
Posts all files named in args

Parameters:
args - array of file names
startIndexInArgs - offset to start
out - output stream to write the response to
type - default content-type to use when posting (may be overridden in auto mode)
Returns:
number of files posted

postFiles

public int postFiles(File[] files,
                     int startIndexInArgs,
                     OutputStream out,
                     String type)
Posts all files in the given array

Parameters:
files - array of Files
startIndexInArgs - offset to start
out - output stream to write the response to
type - default content-type to use when posting (may be overridden in auto mode)
Returns:
number of files posted

postWebPages

public int postWebPages(String[] args,
                        int startIndexInArgs,
                        OutputStream out)
This method takes as input a list of start URL strings for crawling, adds each one to a backlog and then starts crawling

Parameters:
args - the raw input args from main()
startIndexInArgs - offset for where to start
out - outputStream to write results to
Returns:
the number of web pages posted

normalizeUrlEnding

protected static String normalizeUrlEnding(String link)
Normalizes a URL string by removing anchor part and trailing slash

Returns:
the normalized URL string
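
The normalization described for this method can be sketched roughly as follows. This is an illustration only, not the Solr source: the class name NormalizeUrlSketch is hypothetical, and stripping exactly one trailing slash is an assumption.

```java
final class NormalizeUrlSketch {
    // Hypothetical stand-in for normalizeUrlEnding; not the Solr implementation.
    static String normalizeUrlEnding(String link) {
        int hash = link.indexOf('#');
        if (hash >= 0) {
            link = link.substring(0, hash); // drop the anchor part
        }
        if (link.endsWith("/")) {
            link = link.substring(0, link.length() - 1); // drop one trailing slash (assumption)
        }
        return link;
    }

    public static void main(String[] args) {
        // prints http://example.com/docs
        System.out.println(normalizeUrlEnding("http://example.com/docs/#intro"));
    }
}
```

Normalizing links this way lets a crawler treat "page", "page/" and "page#section" as the same backlog entry.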

webCrawl

protected int webCrawl(int level,
                       OutputStream out)
A very simple crawler that pulls URLs to fetch from a backlog and then recurses N levels deep if recursive>0. Links are parsed from HTML by first getting an XHTML version using SolrCell with extractOnly, and are followed if they are local. The crawler pauses for a default delay of 10 seconds between fetches; this can be configured in the delay variable. This is only meant for test purposes, as it does not respect robots or anything else fancy :)

Parameters:
level - which level to crawl
out - output stream to write to
Returns:
number of pages crawled on this level and below
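
The level-by-level backlog idea can be sketched without any network access. In this sketch the "web" is a hypothetical in-memory map from page URL to the links on that page; CrawlSketch, crawl and maxLevel are illustrative names, and the delay between fetches and the SolrCell link extraction are deliberately left out.

```java
import java.util.HashSet;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

final class CrawlSketch {
    // Hypothetical entry point: crawl from the start URLs down to maxLevel.
    static int crawl(Map<String, List<String>> web, List<String> startUrls, int maxLevel) {
        Set<String> visited = new HashSet<>();
        return crawlLevel(web, new LinkedHashSet<>(startUrls), visited, 0, maxLevel);
    }

    // Processes one backlog, collects links for the next level, then recurses.
    private static int crawlLevel(Map<String, List<String>> web, Set<String> backlog,
                                  Set<String> visited, int level, int maxLevel) {
        int fetched = 0;
        Set<String> nextBacklog = new LinkedHashSet<>();
        for (String url : backlog) {
            if (!visited.add(url)) {
                continue; // already handled on this or an earlier level
            }
            List<String> links = web.get(url);
            if (links == null) {
                continue; // unknown page: treated as a failed fetch
            }
            fetched++;
            if (level < maxLevel) {
                for (String link : links) {
                    if (!visited.contains(link)) {
                        nextBacklog.add(link);
                    }
                }
            }
        }
        if (!nextBacklog.isEmpty()) {
            fetched += crawlLevel(web, nextBacklog, visited, level + 1, maxLevel);
        }
        return fetched; // pages fetched on this level and below
    }
}
```

The visited set keeps cyclic links from being fetched twice, and the return value mirrors the documented "pages crawled on this level and below".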

inputStreamToByteArray

protected byte[] inputStreamToByteArray(InputStream is)
                                 throws IOException
Reads an input stream into a byte array

Parameters:
is - the input stream
Returns:
the byte array
Throws:
IOException - If there is a low-level I/O error.
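
Reading a stream fully into a byte array is commonly done with a buffer loop. The sketch below shows one such approach; the class name StreamBytesSketch and the 4096-byte buffer size are assumptions, and the Solr implementation may differ.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;

final class StreamBytesSketch {
    // Hypothetical stand-in for inputStreamToByteArray.
    static byte[] inputStreamToByteArray(InputStream is) throws IOException {
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        byte[] chunk = new byte[4096]; // buffer size is an arbitrary choice
        int n;
        while ((n = is.read(chunk)) != -1) {
            buffer.write(chunk, 0, n); // copy only the bytes actually read
        }
        return buffer.toByteArray();
    }

    public static void main(String[] args) throws IOException {
        byte[] out = inputStreamToByteArray(new ByteArrayInputStream(new byte[] {1, 2, 3}));
        System.out.println(out.length); // prints 3
    }
}
```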

computeFullUrl

protected String computeFullUrl(URL baseUrl,
                                String link)
Computes the full URL based on a base URL and a possibly relative link found in the href attribute of an HTML anchor.

Parameters:
baseUrl - the base url from where the link was found
link - the absolute or relative link
Returns:
the string version of the full URL
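
Resolving a relative link against a base can be illustrated with java.net.URI. This sketch simplifies the documented signature by taking the base as a String, and returning null for an empty or invalid link is an assumption; FullUrlSketch is a hypothetical name.

```java
import java.net.URI;

final class FullUrlSketch {
    // Hypothetical stand-in for computeFullUrl, using String instead of URL for the base.
    static String computeFullUrl(String baseUrl, String link) {
        if (link == null || link.isEmpty()) {
            return null;
        }
        try {
            // resolve() handles absolute links, relative links and "../" segments
            return URI.create(baseUrl).resolve(link).toString();
        } catch (IllegalArgumentException e) {
            return null; // the base or the link is not a valid URI
        }
    }

    public static void main(String[] args) {
        // prints http://example.com/docs/page2.html
        System.out.println(computeFullUrl("http://example.com/docs/index.html", "page2.html"));
    }
}
```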

typeSupported

protected boolean typeSupported(String type)
Uses the mime-type map to reverse lookup whether the file ending for our type is supported by the fileTypes option

Parameters:
type - what content-type to lookup
Returns:
true if this is a supported content type
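
A reverse lookup of this kind can be sketched as a scan over a suffix-to-type map. Everything here is illustrative: the class name, the extra parameters (the documented method reads instance state instead), and the map contents in the usage example are assumptions.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

final class TypeSupportedSketch {
    // Hypothetical stand-in: true if some accepted suffix maps to the given content type.
    static boolean typeSupported(String type, Map<String, String> mimeBySuffix,
                                 Set<String> acceptedSuffixes) {
        for (Map.Entry<String, String> entry : mimeBySuffix.entrySet()) {
            if (entry.getValue().equals(type) && acceptedSuffixes.contains(entry.getKey())) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        Map<String, String> mimeBySuffix = new HashMap<>();
        mimeBySuffix.put("pdf", "application/pdf"); // illustrative entries only
        mimeBySuffix.put("xml", "text/xml");
        Set<String> accepted = new HashSet<>();
        accepted.add("pdf");
        System.out.println(typeSupported("application/pdf", mimeBySuffix, accepted)); // prints true
        System.out.println(typeSupported("text/xml", mimeBySuffix, accepted));        // prints false
    }
}
```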

isOn

protected static boolean isOn(String property)
Tests if a string is either "true", "on", "yes" or "1"

Parameters:
property - the string to test
Returns:
true if the string is one of "true", "on", "yes" or "1"
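
A check like this can be sketched as a membership test. The sketch assumes an exact, case-sensitive match against the four documented values; the actual Solr implementation may match differently, and IsOnSketch is a hypothetical name.

```java
import java.util.Arrays;
import java.util.List;

final class IsOnSketch {
    private static final List<String> ON_VALUES = Arrays.asList("true", "on", "yes", "1");

    // Hypothetical stand-in for isOn: exact, case-sensitive match (assumption).
    static boolean isOn(String property) {
        return ON_VALUES.contains(property); // null simply yields false
    }

    public static void main(String[] args) {
        System.out.println(isOn("yes")); // prints true
        System.out.println(isOn("off")); // prints false
    }
}
```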

commit

public void commit()
Does a simple commit operation


optimize

public void optimize()
Does a simple optimize operation


appendParam

public static String appendParam(String url,
                                 String param)
Appends a URL query parameter to a URL

Parameters:
url - the original URL
param - the parameter(s) to append, separated by "&"
Returns:
the string version of the resulting URL
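
Appending query parameters mostly comes down to choosing "?" or "&" as the separator. The sketch below assumes the parameter string is already URL-encoded and does no encoding of its own, which may differ from the Solr implementation; AppendParamSketch is a hypothetical name.

```java
final class AppendParamSketch {
    // Hypothetical stand-in for appendParam; performs no URL encoding (assumption).
    static String appendParam(String url, String param) {
        if (param == null || param.isEmpty()) {
            return url;
        }
        String separator = url.contains("?") ? "&" : "?";
        return url + separator + param;
    }

    public static void main(String[] args) {
        String url = appendParam("http://localhost:8983/solr/update", "commit=true");
        // prints http://localhost:8983/solr/update?commit=true&optimize=true
        System.out.println(appendParam(url, "optimize=true"));
    }
}
```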

postFile

public void postFile(File file,
                     OutputStream output,
                     String type)
Opens the file and posts its contents to the solrUrl, writing the response to output.


appendUrlPath

protected static URL appendUrlPath(URL url,
                                   String append)
                            throws MalformedURLException
Appends to the path of the URL

Parameters:
url - the URL
append - the path to append
Returns:
the final URL version
Throws:
MalformedURLException
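
One way to append to a URL's path while keeping any query string is to rebuild the URL from its parts. This is a sketch under that assumption, not the Solr source; AppendPathSketch is a hypothetical name, and an unchecked IllegalArgumentException is possible if the rebuilt string is not a valid URI.

```java
import java.net.MalformedURLException;
import java.net.URI;
import java.net.URL;

final class AppendPathSketch {
    // Hypothetical stand-in for appendUrlPath.
    static URL appendUrlPath(URL url, String append) throws MalformedURLException {
        String query = url.getQuery();
        String spec = url.getProtocol() + "://" + url.getAuthority()
                + url.getPath() + append               // extend the path only
                + (query != null ? "?" + query : "");  // keep the original query
        return URI.create(spec).toURL();
    }

    public static void main(String[] args) throws MalformedURLException {
        URL base = URI.create("http://localhost:8983/solr/update?commit=true").toURL();
        // prints http://localhost:8983/solr/update/extract?commit=true
        System.out.println(appendUrlPath(base, "/extract"));
    }
}
```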

guessType

protected static String guessType(File file)
Guesses the type of a file, based on file name suffix

Parameters:
file - the file
Returns:
the content-type guessed
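
Suffix-based type guessing can be sketched as a map lookup on the file extension. The mappings and the "application/octet-stream" fallback below are illustrative assumptions, not the tool's actual table; GuessTypeSketch is a hypothetical name.

```java
import java.io.File;
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;

final class GuessTypeSketch {
    // Illustrative suffix table; the real tool's mappings may differ.
    private static final Map<String, String> MIME_BY_SUFFIX = new HashMap<>();
    static {
        MIME_BY_SUFFIX.put("xml", "text/xml");
        MIME_BY_SUFFIX.put("json", "application/json");
        MIME_BY_SUFFIX.put("pdf", "application/pdf");
    }

    // Hypothetical stand-in for guessType.
    static String guessType(File file) {
        String name = file.getName();
        int dot = name.lastIndexOf('.');
        String suffix = dot >= 0 ? name.substring(dot + 1).toLowerCase(Locale.ROOT) : "";
        return MIME_BY_SUFFIX.getOrDefault(suffix, "application/octet-stream"); // fallback is an assumption
    }

    public static void main(String[] args) {
        System.out.println(guessType(new File("docs/report.PDF"))); // prints application/pdf
    }
}
```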

doGet

public static void doGet(String url)
Performs a simple get on the given URL


doGet

public static void doGet(URL url)
Performs a simple get on the given URL


postData

public boolean postData(InputStream data,
                        Integer length,
                        OutputStream output,
                        String type,
                        URL url)
Reads data from the data stream and posts it to Solr, writing the response to output

Returns:
true if success

stringToStream

public static InputStream stringToStream(String s)
Converts a string to an input stream

Parameters:
s - the string
Returns:
the input stream
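
Converting a string to a stream is typically a one-liner over its bytes. The sketch assumes UTF-8 encoding; StringStreamSketch is a hypothetical name.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;

final class StringStreamSketch {
    // Hypothetical stand-in for stringToStream; UTF-8 is an assumption.
    static InputStream stringToStream(String s) {
        return new ByteArrayInputStream(s.getBytes(StandardCharsets.UTF_8));
    }

    public static void main(String[] args) throws IOException {
        InputStream in = stringToStream("<commit/>");
        System.out.println(in.available()); // prints 9
    }
}
```

A stream like this is the shape of input that postData expects, for example when sending a short XML command.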

getFileFilterFromFileTypes

public org.apache.solr.util.SimplePostTool.GlobFileFilter getFileFilterFromFileTypes(String fileTypes)

getNodesFromXP

public static NodeList getNodesFromXP(Node n,
                                      String xpath)
                               throws XPathExpressionException
Gets all nodes matching an XPath

Throws:
XPathExpressionException
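
Evaluating an XPath to a node list can be done with the standard javax.xml.xpath API. The following is a sketch of that approach, not the Solr source; XPathNodesSketch and the sample markup are hypothetical.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathExpressionException;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

final class XPathNodesSketch {
    // Hypothetical stand-in for getNodesFromXP.
    static NodeList getNodesFromXP(Node n, String xpath) throws XPathExpressionException {
        XPath xp = XPathFactory.newInstance().newXPath();
        return (NodeList) xp.evaluate(xpath, n, XPathConstants.NODESET);
    }

    public static void main(String[] args) throws Exception {
        String html = "<html><body><a href='a.html'>A</a><a href='b.html'>B</a></body></html>";
        Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                .parse(new InputSource(new StringReader(html)));
        System.out.println(getNodesFromXP(doc, "//a").getLength()); // prints 2
    }
}
```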

getXP

public static String getXP(Node n,
                           String xpath,
                           boolean concatAll)
                    throws XPathExpressionException
Gets the string content of the node(s) matching an XPath

Parameters:
n - the node (or doc)
xpath - the xpath string
concatAll - if true, text from all matching nodes will be concatenated, else only the first returned
Throws:
XPathExpressionException
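
The effect of concatAll can be illustrated with a small sketch. Joining matches with a single space and returning an empty string when nothing matches are assumptions here, and XPathTextSketch is a hypothetical name.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathExpressionException;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

final class XPathTextSketch {
    // Hypothetical stand-in for getXP.
    static String getXP(Node n, String xpath, boolean concatAll) throws XPathExpressionException {
        NodeList nodes = (NodeList) XPathFactory.newInstance().newXPath()
                .evaluate(xpath, n, XPathConstants.NODESET);
        if (nodes == null || nodes.getLength() == 0) {
            return ""; // no match (assumption)
        }
        if (!concatAll) {
            return nodes.item(0).getTextContent(); // first match only
        }
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < nodes.getLength(); i++) {
            if (i > 0) {
                sb.append(' '); // separator is an assumption
            }
            sb.append(nodes.item(i).getTextContent());
        }
        return sb.toString();
    }

    public static void main(String[] args) throws Exception {
        Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                .parse(new InputSource(new StringReader("<r><t>one</t><t>two</t></r>")));
        System.out.println(getXP(doc, "//t", true));  // prints one two
        System.out.println(getXP(doc, "//t", false)); // prints one
    }
}
```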

makeDom

public static Document makeDom(String in,
                               String inputEncoding)
                        throws SAXException,
                               IOException,
                               ParserConfigurationException
Takes a string as input and returns a DOM

Throws:
SAXException
IOException
ParserConfigurationException
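
Parsing a string into a DOM with the JDK's built-in parser might look like the sketch below. It assumes the string is encoded to bytes with inputEncoding before parsing; MakeDomSketch is a hypothetical name.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.xml.sax.SAXException;

final class MakeDomSketch {
    // Hypothetical stand-in for makeDom.
    static Document makeDom(String in, String inputEncoding)
            throws SAXException, IOException, ParserConfigurationException {
        // getBytes(String) throws UnsupportedEncodingException, an IOException subclass
        InputStream is = new ByteArrayInputStream(in.getBytes(inputEncoding));
        return DocumentBuilderFactory.newInstance().newDocumentBuilder().parse(is);
    }

    public static void main(String[] args) throws Exception {
        Document dom = makeDom("<add><doc/></add>", "UTF-8");
        System.out.println(dom.getDocumentElement().getNodeName()); // prints add
    }
}
```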


Copyright © 2000-2014 Apache Software Foundation. All Rights Reserved.