org.apache.solr.handler.dataimport
Class XPathRecordReader

java.lang.Object
  extended by org.apache.solr.handler.dataimport.XPathRecordReader

public class XPathRecordReader
extends Object

A streaming XPath parser that uses StAX for XML parsing. It supports only a subset of XPath syntax:

 /a/b/subject[@qualifier='fullTitle']
 /a/b/subject[@qualifier=]/subtag
 /a/b/subject/@qualifier
 //a
 //a/b...
 /a//b
 /a//b...
 /a/b/c
 
A record is a Map<String,Object>. The key is the provided field name and the value is a String or a List<String>. This class is thread-safe for parsing XML, but adding fields is not thread-safe. The recommended usage is to call addField() from one thread and then share the instance across threads.

This API is experimental and may change in the future.

Since:
solr 1.3

Nested Class Summary
static interface XPathRecordReader.Handler
          Implement this interface to stream records as and when one is found.
 
Field Summary
static int FLATTEN
          The FLATTEN flag indicates that all text and CDATA under a specific tag should be recursively fetched and appended to the current Node's value.
 
Constructor Summary
XPathRecordReader(String forEachXpath)
          A constructor called with a '|'-separated list of XPath expressions that define the subsections of the XML stream to be emitted as separate records.
 
Method Summary
 XPathRecordReader addField(String name, String xpath, boolean multiValued)
          A wrapper around addField0 to create a series of Nodes based on the supplied Xpath and a given fieldName.
 XPathRecordReader addField(String name, String xpath, boolean multiValued, int flags)
          A wrapper around addField0 to create a series of Nodes based on the supplied Xpath and a given fieldName.
 List<Map<String,Object>> getAllRecords(Reader r)
          Uses streamRecords to parse the XML source but with a handler that collects all the emitted records into a single List which is returned upon completion.
 void streamRecords(Reader r, XPathRecordReader.Handler handler)
          Creates an XML stream reader on top of the supplied Reader.
 
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
 

Field Detail

FLATTEN

public static final int FLATTEN
The FLATTEN flag indicates that all text and CDATA under a specific tag should be recursively fetched and appended to the current Node's value.
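
As an illustration of what "recursively fetched and appended" means, the following minimal sketch concatenates every piece of text and CDATA beneath an element using the JDK's StAX API directly. It is not this class's implementation: FlattenSketch, flatten and collectText are hypothetical names, and the sketch matches an element by local name only rather than by XPath.

```java
import java.io.StringReader;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;

// Hypothetical illustration only; not part of XPathRecordReader.
final class FlattenSketch {

    /**
     * Returns all text and CDATA found anywhere beneath the first element
     * named tag, concatenated in document order, or null if no such
     * element occurs.
     */
    static String flatten(String xml, String tag) {
        try {
            XMLStreamReader reader = XMLInputFactory.newInstance()
                    .createXMLStreamReader(new StringReader(xml));
            try {
                while (reader.hasNext()) {
                    if (reader.next() == XMLStreamConstants.START_ELEMENT
                            && tag.equals(reader.getLocalName())) {
                        return collectText(reader);
                    }
                }
                return null;
            } finally {
                reader.close();
            }
        } catch (XMLStreamException e) {
            throw new IllegalArgumentException("malformed XML", e);
        }
    }

    // Called with the reader on a START_ELEMENT. A depth counter stands in
    // for recursion: it consumes events until the matching END_ELEMENT and
    // keeps the text of every descendant along the way.
    private static String collectText(XMLStreamReader reader) throws XMLStreamException {
        StringBuilder text = new StringBuilder();
        int depth = 1;
        while (depth > 0 && reader.hasNext()) {
            switch (reader.next()) {
                case XMLStreamConstants.START_ELEMENT:
                    depth++;
                    break;
                case XMLStreamConstants.END_ELEMENT:
                    depth--;
                    break;
                case XMLStreamConstants.CHARACTERS:
                case XMLStreamConstants.CDATA:
                    text.append(reader.getText());
                    break;
                default:
                    break;
            }
        }
        return text.toString();
    }

    public static void main(String[] args) {
        String xml = "<doc><p>Hello <b>big <i>wide</i></b> world<![CDATA[ & more]]></p></doc>";
        System.out.println(flatten(xml, "p")); // prints "Hello big wide world & more"
    }
}
```

Without flattening, only the text directly inside the matched element would be a candidate for the field value; the nested b and i content is what the flag is meant to pull in.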

See Also:
Constant Field Values
Constructor Detail

XPathRecordReader

public XPathRecordReader(String forEachXpath)
A constructor called with a '|'-separated list of XPath expressions that define the subsections of the XML stream to be emitted as separate records.

Parameters:
forEachXpath - The XPath for which a record is emitted. Once the opening tag matching this XPath is encountered, the Node.parse method starts collecting the wanted fields, and at the close of the tag a record is emitted containing all fields collected since the tag start. Once the record is emitted, those collected fields are cleared. Any fields collected in the parent tag or above are also included in the record, but they are not cleared after the record is emitted. Multiple XPaths can be passed in using the ' | ' syntax of XPath.
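
To make the emit-and-clear behaviour described for forEachXpath concrete, here is a minimal sketch built directly on the JDK's StAX API. It is not this class's implementation: ForEachSketch, parse and demo are hypothetical names, only plain absolute element paths are matched (no attributes, predicates or //), nested record elements are not handled, and every value is a single String.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;

// Hypothetical illustration only; not part of XPathRecordReader.
final class ForEachSketch {

    /** fieldPaths maps a field name to the absolute element path holding its text. */
    static List<Map<String, String>> parse(String xml, String forEachPath,
                                           Map<String, String> fieldPaths) {
        List<Map<String, String>> records = new ArrayList<>();
        Map<String, String> inherited = new LinkedHashMap<>(); // from parent tags; never cleared
        Map<String, String> current = new LinkedHashMap<>();   // from inside the record; cleared on emit
        StringBuilder path = new StringBuilder();
        StringBuilder text = new StringBuilder();
        boolean inRecord = false;
        try {
            XMLStreamReader reader = XMLInputFactory.newInstance()
                    .createXMLStreamReader(new StringReader(xml));
            while (reader.hasNext()) {
                switch (reader.next()) {
                    case XMLStreamConstants.START_ELEMENT: {
                        path.append('/').append(reader.getLocalName());
                        text.setLength(0);
                        if (path.toString().equals(forEachPath)) {
                            inRecord = true; // start collecting for a new record
                        }
                        break;
                    }
                    case XMLStreamConstants.CHARACTERS:
                    case XMLStreamConstants.CDATA: {
                        text.append(reader.getText());
                        break;
                    }
                    case XMLStreamConstants.END_ELEMENT: {
                        String here = path.toString();
                        for (Map.Entry<String, String> field : fieldPaths.entrySet()) {
                            if (field.getValue().equals(here)) {
                                (inRecord ? current : inherited).put(field.getKey(), text.toString());
                            }
                        }
                        if (here.equals(forEachPath)) {
                            // Emit: parent-level fields plus the fields of this record.
                            Map<String, String> record = new LinkedHashMap<>(inherited);
                            record.putAll(current);
                            records.add(record);
                            current.clear(); // inherited is deliberately left alone
                            inRecord = false;
                        }
                        path.setLength(path.lastIndexOf("/")); // pop the closed element
                        text.setLength(0);
                        break;
                    }
                    default:
                        break;
                }
            }
            reader.close();
        } catch (XMLStreamException e) {
            throw new IllegalArgumentException("malformed XML", e);
        }
        return records;
    }

    static List<Map<String, String>> demo() {
        String xml = "<feed><source>wire</source>"
                + "<item><title>A</title></item>"
                + "<item><title>B</title></item></feed>";
        Map<String, String> fields = new HashMap<>();
        fields.put("source", "/feed/source");
        fields.put("title", "/feed/item/title");
        return parse(xml, "/feed/item", fields);
    }

    public static void main(String[] args) {
        System.out.println(demo()); // prints "[{source=wire, title=A}, {source=wire, title=B}]"
    }
}
```

The source value is collected once in the parent tag and appears in both records, while title is cleared after each item closes.
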
Method Detail

addField

public XPathRecordReader addField(String name,
                                  String xpath,
                                  boolean multiValued)
A wrapper around addField0 to create a series of Nodes based on the supplied Xpath and a given fieldName. The created nodes are inserted into a Node tree.

Parameters:
name - The name for this field in the emitted record
xpath - The xpath expression for this field
multiValued - If 'true' then the emitted record will have values in a List<String>

addField

public XPathRecordReader addField(String name,
                                  String xpath,
                                  boolean multiValued,
                                  int flags)
A wrapper around addField0 to create a series of Nodes based on the supplied Xpath and a given fieldName. The created nodes are inserted into a Node tree.

Parameters:
name - The name for this field in the emitted record
xpath - The xpath expression for this field
multiValued - If 'true' then the emitted record will have values in a List<String>
flags - FLATTEN: Recursively combine text from all child XML elements

getAllRecords

public List<Map<String,Object>> getAllRecords(Reader r)
Uses streamRecords to parse the XML source but with a handler that collects all the emitted records into a single List which is returned upon completion.

Parameters:
r - the stream reader
Returns:
a List of the emitted records
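
The relationship between the two entry points, a collecting convenience method layered on a streaming callback, can be sketched in isolation as follows. Everything here is a hypothetical stand-in: RecordHandler is not the library's Handler interface, and the record source is a plain Iterable rather than an XML parser.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;
import java.util.Map;

// Hypothetical illustration only; not part of XPathRecordReader.
final class CollectAllSketch {

    /** Stand-in callback type, invoked once per record. */
    interface RecordHandler {
        void handle(Map<String, Object> record);
    }

    /** Stand-in streaming source: pushes each record to the handler as it is produced. */
    static void streamRecords(Iterable<Map<String, Object>> source, RecordHandler handler) {
        for (Map<String, Object> record : source) {
            handler.handle(record);
        }
    }

    /** Reuses the streaming path with a handler that only appends to a list. */
    static List<Map<String, Object>> getAllRecords(Iterable<Map<String, Object>> source) {
        List<Map<String, Object>> results = new ArrayList<>();
        streamRecords(source, results::add);
        return results;
    }

    static List<Map<String, Object>> demo() {
        List<Map<String, Object>> source = Arrays.asList(
                Collections.<String, Object>singletonMap("title", "A"),
                Collections.<String, Object>singletonMap("title", "B"));
        return getAllRecords(source);
    }

    public static void main(String[] args) {
        System.out.println(demo()); // prints "[{title=A}, {title=B}]"
    }
}
```

Because the collecting variant holds every record in memory until parsing finishes, the streaming form is likely the better fit for large inputs.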

streamRecords

public void streamRecords(Reader r,
                          XPathRecordReader.Handler handler)
Creates an XML stream reader on top of the supplied Reader, then calls parse() with a handler that is invoked for each record emitted.

Parameters:
r - the stream reader
handler - The callback instance


Copyright © 2000-2013 Apache Software Foundation. All Rights Reserved.