Class XPathRecordReader


  • public class XPathRecordReader
    extends Object

    A streaming XPath parser that uses StAX for XML parsing. It supports only a subset of XPath syntax, such as the following forms:

     /a/b/subject[@qualifier='fullTitle']
     /a/b/subject[@qualifier=]/subtag
     /a/b/subject/@qualifier
     //a
     //a/b...
     /a//b
     /a//b...
     /a/b/c
     
    A record is a Map<String,Object>. The key is the provided field name and the value is a String or a List<String>.

    This class is thread-safe for parsing XML, but adding fields is not thread-safe. The recommended usage is to call addField() from one thread and then share the instance across threads.

    This API is experimental and may change in the future.

    Since:
    solr 1.3
    • Field Detail

      • FLATTEN

        public static final int FLATTEN
        The FLATTEN flag indicates that all text and CDATA under a specific tag should be recursively fetched and appended to the current Node's value.
        See Also:
        Constant Field Values
    • Constructor Detail

      • XPathRecordReader

        public XPathRecordReader(String forEachXpath)
        A constructor called with a '|' separated list of XPath expressions that define the subsections of the XML stream to be emitted as separate records.
        Parameters:
        forEachXpath - The XPath for which a record is emitted. Once the XPath tag is encountered, the Node.parse method starts collecting the wanted fields, and at the close of the tag a record is emitted containing all fields collected since the tag started. Once the record is emitted, the collected fields are cleared. Any fields collected in the parent tag or above are also included in the record, but these are not cleared after the record is emitted. Multiple XPaths can be passed in using the ' | ' syntax of XPath.
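        The emit-and-clear behaviour described for forEachXpath can be illustrated with a minimal StAX sketch. This is not the implementation of XPathRecordReader: the class ForEachSketch, its parse method, and the map from plain element paths to field names are assumptions made for illustration, and only simple absolute element paths are handled. Fields seen above the record tag are kept across records, while fields seen inside it are cleared after each emit.

```java
import java.io.Reader;
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;

// Hypothetical helper, not part of Solr: shows per-record emit and clear.
final class ForEachSketch {

    // fieldPaths maps an absolute element path (e.g. "/root/item/title") to a field name.
    static List<Map<String, Object>> parse(Reader xml, String forEachPath,
                                           Map<String, String> fieldPaths) {
        List<Map<String, Object>> records = new ArrayList<>();
        Map<String, Object> outer = new HashMap<>(); // collected above the record tag, never cleared
        Map<String, Object> inner = new HashMap<>(); // collected inside the record tag
        StringBuilder text = new StringBuilder();
        String path = "";
        boolean inRecord = false;
        try {
            XMLStreamReader in = XMLInputFactory.newInstance().createXMLStreamReader(xml);
            while (in.hasNext()) {
                switch (in.next()) {
                    case XMLStreamConstants.START_ELEMENT: {
                        path = path + "/" + in.getLocalName();
                        if (path.equals(forEachPath)) {
                            inRecord = true; // record tag opened: start collecting
                        }
                        text.setLength(0);
                        break;
                    }
                    case XMLStreamConstants.CHARACTERS:
                    case XMLStreamConstants.CDATA: {
                        text.append(in.getText());
                        break;
                    }
                    case XMLStreamConstants.END_ELEMENT: {
                        String field = fieldPaths.get(path);
                        if (field != null) {
                            (inRecord ? inner : outer).put(field, text.toString().trim());
                        }
                        if (path.equals(forEachPath)) {
                            // Record tag closed: emit parent-level fields plus record fields.
                            Map<String, Object> record = new HashMap<>(outer);
                            record.putAll(inner);
                            records.add(record);
                            inner.clear(); // only the per-record fields are cleared
                            inRecord = false;
                        }
                        path = path.substring(0, path.lastIndexOf('/'));
                        text.setLength(0);
                        break;
                    }
                    default:
                        break;
                }
            }
        } catch (XMLStreamException e) {
            throw new IllegalArgumentException("malformed XML", e);
        }
        return records;
    }
}
```

        With forEachPath set to /root/item and the fields /root/source and /root/item/title, an input holding one source element followed by two item elements would yield two records, each carrying the same source value and its own title.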
    • Method Detail

      • addField

        public XPathRecordReader addField(String name,
                                          String xpath,
                                          boolean multiValued)
        A wrapper around addField0 to create a series of Nodes based on the supplied Xpath and a given fieldName. The created nodes are inserted into a Node tree.
        Parameters:
        name - The name for this field in the emitted record
        xpath - The xpath expression for this field
        multiValued - If 'true' then the emitted record will have values in a List<String>
      • addField

        public XPathRecordReader addField(String name,
                                          String xpath,
                                          boolean multiValued,
                                          int flags)
        A wrapper around addField0 to create a series of Nodes based on the supplied Xpath and a given fieldName. The created nodes are inserted into a Node tree.
        Parameters:
        name - The name for this field in the emitted record
        xpath - The xpath expression for this field
        multiValued - If 'true' then the emitted record will have values in a List<String>
        flags - FLATTEN: Recursively combine text from all child XML elements
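        What "recursively combine text from all child XML elements" means can be sketched with plain StAX. The class FlattenSketch and its flatten method are hypothetical names used only for illustration, and the sketch matches the target by local tag name instead of by XPath. It shows the general idea of flattening and makes no claim about how XPathRecordReader implements the FLATTEN flag internally.

```java
import java.io.StringReader;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;

// Hypothetical helper, not part of Solr: collects all text nested under one tag.
final class FlattenSketch {

    // Returns the concatenated text and CDATA found at any depth under the
    // first element whose local name equals tag, or "" if there is none.
    static String flatten(String xml, String tag) {
        StringBuilder out = new StringBuilder();
        int depth = 0; // greater than 0 while inside the target element
        try {
            XMLStreamReader in = XMLInputFactory.newInstance()
                    .createXMLStreamReader(new StringReader(xml));
            while (in.hasNext()) {
                int event = in.next();
                if (event == XMLStreamConstants.START_ELEMENT) {
                    if (depth > 0 || in.getLocalName().equals(tag)) {
                        depth++; // entering the target or one of its descendants
                    }
                } else if (event == XMLStreamConstants.END_ELEMENT) {
                    if (depth > 0 && --depth == 0) {
                        return out.toString(); // target element closed
                    }
                } else if ((event == XMLStreamConstants.CHARACTERS
                        || event == XMLStreamConstants.CDATA) && depth > 0) {
                    out.append(in.getText());
                }
            }
        } catch (XMLStreamException e) {
            throw new IllegalArgumentException("malformed XML", e);
        }
        return out.toString();
    }
}
```

        For an element such as <p>Hello <b>bold</b> world</p>, this kind of flattening keeps the text inside the nested b element as part of the value for p.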
      • getAllRecords

        public List<Map<String,Object>> getAllRecords(Reader r)
        Uses streamRecords to parse the XML source but with a handler that collects all the emitted records into a single List which is returned upon completion.
        Parameters:
        r - the stream reader
        Returns:
        a List of the emitted records
      • streamRecords

        public void streamRecords(Reader r,
                                  XPathRecordReader.Handler handler)
        Creates an XML stream reader on top of whatever reader has been configured, then calls parse() with a handler that is invoked for each record emitted.
        Parameters:
        r - the stream reader
        handler - The callback instance
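
        The relationship between the two methods, where getAllRecords is streamRecords plus a handler that collects into a List, can be sketched as follows. RecordHandler is a hypothetical stand-in for XPathRecordReader.Handler, whose exact signature is not shown on this page, and the record content here (the attributes of each matching element) is a simplification chosen to keep the example short.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;

// Hypothetical helper, not part of Solr: callback streaming vs. collecting.
final class HandlerSketch {

    // Assumed callback shape; the real Handler interface may differ.
    interface RecordHandler {
        void handle(Map<String, Object> record);
    }

    // Invokes the handler once per element named tag, passing its attributes as the record.
    static void streamRecords(String xml, String tag, RecordHandler handler) {
        try {
            XMLStreamReader in = XMLInputFactory.newInstance()
                    .createXMLStreamReader(new StringReader(xml));
            while (in.hasNext()) {
                if (in.next() == XMLStreamConstants.START_ELEMENT
                        && in.getLocalName().equals(tag)) {
                    Map<String, Object> record = new LinkedHashMap<>();
                    for (int i = 0; i < in.getAttributeCount(); i++) {
                        record.put(in.getAttributeLocalName(i), in.getAttributeValue(i));
                    }
                    handler.handle(record); // records are handed over one at a time
                }
            }
        } catch (XMLStreamException e) {
            throw new IllegalArgumentException("malformed XML", e);
        }
    }

    // Same parse, but with a handler that gathers every record into one List.
    static List<Map<String, Object>> getAllRecords(String xml, String tag) {
        List<Map<String, Object>> results = new ArrayList<>();
        streamRecords(xml, tag, results::add);
        return results;
    }
}
```

        The callback form avoids holding every record in memory at once, which matters for large XML streams; the collecting form is convenient when the input is known to be small.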