Class XPathEntityProcessor
java.lang.Object
  org.apache.solr.handler.dataimport.EntityProcessor
    org.apache.solr.handler.dataimport.EntityProcessorBase
      org.apache.solr.handler.dataimport.XPathEntityProcessor
public class XPathEntityProcessor extends EntityProcessorBase
An implementation of EntityProcessor which uses a streaming XPath parser to extract values out of XML documents. It is typically used in conjunction with URLDataSource or FileDataSource.
Refer to http://wiki.apache.org/solr/DataImportHandler for more details.
This API is experimental and may change in the future.
- Since:
- solr 1.3
- See Also:
XPathRecordReader
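To make the idea of streaming extraction concrete, the following is a minimal standalone sketch that pulls one row per record element out of an XML string using the JDK's StAX reader. It is an assumption-laden illustration, not the implementation behind XPathEntityProcessor or XPathRecordReader: the class name `StreamingExtractSketch`, the `extract` method, and the simple absolute element paths it accepts (no predicates, attributes, or wildcards) are all hypothetical.

```java
import java.io.StringReader;
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Collections;
import java.util.Deque;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;

// Hypothetical illustration; does not use any Solr classes.
class StreamingExtractSketch {

    // Returns one row per element at recordPath. fieldPaths maps a field name
    // to the absolute element path whose text becomes that field's value.
    static List<Map<String, Object>> extract(String xml, String recordPath,
                                             Map<String, String> fieldPaths) {
        List<Map<String, Object>> rows = new ArrayList<>();
        Deque<String> stack = new ArrayDeque<>();   // currently open elements, root first
        StringBuilder text = new StringBuilder();   // text seen since the last tag
        Map<String, Object> current = null;         // row being built, if inside a record
        try {
            XMLStreamReader reader =
                XMLInputFactory.newInstance().createXMLStreamReader(new StringReader(xml));
            while (reader.hasNext()) {
                int event = reader.next();
                if (event == XMLStreamConstants.START_ELEMENT) {
                    stack.addLast(reader.getLocalName());
                    text.setLength(0);
                    if (path(stack).equals(recordPath)) {
                        current = new LinkedHashMap<>();
                    }
                } else if (event == XMLStreamConstants.CHARACTERS
                        || event == XMLStreamConstants.CDATA) {
                    text.append(reader.getText());
                } else if (event == XMLStreamConstants.END_ELEMENT) {
                    String closing = path(stack);
                    if (current != null) {
                        for (Map.Entry<String, String> field : fieldPaths.entrySet()) {
                            if (field.getValue().equals(closing)) {
                                current.put(field.getKey(), text.toString().trim());
                            }
                        }
                        if (closing.equals(recordPath)) {
                            rows.add(current);      // record element closed: row is complete
                            current = null;
                        }
                    }
                    stack.removeLast();
                    text.setLength(0);
                }
            }
            reader.close();
        } catch (XMLStreamException e) {
            throw new IllegalArgumentException("Malformed XML", e);
        }
        return rows;
    }

    // Joins the open-element stack into a path such as /feed/item/title.
    static String path(Deque<String> stack) {
        return "/" + String.join("/", stack);
    }

    public static void main(String[] args) {
        String xml = "<feed><item><title>A</title></item><item><title>B</title></item></feed>";
        System.out.println(extract(xml, "/feed/item",
            Collections.singletonMap("title", "/feed/item/title")));
    }
}
```

The sketch reads the input as a stream of events and never builds a document tree, which is the property that matters for large XML files. For brevity it collects the rows into a list instead of handing them out one at a time.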
-
-
Field Summary
Fields
Modifier and Type, Field
protected int blockingQueueSize
protected int blockingQueueTimeOut
protected TimeUnit blockingQueueTimeOutUnits
static String COMMON_FIELD
protected List<String> commonFields
protected DataSource<Reader> dataSource
static String FOR_EACH
static String HAS_MORE
static String NEXT_URL
protected List<String> placeHolderVariables
protected Thread publisherThread
protected boolean reinitXPathReader
static String STREAM
protected boolean streamRows
static String URL
static String USE_SOLR_ADD_SCHEMA
protected boolean useSolrAddXml
static String XPATH
static String XPATH_FIELD_NAME
static String XSL
protected Transformer xslTransformer
-
Fields inherited from class org.apache.solr.handler.dataimport.EntityProcessorBase
ABORT, cacheSupport, context, CONTINUE, entityName, isFirstInit, ON_ERROR, onError, query, rowIterator, SKIP, TRANSFORM_ROW, TRANSFORMER
-
-
Constructor Summary
Constructors
XPathEntityProcessor()
-
Method Summary
Methods
void init(Context context)
This method is called when it starts processing an entity.
Map<String,Object> nextRow()
For a simple implementation, this is the only method that the sub-class should implement.
void postTransform(Map<String,Object> r)
Invoked after the transformers are invoked.
protected Map<String,Object> readRow(Map<String,Object> record, String xpath)
-
Methods inherited from class org.apache.solr.handler.dataimport.EntityProcessorBase
destroy, firstInit, getNext, initCache, nextDeletedRowKey, nextModifiedParentRowKey, nextModifiedRowKey
-
Methods inherited from class org.apache.solr.handler.dataimport.EntityProcessor
close
-
-
-
-
Field Detail
-
dataSource
protected DataSource<Reader> dataSource
-
xslTransformer
protected Transformer xslTransformer
-
useSolrAddXml
protected boolean useSolrAddXml
-
streamRows
protected boolean streamRows
-
blockingQueueTimeOut
protected int blockingQueueTimeOut
-
blockingQueueTimeOutUnits
protected TimeUnit blockingQueueTimeOutUnits
-
blockingQueueSize
protected int blockingQueueSize
-
publisherThread
protected Thread publisherThread
-
reinitXPathReader
protected boolean reinitXPathReader
-
URL
public static final String URL
- See Also:
- Constant Field Values
-
HAS_MORE
public static final String HAS_MORE
- See Also:
- Constant Field Values
-
NEXT_URL
public static final String NEXT_URL
- See Also:
- Constant Field Values
-
XPATH_FIELD_NAME
public static final String XPATH_FIELD_NAME
- See Also:
- Constant Field Values
-
FOR_EACH
public static final String FOR_EACH
- See Also:
- Constant Field Values
-
XPATH
public static final String XPATH
- See Also:
- Constant Field Values
-
COMMON_FIELD
public static final String COMMON_FIELD
- See Also:
- Constant Field Values
-
USE_SOLR_ADD_SCHEMA
public static final String USE_SOLR_ADD_SCHEMA
- See Also:
- Constant Field Values
-
XSL
public static final String XSL
- See Also:
- Constant Field Values
-
STREAM
public static final String STREAM
- See Also:
- Constant Field Values
-
-
Method Detail
-
init
public void init(Context context)
Description copied from class: EntityProcessor
This method is called when it starts processing an entity. When it comes back to the entity it is called again, so it can reset anything at that point. For a rootmost entity this is called only once for an ingestion. For sub-entities, this is called multiple times, once for each row from its parent entity.
- Overrides:
init in class EntityProcessorBase
- Parameters:
context - The current context
-
nextRow
public Map<String,Object> nextRow()
Description copied from class: EntityProcessorBase
For a simple implementation, this is the only method that the sub-class should implement. This is intended to stream rows one-by-one. Return null to signal end of rows.
- Overrides:
nextRow in class EntityProcessorBase
- Returns:
- a row where the key is the name of the field and value can be any Object or a Collection of objects. Return null to signal end of rows
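The row-at-a-time contract described above can be illustrated with a small standalone sketch. It deliberately avoids Solr classes: `NextRowContractSketch`, `RowSource`, `drain`, and `row` are hypothetical names, and the only thing taken from the documentation is the shape of `nextRow()` (a `Map<String,Object>` per call, `null` once the rows are exhausted).

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Iterator;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical illustration; does not use any Solr classes.
class NextRowContractSketch {

    // Stand-in for a row producer: hands out one row per call.
    static class RowSource {
        private final Iterator<Map<String, Object>> rows;

        RowSource(List<Map<String, Object>> rows) {
            this.rows = rows.iterator();
        }

        // Mirrors the documented contract: a row per call, null when no rows remain.
        Map<String, Object> nextRow() {
            return rows.hasNext() ? rows.next() : null;
        }
    }

    // Caller side: keep pulling rows until null signals the end.
    static List<Map<String, Object>> drain(RowSource source) {
        List<Map<String, Object>> out = new ArrayList<>();
        Map<String, Object> row;
        while ((row = source.nextRow()) != null) {
            out.add(row);
        }
        return out;
    }

    // Builds a row keyed by field name; a value may be a single Object or a Collection.
    static Map<String, Object> row(String title, Object tags) {
        Map<String, Object> r = new LinkedHashMap<>();
        r.put("title", title);
        r.put("tags", tags);
        return r;
    }

    public static void main(String[] args) {
        List<Map<String, Object>> input = new ArrayList<>();
        input.add(row("first", "solo"));                    // single-valued field
        input.add(row("second", Arrays.asList("a", "b")));  // multi-valued field
        RowSource source = new RowSource(input);
        System.out.println(drain(source));
        System.out.println(source.nextRow());               // source is already exhausted
    }
}
```

Using `null` as the end marker keeps the producer side to a single method, at the cost that a row itself can never be `null`.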
-
postTransform
public void postTransform(Map<String,Object> r)
Description copied from class: EntityProcessor
Invoked after the transformers are invoked. EntityProcessors can add, remove or modify values added by Transformers in this method.
- Overrides:
postTransform in class EntityProcessor
- Parameters:
r - The transformed row
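As a rough sketch of what adding, removing, and modifying values in a transformed row can look like, the standalone example below operates on a plain `Map<String,Object>`. It does not extend any Solr class; the class name `PostTransformSketch` and the field names `title`, `tmp_raw`, and `source` are made up for illustration.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical illustration; does not use any Solr classes.
class PostTransformSketch {

    // Same parameter shape as postTransform(Map<String,Object> r): edits the row in place.
    static void postTransform(Map<String, Object> r) {
        // Remove: drop a scratch field that should not reach the index.
        r.remove("tmp_raw");

        // Modify: normalise a value that an earlier step produced.
        Object title = r.get("title");
        if (title instanceof String) {
            r.put("title", ((String) title).trim());
        }

        // Add: supply a default only when the field is missing.
        r.putIfAbsent("source", "xml");
    }

    public static void main(String[] args) {
        Map<String, Object> row = new LinkedHashMap<>();
        row.put("title", "  Hello  ");
        row.put("tmp_raw", "<title>  Hello  </title>");
        postTransform(row);
        System.out.println(row);
    }
}
```

Because the method returns `void`, every change has to be made on the map that is passed in.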
-
-