Class LineEntityProcessor
- java.lang.Object
  - org.apache.solr.handler.dataimport.EntityProcessor
    - org.apache.solr.handler.dataimport.EntityProcessorBase
      - org.apache.solr.handler.dataimport.LineEntityProcessor
public class LineEntityProcessor extends EntityProcessorBase
An EntityProcessor instance which can stream lines of text read from a datasource. Options allow lines to be explicitly skipped or included in the index.
Attribute summary
- url is the required location of the input file. If this value is relative, it is assumed to be relative to baseLoc.
- acceptLineRegex is an optional attribute that if present discards any line which does not match the regExp.
- skipLineRegex is an optional attribute that is applied after any acceptLineRegex and discards any line which matches this regExp.
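The order in which the two optional filters are applied can be sketched in plain Java. This is a minimal illustration, not the Solr implementation: the class name LineFilterSketch and its filter method are hypothetical, and the sketch assumes a pattern "matches" a line when it is found anywhere in that line (Matcher.find()) rather than requiring the whole line to match.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Pattern;

// Illustrative only: LineFilterSketch is a hypothetical name, not a Solr class.
class LineFilterSketch {

    // Returns the lines that survive both filters. Either pattern may be null,
    // mirroring the fact that both attributes are optional.
    static List<String> filter(List<String> lines, Pattern accept, Pattern skip) {
        List<String> kept = new ArrayList<>();
        for (String line : lines) {
            // Assumption: a pattern matches if it is found anywhere in the line.
            if (accept != null && !accept.matcher(line).find()) continue;
            // The skip pattern is checked only after the accept pattern.
            if (skip != null && skip.matcher(line).find()) continue;
            kept.add(line);
        }
        return kept;
    }

    public static void main(String[] args) {
        List<String> lines = List.of("# comment", "a.xml", "b.txt", "c.xml.bak");
        Pattern accept = Pattern.compile("\\.xml");
        Pattern skip = Pattern.compile("\\.bak$");
        System.out.println(filter(lines, accept, skip)); // prints [a.xml]
    }
}
```

In this sketch "c.xml.bak" passes the accept pattern but is then discarded by the skip pattern, which is the ordering the attribute summary describes.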
Although envisioned for reading lines from a file or url, LineEntityProcessor may also be useful for dealing with change lists, where each line contains filenames which can be used by subsequent entities to parse content from those files.
Refer to http://wiki.apache.org/solr/DataImportHandler for more details.
This API is experimental and may change in the future.
- Since: solr 1.4
- See Also: Pattern
Field Summary
Fields
- static String ACCEPT_LINE_REGEX: Holds the name of entity attribute that will be parsed to obtain the pattern to be used when checking to see if a line should be returned.
- static String SKIP_LINE_REGEX: Holds the name of entity attribute that will be parsed to obtain the pattern to be used when checking to see if a line should be ignored.
- static String URL: Holds the name of entity attribute that will be parsed to obtain the filename containing the changelist.
Fields inherited from class org.apache.solr.handler.dataimport.EntityProcessorBase
ABORT, cacheSupport, context, CONTINUE, entityName, isFirstInit, ON_ERROR, onError, query, rowIterator, SKIP, TRANSFORM_ROW, TRANSFORMER
-
-
Constructor Summary
Constructors
- LineEntityProcessor()
-
Method Summary
Methods
- void closeResources()
- void destroy(): Invoked for each entity at the very end of the import to do any needed cleanup tasks.
- void init(Context context): Parses each of the entity attributes.
- Map<String,Object> nextRow(): Reads lines from the url until it finds a line that matches the optional acceptLineRegex and does not match the optional skipLineRegex.
Methods inherited from class org.apache.solr.handler.dataimport.EntityProcessorBase
firstInit, getNext, initCache, nextDeletedRowKey, nextModifiedParentRowKey, nextModifiedRowKey
-
Methods inherited from class org.apache.solr.handler.dataimport.EntityProcessor
close, postTransform
Field Detail
-
URL
public static final String URL
Holds the name of entity attribute that will be parsed to obtain the filename containing the changelist.
- See Also: Constant Field Values
-
ACCEPT_LINE_REGEX
public static final String ACCEPT_LINE_REGEX
Holds the name of entity attribute that will be parsed to obtain the pattern to be used when checking to see if a line should be returned.
- See Also: Constant Field Values
-
SKIP_LINE_REGEX
public static final String SKIP_LINE_REGEX
Holds the name of entity attribute that will be parsed to obtain the pattern to be used when checking to see if a line should be ignored.
- See Also: Constant Field Values
-
-
Method Detail
-
init
public void init(Context context)
Parses each of the entity attributes.
- Overrides: init in class EntityProcessorBase
- Parameters: context - The current context
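As a rough illustration of what parsing the entity attributes involves, the sketch below reads the three attributes from a plain Map and compiles the two optional patterns. It is an assumption-laden stand-in, not the actual init(Context) code: the class AttributeParsingSketch is hypothetical, a Map replaces the Context argument, and rejecting a missing url with IllegalArgumentException is this sketch's own choice.

```java
import java.util.Map;
import java.util.regex.Pattern;

// Illustrative only: a plain Map stands in for the entity attributes that the
// real init(Context) reads from its Context argument.
class AttributeParsingSketch {
    // Attribute names as listed in the attribute summary of this page.
    static final String URL = "url";
    static final String ACCEPT_LINE_REGEX = "acceptLineRegex";
    static final String SKIP_LINE_REGEX = "skipLineRegex";

    final String url;
    final Pattern acceptLineRegex; // null when the attribute is absent
    final Pattern skipLineRegex;   // null when the attribute is absent

    AttributeParsingSketch(Map<String, String> attrs) {
        url = attrs.get(URL);
        // Assumption: a missing required attribute is reported as an exception.
        if (url == null || url.isEmpty()) {
            throw new IllegalArgumentException("'" + URL + "' is a required attribute");
        }
        acceptLineRegex = compileOrNull(attrs.get(ACCEPT_LINE_REGEX));
        skipLineRegex = compileOrNull(attrs.get(SKIP_LINE_REGEX));
    }

    // Compiles the regex once up front so each line check reuses the Pattern.
    private static Pattern compileOrNull(String regex) {
        return regex == null ? null : Pattern.compile(regex);
    }
}
```

Compiling the patterns once during initialization, rather than per line, keeps the per-row work to a single matcher call for each configured filter.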
-
nextRow
public Map<String,Object> nextRow()
Reads lines from the url until it finds a line that matches the optional acceptLineRegex and does not match the optional skipLineRegex.
- Overrides: nextRow in class EntityProcessorBase
- Returns: A row containing a minimum of one field "rawLine", or null to signal end of file. The rawLine is the line as returned by readLine() from the url. However, transformers can be used to create as many other fields as required.
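The row contract described above (one "rawLine" field per accepted line, null at end of input) can be sketched as follows. This is a simplified stand-in rather than the Solr source: RowReaderSketch is a hypothetical class, the accept/skip logic is abstracted into a caller-supplied Predicate, and wrapping IOException in UncheckedIOException is an assumption of the sketch.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.HashMap;
import java.util.Map;
import java.util.function.Predicate;

// Illustrative only: RowReaderSketch is a hypothetical name, not a Solr class.
class RowReaderSketch {
    private final BufferedReader reader;
    private final Predicate<String> keep; // stands in for the accept/skip checks

    RowReaderSketch(BufferedReader reader, Predicate<String> keep) {
        this.reader = reader;
        this.keep = keep;
    }

    // Returns the next accepted line as a one-field row, or null at end of input.
    Map<String, Object> nextRow() {
        try {
            String line;
            while ((line = reader.readLine()) != null) {
                if (!keep.test(line)) continue; // rejected lines are never returned
                Map<String, Object> row = new HashMap<>();
                row.put("rawLine", line);
                return row;
            }
            return null; // end of input
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        BufferedReader in = new BufferedReader(new StringReader("one\n\ntwo\n"));
        RowReaderSketch rows = new RowReaderSketch(in, line -> !line.isEmpty());
        Map<String, Object> row;
        while ((row = rows.nextRow()) != null) {
            System.out.println(row.get("rawLine")); // prints "one", then "two"
        }
    }
}
```

A caller keeps invoking nextRow() until it returns null; any additional fields would be produced afterwards by transformers operating on rawLine.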
-
closeResources
public void closeResources()
-
destroy
public void destroy()
Description copied from class: EntityProcessor
Invoked for each entity at the very end of the import to do any needed cleanup tasks.
- Overrides: destroy in class EntityProcessorBase