public class SolrContentHandler extends DefaultHandler implements ExtractingParams
The class responsible for handling Tika events and translating them into SolrInputDocuments.
This class is not thread-safe.
Users may wish to override this class to provide their own functionality.

Modifier and Type | Field and Description |
---|---|
protected boolean | captureAttribs |
protected StringBuilder | catchAllBuilder |
protected String | contentFieldName |
protected Collection<String> | dateFormats |
protected String | defaultField |
protected SolrInputDocument | document |
protected Map<String,StringBuilder> | fieldBuilders |
protected boolean | lowerNames |
protected org.apache.tika.metadata.Metadata | metadata |
protected SolrParams | params |
protected IndexSchema | schema |
protected String | unknownFieldPrefix |
Fields inherited from interface org.apache.solr.handler.extraction.ExtractingParams: BOOST_PREFIX, CAPTURE_ATTRIBUTES, CAPTURE_ELEMENTS, DEFAULT_FIELD, EXTRACT_FORMAT, EXTRACT_ONLY, IGNORE_TIKA_EXCEPTION, LITERALS_OVERRIDE, LITERALS_PREFIX, LOWERNAMES, MAP_PREFIX, PASSWORD_MAP_FILE, RESOURCE_NAME, RESOURCE_PASSWORD, STREAM_TYPE, UNKNOWN_FIELD_PREFIX, XPATH_EXPRESSION
Constructor and Description |
---|
SolrContentHandler(org.apache.tika.metadata.Metadata metadata, SolrParams params, IndexSchema schema) |
SolrContentHandler(org.apache.tika.metadata.Metadata metadata, SolrParams params, IndexSchema schema, Collection<String> dateFormats) |
Modifier and Type | Method and Description |
---|---|
protected void | addCapturedContent(): Add the per-field captured content to the Solr document. |
protected void | addContent(): Add in the catch-all content to the field. |
protected void | addField(String fname, String fval, String[] vals) |
protected void | addLiterals(): Add in the literals to the document using the params and the ExtractingParams.LITERALS_PREFIX. |
protected void | addMetadata(): Add in any metadata using metadata as the source. |
void | characters(char[] chars, int offset, int length) |
void | endElement(String uri, String localName, String qName) |
protected String | findMappedName(String name): Get the name mapping. |
protected float | getBoost(String name): Get the value of any boost factor for the mapped name. |
void | ignorableWhitespace(char[] chars, int offset, int length): Treat the same as any other characters. |
SolrInputDocument | newDocument(): This is called by a consumer when it is ready to deal with a new SolrInputDocument. |
void | startDocument() |
void | startElement(String uri, String localName, String qName, Attributes attributes) |
protected String | transformValue(String val, SchemaField schFld): Can be used to transform input values based on their SchemaField. This implementation only formats dates using the DateUtil. |
Methods inherited from class org.xml.sax.helpers.DefaultHandler: endDocument, endPrefixMapping, error, fatalError, notationDecl, processingInstruction, resolveEntity, setDocumentLocator, skippedEntity, startPrefixMapping, unparsedEntityDecl, warning
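findMappedName(String) and getBoost(String) are only summarized above. The JDK-only sketch below shows one plausible shape for them; it is an assumption, not the Solr source. A plain Map stands in for SolrParams and a Set of field names stands in for IndexSchema. The prefixes "fmap." and "boost." are assumed values of ExtractingParams.MAP_PREFIX and BOOST_PREFIX, the fallback order (unknown-field prefix before default field) is assumed, and the class name FieldNameMapper is hypothetical.

```java
import java.util.Locale;
import java.util.Map;
import java.util.Set;

// Hypothetical, simplified stand-in for findMappedName(String) and getBoost(String).
// Map replaces SolrParams and Set replaces IndexSchema; this is not Solr's code.
final class FieldNameMapper {
    private static final String MAP_PREFIX = "fmap.";    // assumed value of ExtractingParams.MAP_PREFIX
    private static final String BOOST_PREFIX = "boost."; // assumed value of ExtractingParams.BOOST_PREFIX

    private final Map<String, String> params;  // request parameters, e.g. "fmap.content" -> "text"
    private final Set<String> schemaFields;    // field names the schema is assumed to define
    private final boolean lowerNames;          // mirrors the lowerNames field
    private final String unknownFieldPrefix;   // mirrors unknownFieldPrefix; may be null
    private final String defaultField;         // mirrors defaultField; may be null

    FieldNameMapper(Map<String, String> params, Set<String> schemaFields,
                    boolean lowerNames, String unknownFieldPrefix, String defaultField) {
        this.params = params;
        this.schemaFields = schemaFields;
        this.lowerNames = lowerNames;
        this.unknownFieldPrefix = unknownFieldPrefix;
        this.defaultField = defaultField;
    }

    String findMappedName(String name) {
        String result = name;
        if (lowerNames) {
            // Lowercase, then turn every character outside [a-z0-9_] into '_'.
            result = result.toLowerCase(Locale.ROOT).replaceAll("[^a-z0-9_]", "_");
        }
        // An explicit fmap.<name> parameter overrides the extracted name.
        String mapped = params.get(MAP_PREFIX + result);
        if (mapped != null) {
            result = mapped;
        }
        // Names the schema does not know get the unknown-field prefix,
        // or, failing that, are redirected to the default field.
        if (!schemaFields.contains(result)) {
            if (unknownFieldPrefix != null) {
                result = unknownFieldPrefix + result;
            } else if (defaultField != null) {
                result = defaultField;
            }
        }
        return result;
    }

    float getBoost(String name) {
        // Look up boost.<name>; no parameter means a neutral boost of 1.0.
        String raw = params.get(BOOST_PREFIX + name);
        return raw == null ? 1.0f : Float.parseFloat(raw);
    }
}
```

For example, with lowerNames enabled, a Tika metadata name such as Content-Type becomes content_type before any fmap. lookup takes place.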
protected SolrInputDocument document
protected Collection<String> dateFormats
protected org.apache.tika.metadata.Metadata metadata
protected SolrParams params
protected StringBuilder catchAllBuilder
protected IndexSchema schema
protected Map<String,StringBuilder> fieldBuilders
protected boolean captureAttribs
protected boolean lowerNames
protected String contentFieldName
protected String unknownFieldPrefix
protected String defaultField
public SolrContentHandler(org.apache.tika.metadata.Metadata metadata, SolrParams params, IndexSchema schema)
public SolrContentHandler(org.apache.tika.metadata.Metadata metadata, SolrParams params, IndexSchema schema, Collection<String> dateFormats)
public SolrInputDocument newDocument()
This is called by a consumer when it is ready to deal with a new SolrInputDocument.
See also: addMetadata(), addCapturedContent(), addContent(), addLiterals()
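The hooks listed for newDocument() can be pictured as a template method that fills one document from several sources. In this hypothetical JDK-only sketch, a Map<String, List<String>> stands in for SolrInputDocument and plain Maps stand in for Tika Metadata and SolrParams. The call order, the trimming, the class name DocumentAssembler and the "literal." prefix (assumed to be the value of ExtractingParams.LITERALS_PREFIX) are all assumptions of the sketch.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch of newDocument() as a template method over protected hooks.
// Map<String, List<String>> stands in for SolrInputDocument; this is not Solr's code.
class DocumentAssembler {
    private static final String LITERALS_PREFIX = "literal."; // assumed value of ExtractingParams.LITERALS_PREFIX

    protected final Map<String, List<String>> document = new LinkedHashMap<>();
    protected final Map<String, StringBuilder> fieldBuilders = new LinkedHashMap<>(); // captured per-field text
    protected final StringBuilder catchAllBuilder = new StringBuilder();              // all remaining text
    protected final Map<String, String> metadata; // stands in for Tika Metadata
    protected final Map<String, String> params;   // stands in for SolrParams
    protected final String contentFieldName;

    DocumentAssembler(Map<String, String> metadata, Map<String, String> params, String contentFieldName) {
        this.metadata = metadata;
        this.params = params;
        this.contentFieldName = contentFieldName;
    }

    // Assumed call order; each step appends to the same document.
    Map<String, List<String>> newDocument() {
        addMetadata();
        addLiterals();
        addCapturedContent();
        addContent();
        return document;
    }

    protected void addField(String name, String value) {
        document.computeIfAbsent(name, k -> new ArrayList<>()).add(value);
    }

    protected void addMetadata() {
        // Every metadata entry becomes a field of the same name.
        metadata.forEach(this::addField);
    }

    protected void addLiterals() {
        // literal.<field>=<value> parameters become fields; other parameters are ignored.
        for (Map.Entry<String, String> e : params.entrySet()) {
            if (e.getKey().startsWith(LITERALS_PREFIX)) {
                addField(e.getKey().substring(LITERALS_PREFIX.length()), e.getValue());
            }
        }
    }

    protected void addCapturedContent() {
        // Each captured element's text goes to its own field.
        fieldBuilders.forEach((name, text) -> addField(name, text.toString().trim()));
    }

    protected void addContent() {
        // Everything not captured elsewhere goes to the content field.
        addField(contentFieldName, catchAllBuilder.toString().trim());
    }
}
```

Because every hook is protected, a subclass could, for instance, override addContent() to skip empty bodies while inheriting the remaining steps unchanged.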
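protected void addCapturedContent()
Add the per-field captured content to the Solr document. The default implementation uses the fieldBuilders info.
protected void addContent()
Add in the catch-all content to the field. The default implementation uses the contentFieldName and the catchAllBuilder.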
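As a rough illustration of the catch-all versus per-field split that addContent() and addCapturedContent() rely on, here is a JDK-only SAX handler. It routes character data to the innermost active builder: a per-element builder while inside a captured element, otherwise the catch-all builder. The element stack, the space separators, the class name CaptureHandler and the parse helper are assumptions of this sketch, not Solr's implementation; it also shows ignorableWhitespace being treated like any other characters.

```java
import java.io.StringReader;
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.HashMap;
import java.util.Map;
import java.util.Set;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical, simplified model of routing SAX text into a catch-all builder
// or per-element "captured" builders. Not Solr's code.
class CaptureHandler extends DefaultHandler {
    final StringBuilder catchAllBuilder = new StringBuilder();
    final Map<String, StringBuilder> fieldBuilders = new HashMap<>();
    // Innermost active builder is on top; the catch-all builder sits at the bottom.
    private final Deque<StringBuilder> builderStack = new ArrayDeque<>();

    CaptureHandler(Set<String> capturedElements) {
        for (String name : capturedElements) {
            fieldBuilders.put(name, new StringBuilder());
        }
    }

    @Override
    public void startDocument() {
        // Reset all state so the handler can be reused for another document.
        catchAllBuilder.setLength(0);
        fieldBuilders.values().forEach(b -> b.setLength(0));
        builderStack.clear();
        builderStack.push(catchAllBuilder);
    }

    @Override
    public void startElement(String uri, String localName, String qName, Attributes attributes) {
        StringBuilder captured = fieldBuilders.get(localName);
        if (captured != null) {
            builderStack.push(captured); // text now flows into this element's field
        }
        builderStack.peek().append(' '); // keep words from adjacent elements apart
    }

    @Override
    public void endElement(String uri, String localName, String qName) {
        if (fieldBuilders.containsKey(localName)) {
            builderStack.pop(); // leave the captured element
        }
        builderStack.peek().append(' ');
    }

    @Override
    public void characters(char[] chars, int offset, int length) {
        builderStack.peek().append(chars, offset, length);
    }

    @Override
    public void ignorableWhitespace(char[] chars, int offset, int length) {
        characters(chars, offset, length); // treated the same as any other characters
    }

    static CaptureHandler parse(String xml, Set<String> capturedElements) {
        CaptureHandler handler = new CaptureHandler(capturedElements);
        try {
            SAXParserFactory factory = SAXParserFactory.newInstance();
            factory.setNamespaceAware(true); // ensures localName is populated
            factory.newSAXParser().parse(new InputSource(new StringReader(xml)), handler);
        } catch (Exception e) {
            throw new IllegalStateException("XML parsing failed", e);
        }
        return handler;
    }
}
```

Calling CaptureHandler.parse(xhtml, Set.of("title")) leaves the title text in fieldBuilders.get("title") and everything else in catchAllBuilder, with surplus whitespace still to be trimmed by the caller.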
protected void addLiterals()
Add in the literals to the document using the params and the ExtractingParams.LITERALS_PREFIX.
protected void addMetadata()
Add in any metadata using metadata as the source.
public void startDocument() throws SAXException
Specified by: startDocument in interface ContentHandler
Overrides: startDocument in class DefaultHandler
Throws: SAXException
public void startElement(String uri, String localName, String qName, Attributes attributes) throws SAXException
Specified by: startElement in interface ContentHandler
Overrides: startElement in class DefaultHandler
Throws: SAXException
public void endElement(String uri, String localName, String qName) throws SAXException
Specified by: endElement in interface ContentHandler
Overrides: endElement in class DefaultHandler
Throws: SAXException
public void characters(char[] chars, int offset, int length) throws SAXException
Specified by: characters in interface ContentHandler
Overrides: characters in class DefaultHandler
Throws: SAXException
public void ignorableWhitespace(char[] chars, int offset, int length) throws SAXException
Treat the same as any other characters.
Specified by: ignorableWhitespace in interface ContentHandler
Overrides: ignorableWhitespace in class DefaultHandler
Throws: SAXException
protected String transformValue(String val, SchemaField schFld)
Can be used to transform input values based on their SchemaField. This implementation only formats dates using the DateUtil.
Parameters:
val - The value to transform
schFld - The SchemaField
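Since transformValue only reformats dates, a rough java.time equivalent is sketched below. It assumes the SchemaField type check can be reduced to a boolean, treats parsed values as UTC, and targets the yyyy-MM-dd'T'HH:mm:ss'Z' form that Solr date fields accept. Solr's own DateUtil is not used, and the class name DateValueTransformer is hypothetical.

```java
import java.time.DateTimeException;
import java.time.LocalDate;
import java.time.LocalDateTime;
import java.time.format.DateTimeFormatter;
import java.time.temporal.TemporalAccessor;
import java.util.List;
import java.util.Locale;

// Hypothetical sketch of a date-only transformValue; java.time replaces Solr's DateUtil
// and a boolean replaces the SchemaField type check. Parsed values are treated as UTC.
final class DateValueTransformer {
    // Canonical form accepted by Solr date fields (ISO 8601, UTC, 'Z' suffix).
    private static final DateTimeFormatter SOLR_DATE =
            DateTimeFormatter.ofPattern("yyyy-MM-dd'T'HH:mm:ss'Z'", Locale.ROOT);

    private final List<String> dateFormats; // plays the role of the dateFormats field

    DateValueTransformer(List<String> dateFormats) {
        this.dateFormats = dateFormats;
    }

    String transformValue(String val, boolean isDateField) {
        if (!isDateField) {
            return val; // non-date fields pass through untouched
        }
        for (String pattern : dateFormats) {
            DateTimeFormatter parser = DateTimeFormatter.ofPattern(pattern, Locale.ROOT);
            try {
                // Prefer a full date-time; fall back to a bare date at midnight.
                TemporalAccessor parsed = parser.parseBest(val, LocalDateTime::from, LocalDate::from);
                LocalDateTime dateTime = parsed instanceof LocalDateTime
                        ? (LocalDateTime) parsed
                        : ((LocalDate) parsed).atStartOfDay();
                return dateTime.format(SOLR_DATE);
            } catch (DateTimeException e) {
                // This pattern did not match; try the next one.
            }
        }
        return val; // no pattern matched, keep the original text
    }
}
```

Trying the candidate patterns in order mirrors the role of the dateFormats collection passed to the four-argument constructor: the first pattern that parses wins.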
protected float getBoost(String name)
Get the value of any boost factor for the mapped name.
Parameters:
name - The name of the field to see if there is a boost specified

Copyright © 2000-2015 Apache Software Foundation. All Rights Reserved.