java.lang.Object
  org.xml.sax.helpers.DefaultHandler
    org.apache.solr.handler.extraction.SolrContentHandler
public class SolrContentHandler
Handles Tika events and translates them into SolrInputDocuments.
This class is not thread-safe.
See Also: SolrContentHandlerFactory, ExtractingRequestHandler, ExtractingDocumentLoader
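The event-driven approach described above can be illustrated with a minimal sketch that uses only the JDK's SAX classes. TextCollectingHandler is a hypothetical stand-in, not the Solr class: it assumes that text inside selected elements is captured per element name (in the spirit of fieldBuilders) and that all remaining text goes to a catch-all buffer (in the spirit of catchAllBuilder). The real SolrContentHandler may route text differently.

```java
import java.io.StringReader;
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.LinkedHashMap;
import java.util.Map;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical illustration only; this is not the Solr implementation.
class TextCollectingHandler extends DefaultHandler {
    // Receives all text that is not inside a captured element.
    final StringBuilder catchAll = new StringBuilder();
    // One builder per captured element name.
    final Map<String, StringBuilder> captured = new LinkedHashMap<>();
    // The builder on top of the stack receives the current character data.
    private final Deque<StringBuilder> stack = new ArrayDeque<>();

    TextCollectingHandler(String... captureElements) {
        for (String name : captureElements) {
            captured.put(name, new StringBuilder());
        }
    }

    @Override
    public void startDocument() {
        // Reset state so the handler can be reused for another parse.
        catchAll.setLength(0);
        for (StringBuilder sb : captured.values()) {
            sb.setLength(0);
        }
        stack.clear();
        stack.push(catchAll);
    }

    @Override
    public void startElement(String uri, String localName, String qName, Attributes attributes) {
        StringBuilder target = captured.get(qName);
        if (target != null) {
            stack.push(target); // text now flows into this element's builder
        }
    }

    @Override
    public void endElement(String uri, String localName, String qName) {
        if (captured.containsKey(qName)) {
            stack.pop(); // return to the enclosing builder
        }
    }

    @Override
    public void characters(char[] chars, int offset, int length) {
        stack.peek().append(chars, offset, length);
    }

    // Convenience helper: parse an XML string and return the populated handler.
    static TextCollectingHandler parse(String xml, String... captureElements) {
        TextCollectingHandler handler = new TextCollectingHandler(captureElements);
        try {
            SAXParserFactory.newInstance().newSAXParser()
                    .parse(new InputSource(new StringReader(xml)), handler);
        } catch (Exception e) {
            throw new IllegalStateException("XML parsing failed", e);
        }
        return handler;
    }
}
```

Like the class documented here, the sketch keeps mutable per-parse state, so a single instance should not be shared between threads.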
Field Summary | |
---|---|
protected boolean | captureAttribs |
protected StringBuilder | catchAllBuilder |
protected String | contentFieldName |
protected Collection<String> | dateFormats |
protected String | defaultField |
protected SolrInputDocument | document |
protected Map<String,StringBuilder> | fieldBuilders |
protected boolean | lowerNames |
protected org.apache.tika.metadata.Metadata | metadata |
protected SolrParams | params |
protected IndexSchema | schema |
protected String | unknownFieldPrefix |
Fields inherited from interface org.apache.solr.handler.extraction.ExtractingParams |
---|
BOOST_PREFIX, CAPTURE_ATTRIBUTES, CAPTURE_ELEMENTS, DEFAULT_FIELD, EXTRACT_FORMAT, EXTRACT_ONLY, IGNORE_TIKA_EXCEPTION, LITERALS_OVERRIDE, LITERALS_PREFIX, LOWERNAMES, MAP_PREFIX, PASSWORD_MAP_FILE, RESOURCE_NAME, RESOURCE_PASSWORD, STREAM_TYPE, UNKNOWN_FIELD_PREFIX, XPATH_EXPRESSION |
Constructor Summary | |
---|---|
SolrContentHandler(org.apache.tika.metadata.Metadata metadata, SolrParams params, IndexSchema schema) | |
SolrContentHandler(org.apache.tika.metadata.Metadata metadata, SolrParams params, IndexSchema schema, Collection<String> dateFormats) | |
Method Summary | |
---|---|
protected void | addCapturedContent(): Add the per-field captured content to the Solr document. |
protected void | addContent(): Add in the catch-all content to the field. |
protected void | addField(String fname, String fval, String[] vals) |
protected void | addLiterals(): Add in the literals to the document using the params and the ExtractingParams.LITERALS_PREFIX. |
protected void | addMetadata(): Add in any metadata using metadata as the source. |
void | characters(char[] chars, int offset, int length) |
void | endElement(String uri, String localName, String qName) |
protected String | findMappedName(String name): Get the name mapping. |
protected float | getBoost(String name): Get the value of any boost factor for the mapped name. |
SolrInputDocument | newDocument(): This is called by a consumer when it is ready to deal with a new SolrInputDocument. |
void | startDocument() |
void | startElement(String uri, String localName, String qName, Attributes attributes) |
protected String | transformValue(String val, SchemaField schFld): Can be used to transform input values based on their SchemaField. This implementation only formats dates using the DateUtil. |
Methods inherited from class org.xml.sax.helpers.DefaultHandler |
---|
endDocument, endPrefixMapping, error, fatalError, ignorableWhitespace, notationDecl, processingInstruction, resolveEntity, setDocumentLocator, skippedEntity, startPrefixMapping, unparsedEntityDecl, warning |
Methods inherited from class java.lang.Object |
---|
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait |
Field Detail |
---|
protected SolrInputDocument document
protected Collection<String> dateFormats
protected org.apache.tika.metadata.Metadata metadata
protected SolrParams params
protected StringBuilder catchAllBuilder
protected IndexSchema schema
protected Map<String,StringBuilder> fieldBuilders
protected boolean captureAttribs
protected boolean lowerNames
protected String contentFieldName
protected String unknownFieldPrefix
protected String defaultField
Constructor Detail |
---|
public SolrContentHandler(org.apache.tika.metadata.Metadata metadata, SolrParams params, IndexSchema schema)
public SolrContentHandler(org.apache.tika.metadata.Metadata metadata, SolrParams params, IndexSchema schema, Collection<String> dateFormats)
Method Detail |
---|
public SolrInputDocument newDocument()
This is called by a consumer when it is ready to deal with a new SolrInputDocument.
See Also: addMetadata(), addCapturedContent(), addContent(), addLiterals()
protected void addCapturedContent()
Add the per-field captured content to the Solr document, using the fieldBuilders info.
protected void addContent()
Add in the catch-all content to the field, using the contentFieldName and the catchAllBuilder.
protected void addLiterals()
Add in the literals to the document using the params and the ExtractingParams.LITERALS_PREFIX.
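As a rough illustration of prefix-based literal handling, the sketch below selects request parameters that start with a literals prefix and strips that prefix to obtain the field name. It is an assumption-laden stand-in: LiteralParams and extractLiterals are hypothetical names, a plain Map replaces SolrParams, and the prefix value "literal." is assumed rather than taken from this page, so check ExtractingParams.LITERALS_PREFIX in your Solr version.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper for illustration; not part of Solr.
class LiteralParams {
    // Assumed value of ExtractingParams.LITERALS_PREFIX.
    static final String LITERALS_PREFIX = "literal.";

    // Returns field name -> values for every parameter carrying the literals prefix.
    static Map<String, String[]> extractLiterals(Map<String, String[]> params) {
        Map<String, String[]> literals = new LinkedHashMap<>();
        for (Map.Entry<String, String[]> entry : params.entrySet()) {
            String key = entry.getKey();
            // Skip parameters without the prefix, and the bare prefix itself.
            if (key.startsWith(LITERALS_PREFIX) && key.length() > LITERALS_PREFIX.length()) {
                literals.put(key.substring(LITERALS_PREFIX.length()), entry.getValue());
            }
        }
        return literals;
    }
}
```

With this sketch, a parameter named literal.id with the value doc1 would yield a field id carrying that value, while unrelated parameters are ignored.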
protected void addMetadata()
Add in any metadata using metadata as the source.
protected void addField(String fname, String fval, String[] vals)
public void startDocument() throws SAXException
Specified by: startDocument in interface ContentHandler
Overrides: startDocument in class DefaultHandler
Throws: SAXException
public void startElement(String uri, String localName, String qName, Attributes attributes) throws SAXException
Specified by: startElement in interface ContentHandler
Overrides: startElement in class DefaultHandler
Throws: SAXException
public void endElement(String uri, String localName, String qName) throws SAXException
Specified by: endElement in interface ContentHandler
Overrides: endElement in class DefaultHandler
Throws: SAXException
public void characters(char[] chars, int offset, int length) throws SAXException
Specified by: characters in interface ContentHandler
Overrides: characters in class DefaultHandler
Throws: SAXException
protected String transformValue(String val, SchemaField schFld)
Can be used to transform input values based on their SchemaField. This implementation only formats dates using the DateUtil.
Parameters:
val - The value to transform
schFld - The SchemaField
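One plausible way to format dates from a collection of patterns, such as the dateFormats field, is to try each pattern in turn and emit a canonical UTC timestamp on the first match. The sketch below does this with java.text.SimpleDateFormat. DateNormalizer and toCanonicalDate are hypothetical names, and the output pattern is an assumption; Solr's DateUtil may parse and format differently.

```java
import java.text.ParseException;
import java.text.SimpleDateFormat;
import java.util.Collection;
import java.util.Date;
import java.util.Locale;
import java.util.TimeZone;

// Hypothetical helper for illustration; not Solr's DateUtil.
class DateNormalizer {
    private static final TimeZone UTC = TimeZone.getTimeZone("UTC");

    // Tries each pattern in order; returns the input unchanged if none matches.
    static String toCanonicalDate(String val, Collection<String> dateFormats) {
        for (String pattern : dateFormats) {
            SimpleDateFormat in = new SimpleDateFormat(pattern, Locale.ROOT);
            in.setTimeZone(UTC);
            in.setLenient(false); // reject out-of-range values such as month 13
            try {
                Date parsed = in.parse(val);
                // Assumed canonical output form: ISO-8601 in UTC.
                SimpleDateFormat out =
                        new SimpleDateFormat("yyyy-MM-dd'T'HH:mm:ss'Z'", Locale.ROOT);
                out.setTimeZone(UTC);
                return out.format(parsed);
            } catch (ParseException e) {
                // This pattern did not match; fall through to the next one.
            }
        }
        return val;
    }
}
```

Returning the original value when no pattern matches keeps the transformation safe for fields that are not dates.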
protected float getBoost(String name)
Get the value of any boost factor for the mapped name.
Parameters:
name - The name of the field to see if there is a boost specified
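A boost lookup of this kind could be sketched as a prefixed parameter read with a neutral default. Everything here is an assumption for illustration: BoostLookup and boostFor are hypothetical names, a plain Map stands in for SolrParams, the prefix value "boost." is assumed for ExtractingParams.BOOST_PREFIX, and the default of 1.0 is a guess at sensible behavior rather than documented fact.

```java
import java.util.Map;

// Hypothetical helper for illustration; not part of Solr.
class BoostLookup {
    // Assumed value of ExtractingParams.BOOST_PREFIX.
    static final String BOOST_PREFIX = "boost.";

    // Returns the boost configured for the field, or 1.0f (no boost) if absent.
    static float boostFor(Map<String, String> params, String mappedName) {
        String raw = params.get(BOOST_PREFIX + mappedName);
        return raw == null ? 1.0f : Float.parseFloat(raw);
    }
}
```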
protected String findMappedName(String name)
Get the name mapping.
Parameters:
name - The name to check to see if there is a mapping
Returns:
the mapped name if there is one, otherwise name
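The name mapping can be pictured as a lookup that falls back to the original name when no mapping is configured. In the sketch below, NameMapper is a hypothetical class, a plain Map stands in for SolrParams, and the prefix value "fmap." is assumed for ExtractingParams.MAP_PREFIX; the real method may apply further rules, such as lower-casing.

```java
import java.util.Map;

// Hypothetical helper for illustration; not part of Solr.
class NameMapper {
    // Assumed value of ExtractingParams.MAP_PREFIX.
    static final String MAP_PREFIX = "fmap.";

    // Returns the configured target name, or the input name if none is configured.
    static String findMappedName(Map<String, String> params, String name) {
        return params.getOrDefault(MAP_PREFIX + name, name);
    }
}
```

For example, with a parameter fmap.content set to text, the name content would map to text, and any other name would be returned unchanged.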