NutchAnalyzer plugins.
item, with the supplied
priority.
Configuration.
String can be decoded in reverse and the
first character is represented by a terminal node.
String can be decoded and the last character is
represented by a terminal node.
CircularDependencyException will be thrown if a circular
dependency is detected.
OnlineClusterer
extension using clustering components of the Carrot2 project
(http://carrot2.sourceforge.net).
HitDetails objects) and
their previously extracted summaries (Strings).
OnlineClusterer for documentation.
true if item exists in this
FibonacciHeap, false otherwise.
Configuration for Nutch.
RegexRule.
application/octet-stream MimeType
priority value associated with
item.
Extension is a kind of listener descriptor that will be
installed on a concrete ExtensionPoint that acts as a kind of
Publisher.
ExtensionPoint provides meta information about an extension
point.
HitSummarizer and HitContent for a set of
fetched segments.
FibonacciHeap.
CrawlDatum.getScore().
analyzer implementation
given a language code.
Configuration for Nutch front-end.
List of RSSChannels that the listener parsed from
the RSS document.
Configurable
String description of the RSS Channel.
ith field.
ith hit in this list.
robotsMeta to appropriate
values, based on any META tags found under the given
node.
Outlink from given plain text.
Outlink from given plain text and adds anchor
to the extracted Outlinks
node, and creates appropriate Outlink
records for each (relative to the supplied base
URL), and adds them to the outlinks ArrayList.
Microsoft document
extractor.
ParseImpl.
Parser instance with the specified
extId, representing its extension ID.
Parsers for a given content type.
Plugin class.
null.
Properties of the Microsoft document.
Protocol implementation for a url.
Content for a fetchlist entry.
Summarizer extension.
StringBuffer and a DOM Node,
and will append all the content text found beneath the DOM node to
the StringBuffer.
getText(sb, node, false).
StringBuffer and a DOM Node,
and will append the content text found beneath the first
title node to the StringBuffer.
ith field.
RawCluster interface to
HitsCluster interface.
HtmlParseFilter implementing plugins.
Searcher and HitDetailer for either a single
merged index, or a set of indexes.
ObjectWritable, to permit merging different
types in reduce.
IndexingFilter implementing plugins.
Inlinks.
false if the robots.txt file
prohibits us from accessing the given path, or
true otherwise.
true if this cluster contains documents
that did not fit anywhere else (presentation layer may
discard such clusters).
IndexingFilter that
adds a lang (language) field to the document.
s padded with leading spaces so
that its length is length.
input that is matched,
or null if no match exists.
- longestMatch(String) -
Method in class org.apache.nutch.util.SuffixStringMatcher
- Returns the longest suffix of
input that is matched,
or null if no match exists.
- longestMatch(String) -
Method in class org.apache.nutch.util.TrieStringMatcher
- Returns the longest substring of
input that is
matched by a pattern in the trie, or null if no match
exists.
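To illustrate the longest-match and shortest-match semantics described in the entries above, here is a minimal sketch of a prefix trie with terminal nodes. It is not the Nutch implementation: the class name `PrefixTrieSketch` and its internals are hypothetical, and only the "return the matched portion of the input, or null if nothing matches" behaviour is taken from the descriptions.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical illustration only; not org.apache.nutch.util.TrieStringMatcher.
final class PrefixTrieSketch {
    private static final class Node {
        final Map<Character, Node> children = new HashMap<>();
        boolean terminal; // true if a stored prefix ends at this node
    }

    private final Node root = new Node();

    PrefixTrieSketch(String... prefixes) {
        for (String p : prefixes) {
            Node n = root;
            for (int i = 0; i < p.length(); i++) {
                n = n.children.computeIfAbsent(p.charAt(i), c -> new Node());
            }
            n.terminal = true;
        }
    }

    // Longest stored prefix that starts the input, or null if none does.
    String longestMatch(String input) {
        Node n = root;
        String best = null;
        for (int i = 0; i < input.length(); i++) {
            n = n.children.get(input.charAt(i));
            if (n == null) break;
            if (n.terminal) best = input.substring(0, i + 1);
        }
        return best;
    }

    // Shortest stored prefix that starts the input, or null if none does.
    String shortestMatch(String input) {
        Node n = root;
        for (int i = 0; i < input.length(); i++) {
            n = n.children.get(input.charAt(i));
            if (n == null) return null;
            if (n.terminal) return input.substring(0, i + 1);
        }
        return null;
    }

    public static void main(String[] args) {
        PrefixTrieSketch m = new PrefixTrieSketch("http", "http://www.", "ftp");
        System.out.println(m.longestMatch("http://www.example.org/"));  // http://www.
        System.out.println(m.shortestMatch("http://www.example.org/")); // http
        System.out.println(m.longestMatch("gopher://example.org/"));    // null
    }
}
```

A suffix matcher can reuse the same structure by inserting and walking the strings in reverse, which is what the "decoded in reverse" wording elsewhere in this index suggests.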
- lookingAhead -
Variable in class org.apache.nutch.analysis.NutchAnalysis
-
application/vnd.ms-excel).
application/vnd.ms-powerpoint).
application/msword).
java.util.HashMap.
MissingDependencyException will be thrown if a plugin
dependency cannot be found.
TrieStringMatcher.TrieNode visited, given that you are at
node, and the next character in the input is
the idx'th character of s.
String is matched by a
prefix in the trie
String is matched by a
suffix in the trie
String is matched by a
pattern in the trie
Configurations that include Nutch-specific
resources.
RawDocument required for Carrot2.
summary and wrapping
a details hit details.
JobConf for Nutch jobs.
OnlineClusterer extensions.
Ontology extensions.
Outlinks
/ URLs from plain text using Regular Expressions.
Plugin System.
http,
httpclient)
Parsers to obtain
Parse objects.
Protocol
implementation.
Parser plugins.
PluginClassLoader contains only classes of the runtime
libraries set up in the plugin manifest file and exported libraries of
required plugins.
PluginDescriptor provides access to all meta information of
a nutch-plugin, as well as to the internationalizable resources and the plugin's
own classloader.
PluginManifestParser just parses the manifest file
in all plugin directories.
PluginRuntimeException will be thrown when an exception occurs in
plugin management.
Strings against a set
of prefixes.
PrefixStringMatcher which will match
Strings with any prefix in the supplied array.
PrefixStringMatcher which will match
Strings with any prefix in the supplied
Collection.
ProtocolException instead.
Protocol plugins.
Parsers
until a successful parse is performed and a Parse object is
returned.
Content object using the Parser specified
by the parameter extId, i.e., the Parser's extension ID.
Content metadata.
FibonacciHeap.popMin() would, without
removing it.
QueryFilter implementing plugins.
Java Regex implementation.
URL filter based on
regular expressions.
IndexingFilter that
adds tag field(s) to the document.
"tag:" query clauses.
- RelTagQueryFilter() -
Constructor for class org.apache.nutch.microformats.reltag.RelTagQueryFilter
-
- Response - interface org.apache.nutch.net.protocols.Response.
- A response interface.
- RobotRulesParser - class org.apache.nutch.protocol.http.api.RobotRulesParser.
- This class handles the parsing of
robots.txt files.
- RobotRulesParser(Configuration) -
Constructor for class org.apache.nutch.protocol.http.api.RobotRulesParser
-
- RobotRulesParser.RobotRuleSet - class org.apache.nutch.protocol.http.api.RobotRulesParser.RobotRuleSet.
- This class holds the rules which were parsed from a robots.txt
file, and can test paths against those rules.
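As a rough picture of what "testing paths against rules" can look like, the sketch below keeps an ordered list of path prefixes, each marked allowed or disallowed, and lets the first matching prefix decide. This is an assumption made for illustration: the index does not state the matching strategy, and the names `RobotRuleSetSketch`, `addPrefix`, and `isAllowed` are hypothetical rather than the documented API of `RobotRulesParser.RobotRuleSet`.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration only; matching strategy is an assumption.
final class RobotRuleSetSketch {
    private static final class Entry {
        final String prefix;
        final boolean allowed;

        Entry(String prefix, boolean allowed) {
            this.prefix = prefix;
            this.allowed = allowed;
        }
    }

    private final List<Entry> entries = new ArrayList<>();

    // Rules are consulted in insertion order.
    void addPrefix(String prefix, boolean allowed) {
        entries.add(new Entry(prefix, allowed));
    }

    // The first rule whose prefix starts the path decides the outcome;
    // a path that matches no rule is allowed.
    boolean isAllowed(String path) {
        for (Entry e : entries) {
            if (path.startsWith(e.prefix)) {
                return e.allowed;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        RobotRuleSetSketch rules = new RobotRuleSetSketch();
        rules.addPrefix("/private/public-notes/", true);
        rules.addPrefix("/private/", false);
        System.out.println(rules.isAllowed("/index.html"));                // true
        System.out.println(rules.isAllowed("/private/data.txt"));          // false
        System.out.println(rules.isAllowed("/private/public-notes/a.txt")); // true
    }
}
```

With first-match-wins ordering, more specific prefixes have to be added before broader ones, as in the usage above.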
- RobotRulesParser.RobotRuleSet() -
Constructor for class org.apache.nutch.protocol.http.api.RobotRulesParser.RobotRuleSet
-
- rdfidToLabel(String) -
Method in class org.apache.nutch.ontology.jena.OwlParser
-
- read(DataInput) -
Static method in class org.apache.nutch.crawl.CrawlDatum
-
- read(DataInput) -
Static method in class org.apache.nutch.crawl.Inlink
-
- read(DataInput) -
Static method in class org.apache.nutch.parse.Outlink
-
- read(DataInput) -
Static method in class org.apache.nutch.parse.ParseData
-
- read(DataInput, Configuration) -
Static method in class org.apache.nutch.parse.ParseImpl
-
- read(DataInput) -
Static method in class org.apache.nutch.parse.ParseStatus
-
- read(DataInput) -
Static method in class org.apache.nutch.parse.ParseText
-
- read(DataInput) -
Static method in class org.apache.nutch.protocol.Content
-
- read(DataInput) -
Static method in class org.apache.nutch.protocol.ProtocolStatus
-
- read(DataInput) -
Static method in class org.apache.nutch.searcher.HitDetails
- Constructs, reads and returns an instance.
- read(DataInput, Configuration) -
Static method in class org.apache.nutch.searcher.Query.Clause
-
- read(DataInput) -
Static method in class org.apache.nutch.searcher.Query.Phrase
-
- read(DataInput) -
Static method in class org.apache.nutch.searcher.Query.Term
-
- read(DataInput, Configuration) -
Static method in class org.apache.nutch.searcher.Query
-
- read(DataInput) -
Static method in class org.apache.nutch.searcher.Summary
-
- readFields(DataInput) -
Method in class org.apache.nutch.crawl.CrawlDatum
-
- readFields(DataInput) -
Method in class org.apache.nutch.crawl.Generator.SelectorEntry
-
- readFields(DataInput) -
Method in class org.apache.nutch.crawl.Inlink
-
- readFields(DataInput) -
Method in class org.apache.nutch.crawl.Inlinks
-
- readFields(DataInput) -
Method in class org.apache.nutch.crawl.MapWritable
-
- readFields(DataInput) -
Method in class org.apache.nutch.fetcher.FetcherOutput
-
- readFields(DataInput) -
Method in class org.apache.nutch.indexer.DeleteDuplicates.HashScore
-
- readFields(DataInput) -
Method in class org.apache.nutch.indexer.DeleteDuplicates.IndexDoc
-
- readFields(DataInput) -
Method in class org.apache.nutch.metadata.Metadata
-
- readFields(DataInput) -
Method in class org.apache.nutch.parse.Outlink
-
- readFields(DataInput) -
Method in class org.apache.nutch.parse.ParseData
-
- readFields(DataInput) -
Method in class org.apache.nutch.parse.ParseImpl
-
- readFields(DataInput) -
Method in class org.apache.nutch.parse.ParseStatus
-
- readFields(DataInput) -
Method in class org.apache.nutch.parse.ParseText
-
- readFields(DataInput) -
Method in class org.apache.nutch.protocol.ProtocolStatus
-
- readFields(DataInput) -
Method in class org.apache.nutch.searcher.Hit
-
- readFields(DataInput) -
Method in class org.apache.nutch.searcher.HitDetails
-
- readFields(DataInput) -
Method in class org.apache.nutch.searcher.Hits
-
- readFields(DataInput) -
Method in class org.apache.nutch.searcher.Query
-
- readFields(DataInput) -
Method in class org.apache.nutch.searcher.Summary
-
- readFieldsCompressed(DataInput) -
Method in class org.apache.nutch.protocol.Content
-
- readUrl(String, String, Configuration) -
Method in class org.apache.nutch.crawl.CrawlDbReader
-
- reduce(WritableComparable, Iterator, OutputCollector, Reporter) -
Method in class org.apache.nutch.crawl.CrawlDbMerger.Merger
-
- reduce(WritableComparable, Iterator, OutputCollector, Reporter) -
Method in class org.apache.nutch.crawl.CrawlDbReader.CrawlDbDumpReducer
-
- reduce(WritableComparable, Iterator, OutputCollector, Reporter) -
Method in class org.apache.nutch.crawl.CrawlDbReader.CrawlDbStatCombiner
-
- reduce(WritableComparable, Iterator, OutputCollector, Reporter) -
Method in class org.apache.nutch.crawl.CrawlDbReader.CrawlDbStatReducer
-
- reduce(WritableComparable, Iterator, OutputCollector, Reporter) -
Method in class org.apache.nutch.crawl.CrawlDbReader.CrawlDbTopNReducer
-
- reduce(WritableComparable, Iterator, OutputCollector, Reporter) -
Method in class org.apache.nutch.crawl.CrawlDbReducer
-
- reduce(WritableComparable, Iterator, OutputCollector, Reporter) -
Method in class org.apache.nutch.crawl.Generator.Selector
- Collect until limit is reached.
- reduce(WritableComparable, Iterator, OutputCollector, Reporter) -
Method in class org.apache.nutch.crawl.Injector.InjectReducer
-
- reduce(WritableComparable, Iterator, OutputCollector, Reporter) -
Method in class org.apache.nutch.crawl.LinkDb.Merger
-
- reduce(WritableComparable, Iterator, OutputCollector, Reporter) -
Method in class org.apache.nutch.crawl.LinkDb
-
- reduce(WritableComparable, Iterator, OutputCollector, Reporter) -
Method in class org.apache.nutch.indexer.DeleteDuplicates.HashReducer
-
- reduce(WritableComparable, Iterator, OutputCollector, Reporter) -
Method in class org.apache.nutch.indexer.DeleteDuplicates
- Delete docs named in values from index named in key.
- reduce(WritableComparable, Iterator, OutputCollector, Reporter) -
Method in class org.apache.nutch.indexer.Indexer
-
- reduce(WritableComparable, Iterator, OutputCollector, Reporter) -
Method in class org.apache.nutch.parse.ParseSegment
-
- reduce(WritableComparable, Iterator, OutputCollector, Reporter) -
Method in class org.apache.nutch.segment.SegmentMerger
- NOTE: in selecting the latest version we rely exclusively on the segment
name (not all segment data contain time information).
- reduce(WritableComparable, Iterator, OutputCollector, Reporter) -
Method in class org.apache.nutch.segment.SegmentReader
-
- regexNormalize(String) -
Method in class org.apache.nutch.net.RegexUrlNormalizer
- This function does the replacements by iterating through all the regex patterns.
- remove(Writable) -
Method in class org.apache.nutch.crawl.MapWritable
-
- remove(String) -
Method in class org.apache.nutch.metadata.Metadata
- Remove a metadata and all its associated values.
- renameFile(String, String) -
Method in class org.apache.nutch.indexer.FsDirectory
-
- renderAnonymous(PrintStream, Resource, String) -
Static method in class org.apache.nutch.ontology.jena.OntologyImpl
-
- renderClassDescription(PrintStream, OntClass, int) -
Static method in class org.apache.nutch.ontology.jena.OntologyImpl
-
- renderHierarchy(PrintStream, OntClass, List, int) -
Static method in class org.apache.nutch.ontology.jena.OntologyImpl
-
- renderRestriction(PrintStream, Restriction) -
Static method in class org.apache.nutch.ontology.jena.OntologyImpl
-
- renderURI(PrintStream, PrefixMapping, String) -
Static method in class org.apache.nutch.ontology.jena.OntologyImpl
-
- reset() -
Method in class org.apache.nutch.parse.HTMLMetaTags
- Sets all boolean values to
false.
- resolveEncodingAlias(String) -
Static method in class org.apache.nutch.util.StringUtil
-
- retrieve(String) -
Static method in class org.apache.nutch.ontology.jena.OntologyImpl
-
- retrieveFile(String, OutputStream, int) -
Method in class org.apache.nutch.protocol.ftp.Client
-
- retrieveList(String, List, int, FTPFileEntryParser) -
Method in class org.apache.nutch.protocol.ftp.Client
-
- rightPad(String, int) -
Static method in class org.apache.nutch.util.StringUtil
- Returns a copy of
s padded with trailing spaces so
that its length is length.
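The padding behaviour described for rightPad (and its leading-space counterpart) can be sketched as follows. The class name `PadSketch` is hypothetical, and returning the input unchanged when it is already at least `length` characters long is an assumption, since the description does not cover that case.

```java
// Hypothetical illustration only; not org.apache.nutch.util.StringUtil.
final class PadSketch {
    // Appends spaces until the result is `length` characters long.
    // Assumption: inputs already that long are returned unchanged.
    static String rightPad(String s, int length) {
        StringBuilder sb = new StringBuilder(s);
        for (int i = s.length(); i < length; i++) {
            sb.append(' ');
        }
        return sb.toString();
    }

    // Prepends spaces until the result is `length` characters long.
    static String leftPad(String s, int length) {
        StringBuilder sb = new StringBuilder();
        for (int i = s.length(); i < length; i++) {
            sb.append(' ');
        }
        return sb.append(s).toString();
    }

    public static void main(String[] args) {
        System.out.println("[" + rightPad("ab", 5) + "]");     // [ab   ]
        System.out.println("[" + leftPad("ab", 5) + "]");      // [   ab]
        System.out.println("[" + rightPad("abcdef", 3) + "]"); // [abcdef]
    }
}
```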
- root -
Variable in class org.apache.nutch.util.TrieStringMatcher
-
- rootClasses(OntModel) -
Method in class org.apache.nutch.ontology.jena.OwlParser
-
- rootClasses(OntModel) -
Method in interface org.apache.nutch.ontology.jena.Parser
-
- run(RecordReader, OutputCollector, Reporter) -
Method in class org.apache.nutch.fetcher.Fetcher
-
- run() -
Method in class org.apache.nutch.searcher.DistributedSearch.Client
-
- run() -
Method in class org.apache.nutch.tools.PruneIndexTool
- For each query, find all matching documents and delete them from all input
indexes.
ScoringFilter implementing plugins.
ObjectWritable, to permit merging different
types in reduce.
ObjectWritable, to permit merging different
types in reduce.
Strings against a set
of suffixes.
SuffixStringMatcher which will match
Strings with any suffix in the supplied array.
SuffixStringMatcher which will match
Strings with any suffix in the supplied
Collection.
Summarizer extensions.
baseHref.
Configurable
noCache to true.
noFollow to true.
noIndex to true.
refresh to the supplied value.
refreshHref.
refreshTime.
Hits.totalIsExact().
input that is matched,
or null if no match exists.
- shortestMatch(String) -
Method in class org.apache.nutch.util.SuffixStringMatcher
- Returns the shortest suffix of
input that is matched,
or null if no match exists.
- shortestMatch(String) -
Method in class org.apache.nutch.util.TrieStringMatcher
- Returns the shortest substring of
input that is
matched by a pattern in the trie, or null if no match
exists.
- shutDown() -
Method in class org.apache.nutch.plugin.Plugin
- Shutdown the plugin.
- shutdown() -
Method in class org.apache.nutch.util.ThreadPool
- Turn off the pool.
- size() -
Method in class org.apache.nutch.crawl.Inlinks
-
- size() -
Method in class org.apache.nutch.crawl.MapWritable
-
- size() -
Method in class org.apache.nutch.metadata.Metadata
- Returns the number of metadata names in this metadata.
- size() -
Method in class org.apache.nutch.util.FibonacciHeap
- Returns the number of objects in the heap.
- skip(DataInput) -
Static method in class org.apache.nutch.crawl.Inlink
- Skips over one Inlink in the input.
- skip(DataInput) -
Static method in class org.apache.nutch.parse.Outlink
- Skips over one Outlink in the input.
- skippedEntity(String) -
Method in class org.apache.nutch.parse.html.DOMBuilder
- Receive notification of a skipped entity.
- sort(int) -
Method in class org.apache.nutch.indexer.IndexSorter
-
- start -
Variable in class org.apache.nutch.segment.SegmentReader.SegmentReaderStats
-
- startCDATA() -
Method in class org.apache.nutch.parse.html.DOMBuilder
- Report the start of a CDATA section.
- startDTD(String, String, String) -
Method in class org.apache.nutch.parse.html.DOMBuilder
- Report the start of DTD declarations, if any.
- startDocument() -
Method in class org.apache.nutch.parse.html.DOMBuilder
- Receive notification of the beginning of a document.
- startElement(String, String, String, Attributes) -
Method in class org.apache.nutch.parse.html.DOMBuilder
- Receive notification of the beginning of an element.
- startEntity(String) -
Method in class org.apache.nutch.parse.html.DOMBuilder
- Report the beginning of an entity.
- startPrefixMapping(String, String) -
Method in class org.apache.nutch.parse.html.DOMBuilder
- Begin the scope of a prefix-URI Namespace mapping.
- startProcessing(RequestContext) -
Method in class org.apache.nutch.clustering.carrot2.LocalNutchInputComponent
- A callback hook that starts the processing.
- startUp() -
Method in class org.apache.nutch.plugin.Plugin
- Will be invoked when the plugin starts up.
- statNames -
Static variable in class org.apache.nutch.crawl.CrawlDatum
-
- subclasses(String) -
Method in interface org.apache.nutch.ontology.Ontology
-
- subclasses(String) -
Method in class org.apache.nutch.ontology.jena.OntologyImpl
- Retrieves all subclasses of the entity (or entities) hashed to searchTerm.
- synonyms(String) -
Method in interface org.apache.nutch.ontology.Ontology
-
- synonyms(String) -
Method in class org.apache.nutch.ontology.jena.OntologyImpl
- Retrieves synonyms from WordNet via SWEET's web interface.
StringUtil.toHexString(byte[], String, int), where
sep = null; lineLen = Integer.MAX_VALUE.
Hits.getTotal() gives the exact number of hits, or false if
it is only an estimate of the total number of hits.
URLFilter implementing plugins.
sizeLimit bytes, if necessary.