Package org.apache.lucene.benchmark.byTask.feeds
Sources for benchmark inputs: documents and queries.
Class descriptions

Abstract base query maker.
Base class for source of data for benchmarking.
Represents content from a specified source, such as TREC, Reuters etc.
Simple HTML Parser extracting title, meta tags, and body text that is based on NekoHTML.
The actual parser to read HTML documents.
A ContentSource using the Dir collection for its input.
Iterator over the files in the directory.
Output of parsing (e.g.
Creates Document objects.
Document state, supports reuse of field instances across documents (see reuseFields parameter).
A ContentSource which reads the English Wikipedia dump.
A QueryMaker that uses common and uncommon actual Wikipedia queries for searching the English Wikipedia collection.
Source items for facets.
Create queries from a FileReader.
A line parser for Geonames.org data.
HTML Parsing Interface for test purposes.
A ContentSource reading one line at a time as a Document from a single file.
LineDocSource.LineParser which sets field names and order by the header - any header - of the lines file.
Reader of a single input line into DocData.
LineDocSource.LineParser which ignores the header passed to its constructor and assumes simply that field names and their order are the same as in DEFAULT_FIELDS.
Creates documents whose content is a long number starting from Long.MIN_VALUE + 10.
Creates queries whose content is a spelled-out long number starting from Long.MIN_VALUE + 10.
Exception indicating there is no more data.
Create queries for the test.
Simple implementation of a random facet source.
A ContentSource reading from the Reuters collection.
A QueryMaker that makes queries devised manually (by Grant Ingersoll) for searching in the Reuters collection.
A QueryMaker that makes queries for a collection created using SingleDocSource.
Create sloppy phrase queries for performance test, in an index created using simple doc maker.
Creates the same document each time SingleDocSource.getNextDocData(DocData) is called.
Adds fields appropriate for sorting: country, random_string and sort_field (int).
Indexes spatial data according to a configured SpatialStrategy with optional shape transformation via a configured SpatialDocMaker.ShapeConverter.
Converts one shape to another.
Reads spatial data from the body field docs from an internally created LineDocSource.
Implements a ContentSource over the TREC collection.
Parser for trec doc content, invoked on doc text excluding <DOC> and <DOCNO> which are handled in TrecContentSource.
Types of trec parse paths,
Parser for the FBIS docs in trec disks 4+5 collection format.
Parser for the FR94 docs in trec disks 4+5 collection format.
Parser for the FT docs in trec disks 4+5 collection format.
Parser for the GOV2 collection format.
Parser for the FT docs in trec disks 4+5 collection format.
Parser for trec docs which selects the parser to apply according to the source files path, defaulting to TrecGov2Parser.
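One of the descriptions above mentions a LineDocSource.LineParser that "sets field names and order by the header" of the lines file. The idea can be illustrated with a small standalone sketch. It does not use or reproduce the Lucene classes: the class name HeaderLineSketch, the parse method, and the tab separator are all assumptions made for illustration, and the actual line-file format may differ.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical sketch of header-driven line parsing; not part of Lucene. */
public class HeaderLineSketch {

  // Assumption: fields on a line are separated by a tab character.
  private static final String SEP = "\t";

  /**
   * Pairs each value on a data line with the field name found at the
   * same position in the header line, preserving the header's order.
   */
  public static Map<String, String> parse(String header, String line) {
    // The -1 limit keeps trailing empty fields instead of dropping them.
    String[] names = header.split(SEP, -1);
    String[] values = line.split(SEP, -1);
    if (values.length != names.length) {
      throw new IllegalArgumentException(
          "expected " + names.length + " fields but got " + values.length);
    }
    Map<String, String> fields = new LinkedHashMap<>();
    for (int i = 0; i < names.length; i++) {
      fields.put(names[i], values[i]);
    }
    return fields;
  }

  public static void main(String[] args) {
    Map<String, String> doc =
        parse("title\tdate\tbody", "Hello\t2024-01-01\tSome text");
    System.out.println(doc);
  }
}
```

A parser that ignores the header, as in the description that refers to DEFAULT_FIELDS, would presumably follow the same loop but take the names array from a fixed list rather than from the file.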