Package org.apache.lucene.benchmark.byTask.feeds

Sources for benchmark inputs: documents and queries.


Interface Summary
HTMLParser HTML parsing interface for test purposes.
QueryMaker Create queries for the test.
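A query maker can be pictured as a source that hands out one prepared query per call. The sketch below is hypothetical: plain strings stand in for Lucene Query objects, and the class and method names (RoundRobinQueryMaker, makeQuery, resetInputs) are illustrative rather than the package's guaranteed signatures. It cycles through a fixed list and wraps around, so a benchmark can request more queries than were prepared.

```java
import java.util.List;

/**
 * Hypothetical sketch of a query maker that hands out prepared queries
 * round-robin. Plain strings stand in for Lucene Query objects.
 */
class RoundRobinQueryMaker {
    private final List<String> queries;
    private int next = 0; // index of the query returned by the next call

    RoundRobinQueryMaker(List<String> queries) {
        if (queries.isEmpty()) {
            throw new IllegalArgumentException("need at least one query");
        }
        this.queries = List.copyOf(queries);
    }

    /** Returns the next query, wrapping around to the first after the last. */
    synchronized String makeQuery() {
        String query = queries.get(next);
        next = (next + 1) % queries.size();
        return query;
    }

    /** Starts again from the first query, e.g. between benchmark rounds. */
    synchronized void resetInputs() {
        next = 0;
    }

    public static void main(String[] args) {
        RoundRobinQueryMaker maker =
            new RoundRobinQueryMaker(List.of("body:lucene", "title:benchmark"));
        for (int i = 0; i < 3; i++) {
            System.out.println(maker.makeQuery()); // body:lucene, title:benchmark, body:lucene
        }
    }
}
```

Synchronizing makeQuery keeps the shared counter consistent when several benchmark threads draw queries from one maker.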

Class Summary
AbstractQueryMaker Abstract base query maker.
ContentSource Represents content from a specified source, such as TREC, Reuters, etc.
DemoHTMLParser HTML Parser that is based on Lucene's demo HTML parser.
DirContentSource A ContentSource using the Dir collection for its input.
DocData Output of parsing an input document.
DocMaker Creates Document objects.
EnwikiContentSource A ContentSource which reads the English Wikipedia dump.
EnwikiQueryMaker A QueryMaker that uses common and uncommon actual Wikipedia queries for searching the English Wikipedia collection.
FileBasedQueryMaker Create queries from a FileReader.
LineDocSource A ContentSource reading one line at a time as a Document from a single file.
LineDocSource.HeaderLineParser LineDocSource.LineParser which sets field names and their order from the header (any header) of the lines file.
LineDocSource.LineParser Reads a single input line into DocData.
LineDocSource.SimpleLineParser LineDocSource.LineParser which ignores the header passed to its constructor and simply assumes that field names and their order are the same as in WriteLineDocTask.DEFAULT_FIELDS.
LongToEnglishContentSource Creates documents whose content is a long number starting from Long.MIN_VALUE + 10.
ReutersContentSource A ContentSource reading from the Reuters collection.
ReutersQueryMaker A QueryMaker that makes queries devised manually (by Grant Ingersoll) for searching in the Reuters collection.
SimpleQueryMaker A QueryMaker that makes queries for a collection created using SingleDocSource.
SimpleSloppyPhraseQueryMaker Create sloppy phrase queries for performance tests, in an index created using the simple doc maker.
SingleDocSource Creates the same document each time SingleDocSource.getNextDocData(DocData) is called.
SortableSingleDocSource Adds fields appropriate for sorting: country, random_string and sort_field (int).
TrecContentSource Implements a ContentSource over the TREC collection.
TrecDocParser Parser for TREC document content, invoked on the document text excluding the tags that are handled in TrecContentSource.
TrecFBISParser Parser for the FBIS docs in TREC disks 4+5 collection format.
TrecFR94Parser Parser for the FR94 docs in TREC disks 4+5 collection format.
TrecFTParser Parser for the FT docs in TREC disks 4+5 collection format.
TrecGov2Parser Parser for the GOV2 collection format.
TrecLATimesParser Parser for the LA Times docs in TREC disks 4+5 collection format.
TrecParserByPath Parser for TREC docs which selects the parser to apply according to the source file's path, defaulting to TrecGov2Parser.
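LineDocSource reads one document per line, and its two LineParser variants differ mainly in where field names come from: HeaderLineParser takes them from the file's header, while SimpleLineParser assumes a fixed order. A minimal sketch of that idea, assuming tab-separated values; the class and method names (LineParserSketch, parseWithHeader, parseWithFields) and the field names in DEFAULT_FIELDS are hypothetical, and the real line-file format and the contents of WriteLineDocTask.DEFAULT_FIELDS may differ.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical sketch: split one tab-separated line into named fields. */
class LineParserSketch {
    // Assumed fixed field order, standing in for a default field list.
    static final List<String> DEFAULT_FIELDS = List.of("title", "date", "body");

    /** Header-driven variant: names and order come from the header line. */
    static Map<String, String> parseWithHeader(String headerLine, String line) {
        return parseWithFields(List.of(headerLine.split("\t", -1)), line);
    }

    /** Fixed-order variant: names and order are given up front. */
    static Map<String, String> parseWithFields(List<String> fieldNames, String line) {
        String[] values = line.split("\t", -1); // limit -1 keeps trailing empty fields
        if (values.length != fieldNames.size()) {
            throw new IllegalArgumentException(
                "expected " + fieldNames.size() + " fields but found " + values.length);
        }
        Map<String, String> doc = new LinkedHashMap<>(); // keeps the field order
        for (int i = 0; i < values.length; i++) {
            doc.put(fieldNames.get(i), values[i]);
        }
        return doc;
    }

    public static void main(String[] args) {
        // The header lists body before title, so values are assigned in that order.
        System.out.println(parseWithHeader("body\ttitle", "some text\tA title"));
        // Without a header, the fixed default order is assumed.
        System.out.println(parseWithFields(DEFAULT_FIELDS, "A title\t2011-01-01\tsome text"));
    }
}
```

Reading names from a header lets one line file carry any set of fields, at the cost of trusting whatever header the file provides.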

Enum Summary
TrecDocParser.ParsePathType Types of TREC parse paths.
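TrecParserByPath picks a parser from the source file's path and falls back to the GOV2 parser. A hypothetical sketch of that dispatch: the enum constants and the directory names matched here (fbis, ft, fr94, latimes) are assumptions for illustration, not the documented ParsePathType values or path conventions. It walks from the file's parent directory toward the root and uses the first directory name that maps to a known collection.

```java
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.Locale;
import java.util.Map;

/** Hypothetical sketch of choosing a TREC parser type from a file path. */
class TrecPathDispatchSketch {
    // Illustrative collection types; names are assumptions, not the real enum.
    enum PathType { GOV2, FBIS, FT, FR94, LATIMES }

    // Assumed mapping from directory name to collection type.
    private static final Map<String, PathType> DIR_TO_TYPE = Map.of(
        "fbis", PathType.FBIS,
        "ft", PathType.FT,
        "fr94", PathType.FR94,
        "latimes", PathType.LATIMES);

    /** Walks from the file's parent toward the root; defaults to GOV2. */
    static PathType pathType(Path file) {
        for (Path dir = file.getParent(); dir != null; dir = dir.getParent()) {
            Path name = dir.getFileName();
            if (name == null) {
                continue; // a filesystem root has no file name
            }
            PathType type = DIR_TO_TYPE.get(name.toString().toLowerCase(Locale.ROOT));
            if (type != null) {
                return type;
            }
        }
        return PathType.GOV2; // no known collection directory found
    }

    public static void main(String[] args) {
        System.out.println(pathType(Paths.get("trec", "disk4", "ft", "ft911", "ft911_1"))); // FT
        System.out.println(pathType(Paths.get("gov2", "GX000", "00.gz")));                // GOV2
    }
}
```

Matching the nearest known directory first means a file nested several levels below a collection folder still resolves to that collection.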

Exception Summary
NoMoreDataException Exception indicating there is no more data.
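NoMoreDataException lets a content source signal exhaustion with an exception instead of a sentinel return value. A minimal sketch of a consumer loop under that convention; the exception class, the ListContentSource type and its nextDoc method are hypothetical stand-ins written for this example, not the package's own types.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;

/** Hypothetical sketch: drain a content source that signals its end by throwing. */
class NoMoreDataSketch {
    /** Stand-in for an "input exhausted" exception; not the package's own class. */
    static class NoMoreDataSketchException extends Exception {
        NoMoreDataSketchException() {
            super("no more data");
        }
    }

    /** Stand-in content source backed by an in-memory list of document bodies. */
    static class ListContentSource {
        private final Iterator<String> docs;

        ListContentSource(List<String> docs) {
            this.docs = docs.iterator();
        }

        /** Returns the next document body, or throws once the input is exhausted. */
        String nextDoc() throws NoMoreDataSketchException {
            if (!docs.hasNext()) {
                throw new NoMoreDataSketchException();
            }
            return docs.next();
        }
    }

    /** Pulls documents until the source reports that no more data is available. */
    static List<String> drain(ListContentSource source) {
        List<String> consumed = new ArrayList<>();
        try {
            while (true) {
                consumed.add(source.nextDoc());
            }
        } catch (NoMoreDataSketchException e) {
            // Expected: running out of input ends the loop normally.
        }
        return consumed;
    }

    public static void main(String[] args) {
        System.out.println(drain(new ListContentSource(List.of("doc1", "doc2")))); // [doc1, doc2]
    }
}
```

Using a checked exception forces every caller to decide explicitly what "end of input" means for its benchmark loop.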

Copyright © 2000-2011 Apache Software Foundation. All Rights Reserved.