Class AnalyzerFactoryTask

java.lang.Object
org.apache.lucene.benchmark.byTask.tasks.PerfTask
org.apache.lucene.benchmark.byTask.tasks.AnalyzerFactoryTask
All Implemented Interfaces:
Cloneable

public class AnalyzerFactoryTask extends PerfTask
Analyzer factory construction task. The name assigned to the constructed factory can then be passed to NewAnalyzerTask, which will call AnalyzerFactory.create().

Params take the form argname:argvalue, argname:"argvalue", or argname:'argvalue'. Inside a quoted value, use a backslash to escape '"' or "'" when that character is also the enclosing quotation mark.
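
The quoting rule above can be sketched as follows. This is a minimal illustration only: ParamValueSketch and unquote are hypothetical names, not part of the Lucene API, and the task's real parser may handle edge cases differently.

```java
// Hypothetical helper (not part of Lucene): strips the enclosing quotes from a
// param value and unescapes backslash-escaped occurrences of that same quote.
final class ParamValueSketch {

    static String unquote(String raw) {
        if (raw.length() < 2) {
            return raw;
        }
        char quote = raw.charAt(0);
        boolean quoted = (quote == '"' || quote == '\'')
                && raw.charAt(raw.length() - 1) == quote;
        if (!quoted) {
            return raw; // unquoted values are returned unchanged
        }
        int end = raw.length() - 1; // index of the closing quote
        StringBuilder out = new StringBuilder();
        for (int i = 1; i < end; i++) {
            char c = raw.charAt(i);
            // A backslash followed by the enclosing quote yields a literal quote.
            if (c == '\\' && i + 1 < end && raw.charAt(i + 1) == quote) {
                out.append(quote);
                i++; // skip the escaped quote character
            } else {
                out.append(c);
            }
        }
        return out.toString();
    }
}
```

In this sketch only the enclosing quote character needs escaping, so a single quote inside a double-quoted value passes through as-is.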

Specify params as a comma-separated list of the following, in order:

  1. Analyzer args:
    • Required: name:analyzer-factory-name
    • Optional: positionIncrementGap:int value (default: 0)
    • Optional: offsetGap:int value (default: 1)
  2. zero or more CharFilterFactory entries, followed by
  3. exactly one TokenizerFactory, followed by
  4. zero or more TokenFilterFactory entries
Each component analysis factory may specify luceneMatchVersion (defaults to Version.LATEST) and any of the args understood by the specified *Factory class, in the param format described above.

Example:

     -AnalyzerFactory(name:'strip html, fold to ascii, whitespace tokenize, max 10k tokens',
                      positionIncrementGap:100,
                      HTMLStripCharFilter,
                      MappingCharFilter(mapping:'mapping-FoldToASCII.txt'),
                      WhitespaceTokenizer(luceneMatchVersion:LUCENE_5_0_0),
                      TokenLimitFilter(maxTokenCount:10000, consumeAllTokens:false))
     [...]
     -NewAnalyzer('strip html, fold to ascii, whitespace tokenize, max 10k tokens')
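
The component ordering required above (char filters, then exactly one tokenizer, then token filters) can be checked with a sketch like the one below. It is an assumption-laden illustration: PipelineOrderSketch is a hypothetical class, and it classifies components purely by name suffix, whereas the actual task resolves the factory classes themselves.

```java
import java.util.List;

// Hypothetical validator (not part of Lucene): checks that component names
// appear as CharFilter*, then exactly one Tokenizer, then TokenFilter*.
final class PipelineOrderSketch {

    enum Kind { CHAR_FILTER, TOKENIZER, TOKEN_FILTER }

    // Assumption: the component kind can be guessed from the name suffix.
    static Kind kindOf(String name) {
        String suffix = "Factory"; // the "Factory" suffix is optional
        String base = name.endsWith(suffix)
                ? name.substring(0, name.length() - suffix.length())
                : name;
        if (base.endsWith("CharFilter")) return Kind.CHAR_FILTER;
        if (base.endsWith("Tokenizer")) return Kind.TOKENIZER;
        if (base.endsWith("Filter")) return Kind.TOKEN_FILTER;
        throw new IllegalArgumentException("Unrecognized component name: " + name);
    }

    static boolean isValidOrder(List<String> componentNames) {
        int tokenizers = 0;
        Kind previous = Kind.CHAR_FILTER;
        for (String name : componentNames) {
            Kind kind = kindOf(name);
            if (kind.ordinal() < previous.ordinal()) {
                return false; // e.g. a char filter listed after the tokenizer
            }
            if (kind == Kind.TOKENIZER) {
                tokenizers++;
            }
            previous = kind;
        }
        return tokenizers == 1; // exactly one tokenizer is required
    }
}
```

With this sketch, the component list from the example (HTMLStripCharFilter, MappingCharFilter, WhitespaceTokenizer, TokenLimitFilter) is accepted, while a list with no tokenizer, or with a char filter after the tokenizer, is rejected.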
 

AnalyzerFactory will direct analysis component factories to look for resources under the directory specified in the "work.dir" property.

  • Constructor Details

    • AnalyzerFactoryTask

      public AnalyzerFactoryTask(PerfRunData runData)
  • Method Details

    • doLogic

      public int doLogic()
      Description copied from class: PerfTask
      Perform the task once (ignoring the repetitions specification). Return the number of work items done by this task. For indexing, that can be the number of docs added; for warming, the number of scanned items, etc.
      Specified by:
      doLogic in class PerfTask
      Returns:
      number of work items done by this task.
    • setParams

      public void setParams(String params)
      Sets the params. Analysis component factory names may optionally include the "Factory" suffix.
      Overrides:
      setParams in class PerfTask
      Parameters:
      params - analysis pipeline specification: name, (optional) positionIncrementGap, (optional) offsetGap, zero or more CharFilterFactory entries, exactly one TokenizerFactory, and zero or more TokenFilterFactory entries
    • lookupAnalysisClass

      public <T> Class<? extends T> lookupAnalysisClass(String className, Class<T> expectedType) throws ClassNotFoundException
      Looks up a class by its fully qualified name (FQN), by a short name (the class's simple name), or by a package suffix that is resolved against the package prefix "org.apache.lucene.analysis." (e.g. "standard.ClassicTokenizerFactory" -> "org.apache.lucene.analysis.standard.ClassicTokenizerFactory").

      If className contains a period, the class is first looked up as-is, assuming that it is an FQN. If this fails, lookup is retried after prepending the Lucene analysis package prefix to the class name.

      If className does not contain a period, the analysis SPI *Factory.lookupClass() methods are used to find the class.

      Parameters:
      className - The name or the short name of the class.
      expectedType - The superclass className is expected to extend
      Returns:
      the loaded class.
      Throws:
      ClassNotFoundException - if lookup fails
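
The two-step lookup described for names containing a period can be sketched as below. This is not the method's actual implementation: ClassLookupSketch and its packagePrefix parameter are hypothetical, and the SPI-based lookup used for names without a period is not covered.

```java
// Hypothetical sketch (not the Lucene implementation) of the fallback used
// when className contains a period: try it as an FQN first, then retry with
// a package prefix prepended.
final class ClassLookupSketch {

    static <T> Class<? extends T> lookup(
            String className, String packagePrefix, Class<T> expectedType)
            throws ClassNotFoundException {
        try {
            // First attempt: treat the name as a fully qualified name.
            return Class.forName(className).asSubclass(expectedType);
        } catch (ClassNotFoundException e) {
            // Second attempt: prepend the prefix. A ClassNotFoundException
            // thrown here propagates to the caller.
            return Class.forName(packagePrefix + className).asSubclass(expectedType);
        }
    }
}
```

For a runnable demonstration without Lucene on the classpath, calling lookup("concurrent.ConcurrentHashMap", "java.util.", java.util.Map.class) resolves to java.util.concurrent.ConcurrentHashMap via the second attempt. Note that asSubclass throws ClassCastException if the class is found but does not extend expectedType.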
    • supportsParams

      public boolean supportsParams()
      Description copied from class: PerfTask
      Subclasses that support parameters must override this method to return true.
      Overrides:
      supportsParams in class PerfTask
      Returns:
      true iff this task supports command line params.
    • lineno

      public int lineno(StreamTokenizer stok)
      Returns the current line in the algorithm file.
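
For context, java.io.StreamTokenizer tracks line numbers itself through its own lineno() method. The sketch below shows only that standard-library behavior. LineNumberSketch is a hypothetical class, and how this task maps the tokenizer's line number onto the algorithm file is not shown here.

```java
import java.io.IOException;
import java.io.StreamTokenizer;
import java.io.StringReader;
import java.io.UncheckedIOException;

// Hypothetical illustration (not part of Lucene) of StreamTokenizer.lineno().
final class LineNumberSketch {

    // Returns the 1-based line on which the given word token first occurs,
    // or -1 if the word is not found.
    static int lineOfWord(String text, String word) {
        StreamTokenizer stok = new StreamTokenizer(new StringReader(text));
        try {
            while (stok.nextToken() != StreamTokenizer.TT_EOF) {
                if (stok.ttype == StreamTokenizer.TT_WORD && word.equals(stok.sval)) {
                    return stok.lineno();
                }
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return -1;
    }
}
```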