Class AnalyzerFactoryTask

  • All Implemented Interfaces:
    Cloneable

    public class AnalyzerFactoryTask
    extends PerfTask
    Analyzer factory construction task. The name assigned to the constructed factory may be passed to NewAnalyzerTask, which will call AnalyzerFactory.create().

    Params take the form argname:argvalue, argname:"argvalue", or argname:'argvalue'. Inside a quoted value, use a backslash to escape the character that serves as the enclosing quotation mark (" or ').
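
    The quoting rules above can be illustrated with a minimal sketch built on java.io.StreamTokenizer. The class ParamSyntaxSketch and the method parseArgs are hypothetical names, not part of the benchmark module, and the sketch handles only a flat list of name:value pairs, not the nested Factory(...) syntax; the real task's parser may differ in detail.

```java
import java.io.IOException;
import java.io.StreamTokenizer;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical illustration of the param syntax; not the task's actual parser.
public class ParamSyntaxSketch {

  // Parses "name:value, name:'value', name:\"value\"" into an ordered map.
  static Map<String, String> parseArgs(String params) {
    StreamTokenizer stok = new StreamTokenizer(new StringReader(params));
    stok.resetSyntax();            // start with every character "ordinary"
    stok.whitespaceChars(0, ' ');  // skip spaces and control characters
    stok.wordChars('a', 'z');
    stok.wordChars('A', 'Z');
    stok.wordChars('0', '9');      // keep numbers as plain text
    stok.wordChars('_', '_');
    stok.wordChars('.', '.');
    stok.wordChars('-', '-');
    stok.quoteChar('"');           // inside quotes, a backslash escapes the quote
    stok.quoteChar('\'');

    Map<String, String> args = new LinkedHashMap<>();
    try {
      while (stok.nextToken() != StreamTokenizer.TT_EOF) {
        if (stok.ttype == ',') {
          continue;                // separator between params
        }
        if (stok.ttype != StreamTokenizer.TT_WORD) {
          throw new IllegalArgumentException("expected an argument name, got " + stok);
        }
        String name = stok.sval;
        if (stok.nextToken() != ':') {
          throw new IllegalArgumentException("expected ':' after " + name);
        }
        int type = stok.nextToken();
        if (type != StreamTokenizer.TT_WORD && type != '"' && type != '\'') {
          throw new IllegalArgumentException("expected a value for " + name);
        }
        args.put(name, stok.sval); // sval holds the unquoted, unescaped text
      }
    } catch (IOException e) {
      throw new UncheckedIOException(e);
    }
    return args;
  }
}
```

    With this sketch, the input name:'it\'s', positionIncrementGap:100 yields the value it's for name and the text 100 for positionIncrementGap.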

    Specify params as a comma-separated list of the following, in order:

    1. Analyzer args:
      • Required: name:analyzer-factory-name
      • Optional: positionIncrementGap:int value (default: 0)
      • Optional: offsetGap:int value (default: 1)
    2. zero or more CharFilterFactory entries, followed by
    3. exactly one TokenizerFactory, followed by
    4. zero or more TokenFilterFactory entries
    Each component analysis factory may specify luceneMatchVersion (defaults to Version.LATEST) and any of the args understood by the specified *Factory class, in the param format described above.
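
    The ordering constraint on the component list can be expressed as a small check. This is only a sketch: PipelineOrderSketch, Kind, and validate are hypothetical names introduced for illustration, and the error messages are not those produced by the task itself.

```java
import java.util.List;

// Hypothetical illustration of the required component order; not the task's code.
public class PipelineOrderSketch {

  enum Kind { CHAR_FILTER, TOKENIZER, TOKEN_FILTER }

  // Accepts: zero or more char filters, exactly one tokenizer, zero or more token filters.
  static void validate(List<Kind> components) {
    boolean tokenizerSeen = false;
    for (Kind kind : components) {
      switch (kind) {
        case CHAR_FILTER:
          if (tokenizerSeen) {
            throw new IllegalArgumentException("CharFilterFactory must precede the TokenizerFactory");
          }
          break;
        case TOKENIZER:
          if (tokenizerSeen) {
            throw new IllegalArgumentException("exactly one TokenizerFactory is allowed");
          }
          tokenizerSeen = true;
          break;
        case TOKEN_FILTER:
          if (!tokenizerSeen) {
            throw new IllegalArgumentException("TokenFilterFactory must follow the TokenizerFactory");
          }
          break;
      }
    }
    if (!tokenizerSeen) {
      throw new IllegalArgumentException("a TokenizerFactory is required");
    }
  }
}
```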

    Example:

         -AnalyzerFactory(name:'strip html, fold to ascii, whitespace tokenize, max 10k tokens',
                          positionIncrementGap:100,
                          HTMLStripCharFilter,
                          MappingCharFilter(mapping:'mapping-FoldToASCII.txt'),
                          WhitespaceTokenizer(luceneMatchVersion:LUCENE_5_0_0),
                          TokenLimitFilter(maxTokenCount:10000, consumeAllTokens:false))
         [...]
         -NewAnalyzer('strip html, fold to ascii, whitespace tokenize, max 10k tokens')
     

    AnalyzerFactory will direct analysis component factories to look for resources under the directory specified in the "work.dir" property.

    • Constructor Detail

      • AnalyzerFactoryTask

        public AnalyzerFactoryTask(PerfRunData runData)
    • Method Detail

      • doLogic

        public int doLogic()
        Description copied from class: PerfTask
        Performs the task once (ignoring the repetitions specification) and returns the number of work items done by this task. For indexing, that can be the number of docs added; for warming, the number of scanned items; and so on.
        Specified by:
        doLogic in class PerfTask
        Returns:
        number of work items done by this task.
      • setParams

        public void setParams(String params)
        Sets the params. Analysis component factory names may optionally include the "Factory" suffix.
        Overrides:
        setParams in class PerfTask
        Parameters:
        params - analysis pipeline specification: name, (optional) positionIncrementGap, (optional) offsetGap, zero or more CharFilterFactory entries, exactly one TokenizerFactory, and zero or more TokenFilterFactory entries
      • lookupAnalysisClass

        public <T> Class<? extends T> lookupAnalysisClass(String className,
                                                          Class<T> expectedType)
                                                   throws ClassNotFoundException
        Looks up a class by its fully qualified name (FQN), by its short name (the simple class name), or by a package suffix relative to the "org.apache.lucene.analysis." package prefix (e.g. "standard.ClassicTokenizerFactory" -> "org.apache.lucene.analysis.standard.ClassicTokenizerFactory").

        If className contains a period, the class is first looked up as-is, assuming that it is an FQN. If this fails, lookup is retried after prepending the Lucene analysis package prefix to the class name.

        If className does not contain a period, the analysis SPI *Factory.lookupClass() methods are used to find the class.

        Parameters:
        className - The name or the short name of the class.
        expectedType - The superclass className is expected to extend
        Returns:
        the loaded class.
        Throws:
        ClassNotFoundException - if lookup fails
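
        The two-step resolution for names that contain a period can be sketched as follows. LookupSketch and lookup are hypothetical names, and the package prefix is passed as a parameter (an assumption made so the sketch runs without Lucene on the classpath); the task itself uses "org.apache.lucene.analysis." and resolves period-free names through the analysis SPI, which this sketch does not attempt.

```java
// Hypothetical illustration of the FQN-then-prefix fallback; not the task's code.
public class LookupSketch {

  static <T> Class<? extends T> lookup(String className,
                                       String packagePrefix,
                                       Class<T> expectedType)
      throws ClassNotFoundException {
    if (!className.contains(".")) {
      // The real task delegates short names to the SPI lookupClass() methods.
      throw new ClassNotFoundException("short names are not handled by this sketch: " + className);
    }
    try {
      // First attempt: treat the name as a fully qualified name.
      return Class.forName(className).asSubclass(expectedType);
    } catch (ClassNotFoundException e) {
      // Second attempt: treat the name as a suffix of the package prefix.
      return Class.forName(packagePrefix + className).asSubclass(expectedType);
    }
  }
}
```

        For example, with the prefix "java.util." the name "concurrent.ConcurrentHashMap" fails the first attempt and resolves on the second, in the same way "standard.ClassicTokenizerFactory" would resolve against the Lucene analysis prefix.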
      • supportsParams

        public boolean supportsParams()
        Description copied from class: PerfTask
        Subclasses that support parameters must override this method to return true.
        Overrides:
        supportsParams in class PerfTask
        Returns:
        true iff this task supports command line params.
      • lineno

        public int lineno(StreamTokenizer stok)
        Returns the current line number in the algorithm file.