Class BooleanPerceptronClassifier

  • All Implemented Interfaces:
    Classifier<Boolean>

    public class BooleanPerceptronClassifier
    extends Object
    implements Classifier<Boolean>
    A perceptron-based Boolean Classifier (see http://en.wikipedia.org/wiki/Perceptron). The weights are calculated using TermsEnum.totalTermFreq() on both a per-field and a per-document basis, and a corresponding FST is then used for class assignment.
    WARNING: This API is experimental and might change in incompatible ways in the next release.
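
The description above can be illustrated with a minimal, library-free sketch of a term-frequency perceptron. It is not this class's implementation. The names PerceptronSketch, termFreqs, score, classify and train are hypothetical, whitespace tokenization stands in for an Analyzer, a HashMap stands in for the FST, weights are updated after every example rather than in batches, and the strict "score > bias" threshold is an assumption.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Illustrative only: a tiny term-frequency perceptron, not Lucene's implementation. */
final class PerceptronSketch {

    /** Counts how often each lowercase, whitespace-separated token occurs in the text. */
    static Map<String, Integer> termFreqs(String text) {
        Map<String, Integer> freqs = new HashMap<>();
        for (String token : text.toLowerCase().trim().split("\\s+")) {
            if (!token.isEmpty()) {
                freqs.merge(token, 1, Integer::sum);
            }
        }
        return freqs;
    }

    /** Weighted sum of term frequencies; terms without a weight contribute nothing. */
    static double score(Map<String, Double> weights, String text) {
        double sum = 0.0;
        for (Map.Entry<String, Integer> e : termFreqs(text).entrySet()) {
            sum += weights.getOrDefault(e.getKey(), 0.0) * e.getValue();
        }
        return sum;
    }

    /** Assigns true when the score exceeds the bias threshold (assumed convention). */
    static boolean classify(Map<String, Double> weights, String text, double bias) {
        return score(weights, text) > bias;
    }

    /** Moves term weights toward the correct class on every misclassified example. */
    static Map<String, Double> train(List<String> texts, List<Boolean> labels,
                                     double bias, int maxEpochs) {
        Map<String, Double> weights = new HashMap<>();
        for (int epoch = 0; epoch < maxEpochs; epoch++) {
            boolean anyError = false;
            for (int i = 0; i < texts.size(); i++) {
                boolean expected = labels.get(i);
                if (classify(weights, texts.get(i), bias) != expected) {
                    anyError = true;
                    // +1 pushes the terms toward "true", -1 toward "false".
                    double direction = expected ? 1.0 : -1.0;
                    for (Map.Entry<String, Integer> e : termFreqs(texts.get(i)).entrySet()) {
                        weights.merge(e.getKey(), direction * e.getValue(), Double::sum);
                    }
                }
            }
            if (!anyError) {
                break; // every training example is now classified correctly
            }
        }
        return weights;
    }
}
```

In this sketch, calling PerceptronSketch.train with a list of texts, a parallel list of Boolean labels, a bias and an epoch limit returns the learned weights, which can then be passed to PerceptronSketch.classify together with unseen text and the same bias.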
    • Constructor Detail

      • BooleanPerceptronClassifier

        public BooleanPerceptronClassifier(IndexReader indexReader,
                                           Analyzer analyzer,
                                           Query query,
                                           Integer batchSize,
                                           Double bias,
                                           String classFieldName,
                                           String textFieldName)
                                    throws IOException
        Parameters:
        indexReader - the reader on the index to be used for classification
        analyzer - an Analyzer used to analyze unseen text
        query - an optional Query that filters the docs used for training the classifier, or null if all the indexed docs should be used
        batchSize - the size of the batch of docs to use for updating the perceptron weights
        bias - the bias used for class separation
        classFieldName - the name of the field used as the output for the classifier
        textFieldName - the name of the field used as input for the classifier
        Throws:
        IOException - if building the underlying FST fails or the TermsEnum for the text field cannot be found