Class BooleanPerceptronClassifier

java.lang.Object
org.apache.lucene.classification.BooleanPerceptronClassifier
All Implemented Interfaces:
Classifier<Boolean>

public class BooleanPerceptronClassifier extends Object implements Classifier<Boolean>
A perceptron-based Boolean classifier (see http://en.wikipedia.org/wiki/Perceptron). The weights are calculated using TermsEnum.totalTermFreq(), both on a per-field and on a per-document basis, and a corresponding FST is then used for class assignment.
WARNING: This API is experimental and might change in incompatible ways in the next release.
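As a rough illustration of the perceptron idea referred to above, the following plain-Java sketch (not Lucene's implementation) sums a weight for every token occurrence, assigns `true` when that sum reaches the bias, and changes weights only when an example is misclassified. The class name `PerceptronSketch`, the lowercase whitespace tokenizer, and the `HashMap` standing in for the FST are hypothetical simplifications.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical, simplified perceptron-style Boolean classifier; NOT Lucene code.
// A HashMap stands in for the FST, and whitespace splitting stands in for an Analyzer.
final class PerceptronSketch {
    private final Map<String, Long> weights = new HashMap<>();
    private final double bias;

    PerceptronSketch(double bias) {
        this.bias = bias;
    }

    // Activation: sum of the weights of every token occurrence in the text.
    long activation(String text) {
        long sum = 0;
        for (String token : text.toLowerCase().split("\\s+")) {
            if (!token.isEmpty()) {
                sum += weights.getOrDefault(token, 0L);
            }
        }
        return sum;
    }

    // Assign true when the activation reaches the bias threshold.
    boolean classify(String text) {
        return activation(text) >= bias;
    }

    // Perceptron rule: update only on a misclassification, moving the weights
    // of the document's tokens towards the correct class.
    void train(String text, boolean correctClass) {
        if (classify(text) == correctClass) {
            return;
        }
        long delta = correctClass ? 1 : -1;
        for (String token : text.toLowerCase().split("\\s+")) {
            if (!token.isEmpty()) {
                weights.merge(token, delta, Long::sum);
            }
        }
    }
}
```

For example, training `new PerceptronSketch(1.0)` for a few passes on "great movie" (true) and "awful movie" (false) settles at weight 1 for "great", 0 for "movie" and -1 for "awful", after which both texts are classified correctly and no further updates occur.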
  • Constructor Details

    • BooleanPerceptronClassifier

      public BooleanPerceptronClassifier(IndexReader indexReader, Analyzer analyzer, Query query, Integer batchSize, Double bias, String classFieldName, String textFieldName) throws IOException
      Parameters:
      indexReader - the reader on the index to be used for classification
      analyzer - an Analyzer used to analyze unseen text
      query - an optional Query used to filter the docs used for training the classifier, or null if all the indexed docs should be used
      batchSize - the size of the batch of docs to use for updating the perceptron weights
      bias - the bias used for class separation
      classFieldName - the name of the field used as the output for the classifier
      textFieldName - the name of the field used as input for the classifier
      Throws:
      IOException - if building the underlying FST fails and/or the TermsEnum for the text field cannot be found
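The sketch below is a hypothetical, simplified model of how batchSize and bias could interact during training. It is written against the class description rather than Lucene's source and assumes that weights start at each term's total frequency in the training collection (the per-field part), that each misclassified document corrects the weights by its own term frequencies (the per-document part), and that the structure used for classification (a `HashMap` snapshot standing in for the FST) is rebuilt only every `batchSize` documents. Query filtering and Analyzer-based tokenization are not modelled; `BatchedPerceptronSketch`, `TrainingDoc` and `weight` are illustrative names only. Requires Java 16+ for the record.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical model of a batched perceptron training pass; NOT Lucene code.
final class BatchedPerceptronSketch {

    // A pre-tokenized training document with its expected Boolean class.
    record TrainingDoc(List<String> tokens, boolean label) {}

    private final Map<String, Long> working = new HashMap<>(); // weights being updated
    private Map<String, Long> snapshot = Map.of();              // weights used to classify
    private final double bias;

    BatchedPerceptronSketch(List<TrainingDoc> docs, int batchSize, double bias) {
        if (batchSize <= 0) {
            throw new IllegalArgumentException("batchSize must be positive");
        }
        this.bias = bias;

        // Per-field initialisation: each term starts at its total frequency
        // across all training documents (analogous to totalTermFreq()).
        for (TrainingDoc doc : docs) {
            for (String t : doc.tokens()) {
                working.merge(t, 1L, Long::sum);
            }
        }
        snapshot = Map.copyOf(working);

        int seen = 0;
        for (TrainingDoc doc : docs) {
            if (classify(doc.tokens()) != doc.label()) {
                long modifier = doc.label() ? 1 : -1;
                // Per-document correction scaled by this document's term frequencies.
                Map<String, Long> tf = new HashMap<>();
                for (String t : doc.tokens()) {
                    tf.merge(t, 1L, Long::sum);
                }
                tf.forEach((term, freq) -> working.merge(term, modifier * freq, Long::sum));
            }
            seen++;
            // Rebuild the classification structure once per batch.
            if (seen % batchSize == 0) {
                snapshot = Map.copyOf(working);
            }
        }
        // Flush the last, possibly partial, batch (a choice made for this sketch).
        snapshot = Map.copyOf(working);
    }

    // Assign true when the summed weights reach the bias.
    boolean classify(List<String> tokens) {
        long sum = 0;
        for (String t : tokens) {
            sum += snapshot.getOrDefault(t, 0L);
        }
        return sum >= bias;
    }

    // Current weight of a term as seen by classify().
    long weight(String term) {
        return snapshot.getOrDefault(term, 0L);
    }
}
```

Because weight changes only become visible to classification after a rebuild, a larger batchSize means fewer rebuilds but more documents classified with stale weights. For instance, with bias 4.0 and the documents ("good good film", true), ("bad film", false), ("bad film", false), a batchSize of 1 ends with "film" at weight 2, while a batchSize of 3 applies a second correction computed from stale weights and ends with "film" at weight 1.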
  • Method Details