Interface DocumentClassifier<T>

All Known Implementing Classes:
KNearestNeighborDocumentClassifier, SimpleNaiveBayesDocumentClassifier

public interface DocumentClassifier<T>
A classifier (see http://en.wikipedia.org/wiki/Classifier_(mathematics)) that assigns classes of type T to Documents.
WARNING: This API is experimental and might change in incompatible ways in the next release.
  • Method Details

    • assignClass

      ClassificationResult<T> assignClass(Document document) throws IOException
      Assign a class (with score) to the given Document
      Parameters:
      document - a Document to be classified. Fields are considered features for the classification.
      Returns:
      a ClassificationResult holding the assigned class of type T and its score
      Throws:
      IOException - If there is a low-level I/O error.
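      A minimal sketch of how a caller might read the class and score returned by assignClass. So that it compiles without Lucene on the classpath, Result, SingleClassifier, and the Map<String, String> document are hypothetical stand-ins for ClassificationResult<T>, DocumentClassifier<T>, and Document. The keyword rule in toyClassifier is invented for illustration and is not how the known implementing classes work. The helper rethrows IOException unchecked only to keep the sketch short.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.util.Map;

class AssignClassSketch {

    /** Hypothetical stand-in for ClassificationResult<T>: the assigned class and its score. */
    static final class Result<T> {
        private final T assignedClass;
        private final double score;

        Result(T assignedClass, double score) {
            this.assignedClass = assignedClass;
            this.score = score;
        }

        T getAssignedClass() { return assignedClass; }
        double getScore() { return score; }
    }

    /** Hypothetical stand-in for assignClass; a map of field name to value plays the Document. */
    interface SingleClassifier<T> {
        Result<T> assignClass(Map<String, String> document) throws IOException;
    }

    /** Toy rule (an assumption for illustration): label by a keyword in the "body" field. */
    static SingleClassifier<String> toyClassifier() {
        return document -> {
            String body = document.getOrDefault("body", "");
            if (body.contains("goal")) {
                return new Result<String>("sports", 0.9);
            }
            return new Result<String>("other", 0.5);
        };
    }

    /** Classifies the document and formats the assigned class with its score. */
    static <T> String describe(SingleClassifier<T> classifier, Map<String, String> document) {
        try {
            Result<T> result = classifier.assignClass(document);
            return result.getAssignedClass() + " (" + result.getScore() + ")";
        } catch (IOException e) {
            // Low-level I/O failure: rethrown unchecked to keep the sketch short.
            throw new UncheckedIOException(e);
        }
    }

    /** Sample document: the fields are the features considered by the toy rule. */
    static Map<String, String> sampleDocument() {
        return Map.of("title", "Match report", "body", "A late goal");
    }

    public static void main(String[] args) {
        System.out.println(describe(toyClassifier(), sampleDocument())); // prints "sports (0.9)"
    }
}
```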
    • getClasses

      List<ClassificationResult<T>> getClasses(Document document) throws IOException
      Get all the classes (sorted by score, descending) assigned to the given Document.
      Parameters:
      document - a Document to be classified. Fields are considered features for the classification.
      Returns:
      the full list of ClassificationResult objects, each holding a class and its score. Returns null if the classifier cannot produce lists.
      Throws:
      IOException - If there is a low-level I/O error.
    • getClasses

      List<ClassificationResult<T>> getClasses(Document document, int max) throws IOException
      Get the first max classes (sorted by score, descending) assigned to the given Document.
      Parameters:
      document - a Document to be classified. Fields are considered features for the classification.
      max - the maximum number of elements in the returned list
      Returns:
      the list of ClassificationResult objects, each holding a class and its score, truncated to at most max elements. Returns null if the classifier cannot produce lists.
      Throws:
      IOException - If there is a low-level I/O error.
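
The contract of getClasses(document, max) described above (results sorted by descending score, cut to max elements, null when the classifier cannot produce lists) can be sketched as follows. This is not Lucene code: Result, ListClassifier, and the Map<String, String> document are hypothetical stand-ins for ClassificationResult<T>, DocumentClassifier<T>, and Document, and both toy classifiers and their scores are invented so that the example runs on its own. The helper topClasses shows one way a caller might guard against the null return.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.Comparator;
import java.util.List;
import java.util.Map;

class GetClassesSketch {

    /** Hypothetical stand-in for ClassificationResult<T>: the assigned class and its score. */
    static final class Result<T> {
        private final T assignedClass;
        private final double score;

        Result(T assignedClass, double score) {
            this.assignedClass = assignedClass;
            this.score = score;
        }

        T getAssignedClass() { return assignedClass; }
        double getScore() { return score; }
    }

    /** Hypothetical stand-in for getClasses(document, max); a map plays the Document. */
    interface ListClassifier<T> {
        List<Result<T>> getClasses(Map<String, String> document, int max) throws IOException;
    }

    /** Toy classifier: returns fixed candidates sorted by score (descending), cut to max. */
    static ListClassifier<String> fixedClassifier(List<Result<String>> candidates) {
        return (document, max) -> {
            List<Result<String>> sorted = new ArrayList<>(candidates);
            sorted.sort(Comparator.comparingDouble((Result<String> r) -> r.getScore()).reversed());
            int limit = Math.min(Math.max(max, 0), sorted.size());
            return new ArrayList<>(sorted.subList(0, limit));
        };
    }

    /** Toy classifier that cannot produce lists and signals this by returning null. */
    static ListClassifier<String> noListClassifier() {
        return (document, max) -> null;
    }

    /** Returns the assigned classes in score order; a null result becomes an empty list. */
    static <T> List<T> topClasses(ListClassifier<T> classifier, Map<String, String> document, int max) {
        try {
            List<Result<T>> results = classifier.getClasses(document, max);
            if (results == null) {
                return Collections.emptyList();
            }
            List<T> classes = new ArrayList<>();
            for (Result<T> result : results) {
                classes.add(result.getAssignedClass());
            }
            return classes;
        } catch (IOException e) {
            // Low-level I/O failure: rethrown unchecked to keep the sketch short.
            throw new UncheckedIOException(e);
        }
    }

    /** Invented candidate classes and scores, deliberately out of order. */
    static List<Result<String>> sampleCandidates() {
        return Arrays.asList(
            new Result<String>("sports", 0.2),
            new Result<String>("politics", 0.7),
            new Result<String>("tech", 0.1));
    }

    /** Sample document with a single field. */
    static Map<String, String> sampleDocument() {
        return Collections.singletonMap("body", "some text");
    }

    public static void main(String[] args) {
        // prints "[politics, sports]"
        System.out.println(topClasses(fixedClassifier(sampleCandidates()), sampleDocument(), 2));
        // prints "[]"
        System.out.println(topClasses(noListClassifier(), sampleDocument(), 2));
    }
}
```

Mapping null to an empty list is a design choice of this sketch; a caller that needs to distinguish "no classes found" from "lists not supported" would check for null explicitly instead.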