java.lang.Object

org.apache.lucene.classification.KNearestNeighborClassifier

org.apache.lucene.classification.document.KNearestNeighborDocumentClassifier

All Implemented Interfaces: Classifier<BytesRef>, DocumentClassifier<BytesRef>

public class KNearestNeighborDocumentClassifier extends KNearestNeighborClassifier implements DocumentClassifier<BytesRef>

A k-Nearest Neighbor document classifier (see http://en.wikipedia.org/wiki/K-nearest_neighbors) based on MoreLikeThis.

WARNING: This API is experimental and might change in incompatible ways in the next release.

Field Summary

Fields

Modifier and Type

Field

Description

protected final Map<String,Analyzer>

field2analyzer

Map of per-field analyzers, keyed by field name.

Fields inherited from class org.apache.lucene.classification.KNearestNeighborClassifier
classFieldName, indexSearcher, k, mlt, query, textFieldNames
Constructor Summary

Constructors

Constructor

Description

KNearestNeighborDocumentClassifier(IndexReader indexReader, Similarity similarity, Query query, int k, int minDocsFreq, int minTermFreq, String classFieldName, Map<String,Analyzer> field2analyzer, String... textFieldNames)

Creates a KNearestNeighborDocumentClassifier.
Method Summary

Modifier and Type

Method

Description

ClassificationResult<BytesRef>

assignClass(Document document)

Assign a class (with score) to the given Document

List<ClassificationResult<BytesRef>>

getClasses(Document document)

Get all the classes (sorted by score, descending) assigned to the given Document.

List<ClassificationResult<BytesRef>>

getClasses(Document document, int max)

Get the first max classes (sorted by score, descending) assigned to the given Document.

Methods inherited from class org.apache.lucene.classification.KNearestNeighborClassifier
assignClass, buildListFromTopDocs, classifyFromTopDocs, getClasses, getClasses, toString

Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, wait, wait, wait

Field Details
- field2analyzer
  
  protected final Map<String,Analyzer> field2analyzer
  
  Map of per-field analyzers, keyed by field name.
Constructor Details
- KNearestNeighborDocumentClassifier
  
  public KNearestNeighborDocumentClassifier(IndexReader indexReader, Similarity similarity, Query query, int k, int minDocsFreq, int minTermFreq, String classFieldName, Map<String,Analyzer> field2analyzer, String... textFieldNames)
  
  Creates a KNearestNeighborDocumentClassifier.
  
  Parameters:
  
  indexReader - the reader on the index to be used for classification
  
  similarity - the Similarity to be used by the underlying IndexSearcher, or null to use the default (BM25Similarity)
  
  query - an optional Query that filters the documents used for training the classifier, or null if all indexed documents should be used
  
  k - the number of documents to select from the MoreLikeThis results when looking for the nearest neighbors
  
  minDocsFreq - the value for the MoreLikeThis.minDocFreq parameter
  
  minTermFreq - the value for the MoreLikeThis.minTermFreq parameter
  
  classFieldName - the name of the field used as the output for the classifier
  
  field2analyzer - a map from each field name to the org.apache.lucene.analysis.Analyzer used for that field
  
  textFieldNames - the names of the fields used as inputs for the classifier; a name can carry a boost, e.g. title^10
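
The textFieldNames parameter accepts plain names as well as boosted names such as title^10. The following is a minimal sketch of how such a specification could be split into a field name and a boost. The class BoostedFieldSpecs, its parse method, and the default boost of 1.0 are assumptions made for illustration; they are not part of the Lucene API, and the classifier's own parsing may differ.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical helper, not part of Lucene: splits "name^boost" field specifications. */
final class BoostedFieldSpecs {

  private BoostedFieldSpecs() {}

  /**
   * Maps each field name to its boost, preserving the order of the arguments.
   * A specification without '^' gets an assumed default boost of 1.0.
   * A malformed boost (for example "title^") raises NumberFormatException.
   */
  static Map<String, Float> parse(String... textFieldNames) {
    Map<String, Float> boosts = new LinkedHashMap<>();
    for (String spec : textFieldNames) {
      int caret = spec.lastIndexOf('^');
      if (caret < 0) {
        // No boost given: fall back to the assumed default.
        boosts.put(spec, 1.0f);
      } else {
        String name = spec.substring(0, caret);
        float boost = Float.parseFloat(spec.substring(caret + 1));
        boosts.put(name, boost);
      }
    }
    return boosts;
  }

  public static void main(String[] args) {
    // prints {title=10.0, body=1.0}
    System.out.println(parse("title^10", "body"));
  }
}
```

With a mapping like this, a caller could check which fields are weighted more heavily before passing the same strings to the constructor unchanged.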
Method Details
- assignClass
  
  public ClassificationResult<BytesRef> assignClass(Document document) throws IOException
  
  Description copied from interface: DocumentClassifier
  
  Assign a class (with score) to the given Document
  
  Specified by:
  
  assignClass in interface DocumentClassifier<BytesRef>
  
  Parameters:
  
  document - a Document to be classified. Fields are considered features for the classification.
  
  Returns:
  
  a ClassificationResult holding the assigned class (a BytesRef for this classifier) and its score
  
  Throws:
  
  IOException - If there is a low-level I/O error.
- getClasses
  
  public List<ClassificationResult<BytesRef>> getClasses(Document document) throws IOException
  
  Description copied from interface: DocumentClassifier
  
  Get all the classes (sorted by score, descending) assigned to the given Document.
  
  Specified by:
  
  getClasses in interface DocumentClassifier<BytesRef>
  
  Parameters:
  
  document - a Document to be classified. Fields are considered features for the classification.
  
  Returns:
  
  the full list of ClassificationResult objects, each holding a class and its score, or null if the classifier cannot produce a list
  
  Throws:
  
  IOException - If there is a low-level I/O error.
- getClasses
  
  public List<ClassificationResult<BytesRef>> getClasses(Document document, int max) throws IOException
  
  Description copied from interface: DocumentClassifier
  
  Get the first max classes (sorted by score, descending) assigned to the given Document.
  
  Specified by:
  
  getClasses in interface DocumentClassifier<BytesRef>
  
  Parameters:
  
  document - a Document to be classified. Fields are considered features for the classification.
  
  max - the maximum number of elements in the returned list
  
  Returns:
  
  the list of ClassificationResult objects, each holding a class and its score, truncated to at most max elements, or null if the classifier cannot produce a list
  
  Throws:
  
  IOException - If there is a low-level I/O error.
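
The getClasses methods return classes sorted by descending score, optionally cut to max entries. As a rough illustration of the general k-nearest-neighbor idea behind such a ranking, the sketch below counts the class labels of the k nearest neighbors, scores each class by its share of the votes, and keeps the best max entries. It is independent of Lucene: KnnVoteSketch, Scored, and rank are hypothetical names, and the scoring used by the actual classifier may differ.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Illustrative k-NN voting, independent of Lucene. All names here are hypothetical. */
final class KnnVoteSketch {

  private KnnVoteSketch() {}

  /** A class label paired with its score (its share of the neighbor votes). */
  static final class Scored {
    final String label;
    final double score;

    Scored(String label, double score) {
      this.label = label;
      this.score = score;
    }

    @Override
    public String toString() {
      return label + "=" + score;
    }
  }

  /** Counts the labels of the nearest neighbors and returns at most max classes, best first. */
  static List<Scored> rank(List<String> neighborLabels, int max) {
    // One vote per neighbor; insertion order is kept so that ties stay deterministic.
    Map<String, Integer> votes = new LinkedHashMap<>();
    for (String label : neighborLabels) {
      votes.merge(label, 1, Integer::sum);
    }
    List<Scored> ranked = new ArrayList<>();
    for (Map.Entry<String, Integer> e : votes.entrySet()) {
      ranked.add(new Scored(e.getKey(), e.getValue() / (double) neighborLabels.size()));
    }
    // Stable sort by score, descending.
    ranked.sort((a, b) -> Double.compare(b.score, a.score));
    int limit = Math.min(Math.max(max, 0), ranked.size());
    return new ArrayList<>(ranked.subList(0, limit));
  }

  public static void main(String[] args) {
    List<String> neighbors = Arrays.asList("politics", "sport", "politics", "politics", "tech");
    // prints [politics=0.6, sport=0.2]
    System.out.println(rank(neighbors, 2));
  }
}
```

Here k corresponds to the number of neighbor labels passed in (five in the example), and max plays the same role as the max argument of getClasses.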
