Class KNearestNeighborDocumentClassifier
java.lang.Object
org.apache.lucene.classification.KNearestNeighborClassifier
org.apache.lucene.classification.document.KNearestNeighborDocumentClassifier
- All Implemented Interfaces:
Classifier<BytesRef>, DocumentClassifier<BytesRef>
public class KNearestNeighborDocumentClassifier
extends KNearestNeighborClassifier
implements DocumentClassifier<BytesRef>
A k-Nearest Neighbor Document classifier (see http://en.wikipedia.org/wiki/K-nearest_neighbors) based on MoreLikeThis.
- WARNING: This API is experimental and might change in incompatible ways in the next release.
-
Field Summary
field2analyzer - map of per field analyzers
Fields inherited from class org.apache.lucene.classification.KNearestNeighborClassifier
classFieldName, indexSearcher, k, mlt, query, textFieldNames
-
Constructor Summary
KNearestNeighborDocumentClassifier(IndexReader indexReader, Similarity similarity, Query query, int k, int minDocsFreq, int minTermFreq, String classFieldName, Map<String, Analyzer> field2analyzer, String... textFieldNames) - Creates a KNearestNeighborClassifier.
-
Method Summary
assignClass(Document document) - Assign a class (with score) to the given Document.
getClasses(Document document) - Get all the classes (sorted by score, descending) assigned to the given Document.
getClasses(Document document, int max) - Get the first max classes (sorted by score, descending) assigned to the given Document.
Methods inherited from class org.apache.lucene.classification.KNearestNeighborClassifier
assignClass, buildListFromTopDocs, classifyFromTopDocs, getClasses, getClasses, toString
-
Field Details
-
field2analyzer
map of per field analyzers
-
-
Constructor Details
-
KNearestNeighborDocumentClassifier
public KNearestNeighborDocumentClassifier(IndexReader indexReader, Similarity similarity, Query query, int k, int minDocsFreq, int minTermFreq, String classFieldName, Map<String, Analyzer> field2analyzer, String... textFieldNames)
Creates a KNearestNeighborClassifier.
- Parameters:
indexReader - the reader on the index to be used for classification
similarity - the Similarity to be used by the underlying IndexSearcher, or null (defaults to BM25Similarity)
query - a Query to optionally filter the docs used for training the classifier, or null if all the indexed docs should be used
k - the number of docs to select in the MLT results to find the nearest neighbor
minDocsFreq - MoreLikeThis.minDocFreq parameter
minTermFreq - MoreLikeThis.minTermFreq parameter
classFieldName - the name of the field used as the output for the classifier
field2analyzer - map from a field name to the Analyzer used for that field
textFieldNames - the names of the fields used as the inputs for the classifier; they can contain a boosting indication, e.g. title^10
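The boosting indication accepted in textFieldNames (e.g. title^10) can be pictured as a field name followed by a caret and a numeric weight. The sketch below is only an illustration of that notation in plain Java: the class BoostedFieldSketch and its parse method are hypothetical names, not part of Lucene, and the default boost of 1.0 for names without a caret is an assumption.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical helper (not a Lucene class): splits "field^boost" specifications. */
final class BoostedFieldSketch {

    /**
     * Maps each field name to its boost, keeping the input order.
     * Names without a '^' are given a boost of 1.0f (an assumption of this sketch).
     */
    static Map<String, Float> parse(String... textFieldNames) {
        Map<String, Float> boosts = new LinkedHashMap<>();
        for (String spec : textFieldNames) {
            int caret = spec.lastIndexOf('^');
            if (caret < 0) {
                boosts.put(spec, 1.0f);
            } else {
                String name = spec.substring(0, caret);
                float boost = Float.parseFloat(spec.substring(caret + 1));
                boosts.put(name, boost);
            }
        }
        return boosts;
    }

    public static void main(String[] args) {
        // prints {title=10.0, body=1.0}
        System.out.println(parse("title^10", "body"));
    }
}
```

How the classifier itself interprets the boost is not described on this page, so the sketch covers the string format only.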
-
-
Method Details
-
assignClass
Description copied from interface: DocumentClassifier
Assign a class (with score) to the given Document.
- Specified by:
assignClass in interface DocumentClassifier<BytesRef>
- Parameters:
document - a Document to be classified. Fields are considered features for the classification.
- Returns:
a ClassificationResult holding assigned class of type T and score
- Throws:
IOException - If there is a low-level I/O error.
-
getClasses
Description copied from interface: DocumentClassifier
Get all the classes (sorted by score, descending) assigned to the given Document.
- Specified by:
getClasses in interface DocumentClassifier<BytesRef>
- Parameters:
document - a Document to be classified. Fields are considered features for the classification.
- Returns:
the whole list of ClassificationResult, the classes and scores. Returns null if the classifier can't make lists.
- Throws:
IOException - If there is a low-level I/O error.
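The idea of returning classes sorted by score, descending, can be illustrated with a generic k-nearest-neighbor vote. This is a minimal plain-Java sketch, not Lucene's scoring: the class KnnVoteSketch, its Scored type, and the choice of "share of neighbors voting for the class" as the score are all assumptions made for illustration.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical illustration of a kNN vote (not Lucene's implementation). */
final class KnnVoteSketch {

    /** A class label together with its share of the neighbor votes. */
    static final class Scored {
        final String label;
        final double score;

        Scored(String label, double score) {
            this.label = label;
            this.score = score;
        }
    }

    /**
     * Counts how often each label occurs among the neighbors and returns
     * the labels sorted by vote share, highest first.
     */
    static List<Scored> rank(List<String> neighborLabels) {
        Map<String, Integer> counts = new LinkedHashMap<>();
        for (String label : neighborLabels) {
            counts.merge(label, 1, Integer::sum);
        }
        List<Scored> ranked = new ArrayList<>();
        for (Map.Entry<String, Integer> e : counts.entrySet()) {
            ranked.add(new Scored(e.getKey(), e.getValue() / (double) neighborLabels.size()));
        }
        ranked.sort((a, b) -> Double.compare(b.score, a.score));
        return ranked;
    }
}
```

With the neighbor labels sports, politics, sports, the sketch ranks sports first with a score of 2/3 and politics second with 1/3.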
-
getClasses
public List<ClassificationResult<BytesRef>> getClasses(Document document, int max) throws IOException
Description copied from interface: DocumentClassifier
Get the first max classes (sorted by score, descending) assigned to the given Document.
- Specified by:
getClasses in interface DocumentClassifier<BytesRef>
- Parameters:
document - a Document to be classified. Fields are considered features for the classification.
max - the maximum number of elements in the returned list
- Returns:
the list of ClassificationResult (classes and scores), cut to at most max elements. Returns null if the classifier can't make lists.
- Throws:
IOException - If there is a low-level I/O error.
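The "cut for max elements" contract, including the documented null return, can be sketched on an already ranked list. The class TopMaxSketch and its firstMax method are hypothetical names used only for this illustration, and rejecting a negative max is an assumption rather than documented behavior.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical illustration of truncating a ranked result list (not a Lucene class). */
final class TopMaxSketch {

    /**
     * Returns a copy of the first max elements of a list that is already
     * sorted by score, or null when no list is available.
     */
    static <T> List<T> firstMax(List<T> ranked, int max) {
        if (ranked == null) {
            return null; // mirrors "returns null if the classifier can't make lists"
        }
        if (max < 0) {
            throw new IllegalArgumentException("max must be non-negative");
        }
        return new ArrayList<>(ranked.subList(0, Math.min(max, ranked.size())));
    }
}
```

Because null is a documented return value, callers of getClasses should check for it before iterating over the result.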
-