Package org.apache.lucene.classification
Class KNearestNeighborClassifier
java.lang.Object
org.apache.lucene.classification.KNearestNeighborClassifier
- All Implemented Interfaces:
Classifier<BytesRef>
- Direct Known Subclasses:
KNearestNeighborDocumentClassifier
A k-Nearest Neighbor classifier (see http://en.wikipedia.org/wiki/K-nearest_neighbors) based on MoreLikeThis
- WARNING: This API is experimental and might change in incompatible ways in the next release.
-
Field Summary
Fields
Modifier and Type | Field | Description
protected final String | classFieldName | the name of the field used as the output text
protected final IndexSearcher | indexSearcher | an IndexSearcher used to perform queries
protected final int | k | the number of docs to compare in order to find the nearest neighbor to the input text
protected final MoreLikeThis | mlt | a MoreLikeThis instance used to perform MLT queries
protected final Query | query | a Query used to filter the documents that should be used from this classifier's underlying LeafReader
protected final String[] | textFieldNames | the names of the fields used as the input text
Constructor Summary
Constructors
Constructor | Description
KNearestNeighborClassifier(IndexReader indexReader, Similarity similarity, Analyzer analyzer, Query query, int k, int minDocsFreq, int minTermFreq, String classFieldName, String... textFieldNames) | Creates a KNearestNeighborClassifier.
Method Summary
Modifier and Type | Method | Description
ClassificationResult<BytesRef> | assignClass(String text) | Assign a class (with score) to the given text String.
protected List<ClassificationResult<BytesRef>> | buildListFromTopDocs(TopDocs topDocs) | Build a list of classification results from search results.
protected ClassificationResult<BytesRef> | classifyFromTopDocs(TopDocs knnResults) | TODO
List<ClassificationResult<BytesRef>> | getClasses(String text) | Get all the classes (sorted by score, descending) assigned to the given text String.
List<ClassificationResult<BytesRef>> | getClasses(String text, int max) | Get the first max classes (sorted by score, descending) assigned to the given text String.
String | toString() |
-
Field Details
- mlt
protected final MoreLikeThis mlt
a MoreLikeThis instance used to perform MLT queries
- textFieldNames
protected final String[] textFieldNames
the names of the fields used as the input text
- classFieldName
protected final String classFieldName
the name of the field used as the output text
- indexSearcher
protected final IndexSearcher indexSearcher
an IndexSearcher used to perform queries
- k
protected final int k
the number of docs to compare in order to find the nearest neighbor to the input text
- query
protected final Query query
a Query used to filter the documents that should be used from this classifier's underlying LeafReader
Constructor Details
- KNearestNeighborClassifier
public KNearestNeighborClassifier(IndexReader indexReader, Similarity similarity, Analyzer analyzer, Query query, int k, int minDocsFreq, int minTermFreq, String classFieldName, String... textFieldNames) throws IOException
Creates a KNearestNeighborClassifier.
- Parameters:
indexReader - the reader on the index to be used for classification
similarity - the Similarity to be used by the underlying IndexSearcher, or null (defaults to BM25Similarity)
analyzer - an Analyzer used to analyze unseen text
query - a Query to filter the docs used for training the classifier, or null if all the indexed docs should be used
k - the number of docs to select in the MLT results to find the nearest neighbor
minDocsFreq - MoreLikeThis.minDocFreq parameter
minTermFreq - MoreLikeThis.minTermFreq parameter
classFieldName - the name of the field used as the output for the classifier
textFieldNames - the names of the fields used as the inputs for the classifier; they can contain a boosting indication, e.g. title^10
- Throws:
IOException
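The boosting indication accepted in textFieldNames (e.g. title^10) can be pictured with a small standalone sketch. It assumes that the part after ^ is a decimal boost factor and that a name without ^ gets a neutral boost of 1. FieldBoostSketch, FieldBoost and parse are hypothetical names used only for illustration; they are not part of the Lucene API, and the classifier's own parsing may differ.

```java
/** Hypothetical helper, not part of Lucene: splits "field^boost" entries. */
final class FieldBoostSketch {

    /** A field name paired with its boost factor. */
    static final class FieldBoost {
        final String name;
        final float boost;

        FieldBoost(String name, float boost) {
            this.name = name;
            this.boost = boost;
        }
    }

    /** Parses one entry such as "title^10" or "body". */
    static FieldBoost parse(String spec) {
        int caret = spec.indexOf('^');
        if (caret < 0) {
            // Assumption: a name without '^' gets a neutral boost of 1.
            return new FieldBoost(spec, 1.0f);
        }
        String name = spec.substring(0, caret);
        // Assumption: everything after '^' is a decimal number.
        float boost = Float.parseFloat(spec.substring(caret + 1));
        return new FieldBoost(name, boost);
    }

    public static void main(String[] args) {
        for (String spec : new String[] {"title^10", "body"}) {
            FieldBoost fb = parse(spec);
            System.out.println(fb.name + " boost=" + fb.boost);
        }
    }
}
```

Under these assumptions, passing "title^10" and "body" as textFieldNames would weight matches on title ten times as heavily as matches on body.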
Method Details
- assignClass
public ClassificationResult<BytesRef> assignClass(String text) throws IOException
Description copied from interface: Classifier
Assign a class (with score) to the given text String.
- Specified by:
assignClass in interface Classifier<BytesRef>
- Parameters:
text - a String containing text to be classified
- Returns:
a ClassificationResult holding the assigned class of type T and its score
- Throws:
IOException - If there is a low-level I/O error.
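- classifyFromTopDocs
protected ClassificationResult<BytesRef> classifyFromTopDocs(TopDocs knnResults) throws IOException
TODO
- Throws:
IOException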
- getClasses
public List<ClassificationResult<BytesRef>> getClasses(String text) throws IOException
Description copied from interface: Classifier
Get all the classes (sorted by score, descending) assigned to the given text String.
- Specified by:
getClasses in interface Classifier<BytesRef>
- Parameters:
text - a String containing text to be classified
- Returns:
the whole list of ClassificationResult, with the classes and their scores. Returns null if the classifier can't make lists.
- Throws:
IOException - If there is a low-level I/O error.
- getClasses
public List<ClassificationResult<BytesRef>> getClasses(String text, int max) throws IOException
Description copied from interface: Classifier
Get the first max classes (sorted by score, descending) assigned to the given text String.
- Specified by:
getClasses in interface Classifier<BytesRef>
- Parameters:
text - a String containing text to be classified
max - the maximum number of elements in the returned list
- Returns:
the list of ClassificationResult, with the classes and their scores, cut to at most max elements. Returns null if the classifier can't make lists.
- Throws:
IOException - If there is a low-level I/O error.
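The "sorted by score, descending" and "cut to max elements" behaviour described above can be sketched without Lucene. The example uses a hypothetical Scored type in place of ClassificationResult<BytesRef>; TopMaxSketch and firstMax are illustrative names, and treating a negative max as zero is an assumption of this sketch, not documented behaviour.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;

/** Hypothetical illustration, not part of Lucene. */
final class TopMaxSketch {

    /** Stand-in for a classification result: a class label and its score. */
    static final class Scored {
        final String label;
        final double score;

        Scored(String label, double score) {
            this.label = label;
            this.score = score;
        }
    }

    /** Returns a new list with at most max results, highest score first. */
    static List<Scored> firstMax(List<Scored> results, int max) {
        List<Scored> sorted = new ArrayList<>(results);
        sorted.sort(Comparator.comparingDouble((Scored s) -> s.score).reversed());
        // Assumption: a negative max is treated like 0 rather than an error.
        int end = Math.max(0, Math.min(max, sorted.size()));
        return new ArrayList<>(sorted.subList(0, end));
    }

    public static void main(String[] args) {
        List<Scored> results = Arrays.asList(
            new Scored("politics", 0.2),
            new Scored("sports", 0.7),
            new Scored("science", 0.1));
        for (Scored s : firstMax(results, 2)) {
            System.out.println(s.label + " " + s.score);
        }
    }
}
```

The input list is copied before sorting, so the caller's list is left in its original order.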
- buildListFromTopDocs
protected List<ClassificationResult<BytesRef>> buildListFromTopDocs(TopDocs topDocs) throws IOException
Build a list of classification results from search results.
- Parameters:
topDocs - the search results as a TopDocs object
- Returns:
a List of ClassificationResult, one for each existing class
- Throws:
IOException - if it's not possible to get the stored value of the class field
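As a rough illustration of producing one result per class from the k nearest neighbors, the sketch below tallies neighbor labels and scores each class by its share of the neighbors. This is generic k-NN voting under assumed inputs (plain String labels instead of TopDocs and BytesRef). The scoring formula actually used by this class is not stated on this page, and KnnVoteSketch and tally are hypothetical names.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical illustration of k-NN voting, not part of Lucene. */
final class KnnVoteSketch {

    /**
     * Maps each class label to the fraction of neighbors carrying it.
     * Entries keep the order in which labels were first seen.
     */
    static Map<String, Double> tally(List<String> neighborLabels) {
        // Count how many neighbors belong to each class.
        Map<String, Integer> counts = new LinkedHashMap<>();
        for (String label : neighborLabels) {
            counts.merge(label, 1, Integer::sum);
        }
        // Assumption: the score of a class is its share of the neighbors.
        Map<String, Double> shares = new LinkedHashMap<>();
        for (Map.Entry<String, Integer> e : counts.entrySet()) {
            shares.put(e.getKey(), e.getValue() / (double) neighborLabels.size());
        }
        return shares;
    }

    public static void main(String[] args) {
        // Labels of the k = 3 most similar documents (made-up data).
        System.out.println(tally(Arrays.asList("sports", "politics", "sports")));
    }
}
```

With three neighbors labeled sports, politics, sports, the tally contains two entries, with sports holding two thirds of the votes and politics one third.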
- toString
public String toString()
-