public class KNearestNeighborClassifier extends Object implements Classifier<BytesRef>
A k-Nearest Neighbor classifier (see http://en.wikipedia.org/wiki/K-nearest_neighbors) based on MoreLikeThis.
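The general k-nearest-neighbor idea can be illustrated with a small, library-free sketch: take the class labels of the k most similar documents and return the most frequent one. This is only an illustration under stated assumptions, not Lucene's implementation; the class `KnnVoteSketch` and the method `majorityLabel` are hypothetical names, and the input list is assumed to be already sorted by descending similarity (in this classifier that ranking would come from a MoreLikeThis query).

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical sketch of k-nearest-neighbor majority voting; not Lucene code. */
final class KnnVoteSketch {

    /**
     * Returns the label that occurs most often among the first k entries of
     * neighborLabels (assumed sorted by descending similarity).
     * Ties go to the label seen first; an empty input yields null.
     */
    static String majorityLabel(List<String> neighborLabels, int k) {
        // LinkedHashMap keeps insertion order, which makes tie-breaking deterministic.
        Map<String, Integer> votes = new LinkedHashMap<>();
        int limit = Math.max(0, Math.min(k, neighborLabels.size()));
        for (int i = 0; i < limit; i++) {
            votes.merge(neighborLabels.get(i), 1, Integer::sum);
        }
        String best = null;
        int bestVotes = 0;
        for (Map.Entry<String, Integer> entry : votes.entrySet()) {
            if (entry.getValue() > bestVotes) {
                best = entry.getKey();
                bestVotes = entry.getValue();
            }
        }
        return best;
    }
}
```

With the neighbors `politics, sports, sports`, a call with k = 1 considers only the closest neighbor, while k = 3 lets the two `sports` neighbors outvote it.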
Modifier and Type | Field and Description |
---|---|
protected String | classFieldName: the name of the field used as the output text |
protected IndexSearcher | indexSearcher: an IndexSearcher used to perform queries |
protected int | k: the number of docs to select in the MLT results to find the nearest neighbor |
protected MoreLikeThis | mlt: a MoreLikeThis instance used to perform MLT queries |
protected Query | query: a Query used to filter the documents that should be used from this classifier's underlying LeafReader |
protected String[] | textFieldNames: the names of the fields used as the input text |
Constructor and Description |
---|
KNearestNeighborClassifier(IndexReader indexReader, Similarity similarity, Analyzer analyzer, Query query, int k, int minDocsFreq, int minTermFreq, String classFieldName, String... textFieldNames): Creates a KNearestNeighborClassifier. |
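The textFieldNames argument may carry a boosting indication such as title^10. A minimal sketch of how such a specification could be split into a field name and a boost follows. It is an assumption for illustration only: the class `FieldBoostSketch` and its methods are hypothetical, the default boost of 1.0 is assumed, and this page does not show how the classifier itself parses the value.

```java
/** Hypothetical sketch: split a "field^boost" specification; not Lucene code. */
final class FieldBoostSketch {

    /** Returns the part before the last '^', or the whole string if there is no '^'. */
    static String fieldName(String spec) {
        int caret = spec.lastIndexOf('^');
        return caret < 0 ? spec : spec.substring(0, caret);
    }

    /** Returns the number after the last '^', or 1.0 (assumed default) if there is no '^'. */
    static float boost(String spec) {
        int caret = spec.lastIndexOf('^');
        return caret < 0 ? 1.0f : Float.parseFloat(spec.substring(caret + 1));
    }
}
```

Under these assumptions, `title^10` splits into the field `title` with boost 10, and a plain `body` keeps the assumed default boost.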
Modifier and Type | Method and Description |
---|---|
ClassificationResult<BytesRef> | assignClass(String text): Assign a class (with score) to the given text String |
protected List<ClassificationResult<BytesRef>> | buildListFromTopDocs(TopDocs topDocs): build a list of classification results from search results |
protected ClassificationResult<BytesRef> | classifyFromTopDocs(TopDocs knnResults): TODO |
List<ClassificationResult<BytesRef>> | getClasses(String text): Get all the classes (sorted by score, descending) assigned to the given text String. |
List<ClassificationResult<BytesRef>> | getClasses(String text, int max): Get the first max classes (sorted by score, descending) assigned to the given text String. |
String | toString() |
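The two getClasses variants return classes sorted by score, descending, with the second one limited to the first max entries. The ordering and cut-off can be sketched without any Lucene types as follows. This is an illustration only: `TopClassesSketch` and `topClasses` are hypothetical names, and a plain map of label to score stands in for the ClassificationResult list.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

/** Hypothetical sketch of "sorted by score, descending, first max entries"; not Lucene code. */
final class TopClassesSketch {

    /** Returns at most max (label, score) entries, highest score first. */
    static List<Map.Entry<String, Double>> topClasses(Map<String, Double> scores, int max) {
        List<Map.Entry<String, Double>> sorted = new ArrayList<>(scores.entrySet());
        // Sort by score, highest first.
        sorted.sort(Map.Entry.<String, Double>comparingByValue().reversed());
        // Clamp the cut-off so that a negative or oversized max is safe.
        int limit = Math.max(0, Math.min(max, sorted.size()));
        return new ArrayList<>(sorted.subList(0, limit));
    }
}
```

Passing a max larger than the number of classes simply returns every class in score order, which matches the behavior described for the single-argument variant.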

protected final MoreLikeThis mlt
a MoreLikeThis instance used to perform MLT queries

protected final String[] textFieldNames
the names of the fields used as the input text

protected final String classFieldName
the name of the field used as the output text

protected final IndexSearcher indexSearcher
an IndexSearcher used to perform queries

protected final int k
the number of docs to select in the MLT results to find the nearest neighbor

protected final Query query
a Query used to filter the documents that should be used from this classifier's underlying LeafReader

public KNearestNeighborClassifier(IndexReader indexReader, Similarity similarity, Analyzer analyzer, Query query, int k, int minDocsFreq, int minTermFreq, String classFieldName, String... textFieldNames)
Creates a KNearestNeighborClassifier.
Parameters:
indexReader - the reader on the index to be used for classification
analyzer - an Analyzer used to analyze unseen text
similarity - the Similarity to be used by the underlying IndexSearcher, or null (defaults to BM25Similarity)
query - a Query used to optionally filter the docs used for training the classifier, or null if all the indexed docs should be used
k - the number of docs to select in the MLT results to find the nearest neighbor
minDocsFreq - the MoreLikeThis.minDocFreq parameter
minTermFreq - the MoreLikeThis.minTermFreq parameter
classFieldName - the name of the field used as the output for the classifier
textFieldNames - the names of the fields used as the inputs for the classifier; they can contain a boosting indication, e.g. title^10

public ClassificationResult<BytesRef> assignClass(String text) throws IOException
Assign a class (with score) to the given text String
Specified by:
assignClass in interface Classifier<BytesRef>
Parameters:
text - a String containing text to be classified
Returns:
a ClassificationResult holding the assigned class of type T and score
Throws:
IOException - If there is a low-level I/O error.

protected ClassificationResult<BytesRef> classifyFromTopDocs(TopDocs knnResults) throws IOException
Throws:
IOException

public List<ClassificationResult<BytesRef>> getClasses(String text) throws IOException
Get all the classes (sorted by score, descending) assigned to the given text String.
Specified by:
getClasses in interface Classifier<BytesRef>
Parameters:
text - a String containing text to be classified
Returns:
the list of ClassificationResult, with the classes and scores. Returns null if the classifier can't make lists.
Throws:
IOException - If there is a low-level I/O error.

public List<ClassificationResult<BytesRef>> getClasses(String text, int max) throws IOException
Get the first max classes (sorted by score, descending) assigned to the given text String.
Specified by:
getClasses in interface Classifier<BytesRef>
Parameters:
text - a String containing text to be classified
max - the maximum number of elements in the returned list
Returns:
the list of ClassificationResult, with the classes and scores, cut to at most max elements. Returns null if the classifier can't make lists.
Throws:
IOException - If there is a low-level I/O error.

protected List<ClassificationResult<BytesRef>> buildListFromTopDocs(TopDocs topDocs) throws IOException
build a list of classification results from search results
Parameters:
topDocs - the search results as a TopDocs object
Returns:
a List of ClassificationResult, one for each existing class
Throws:
IOException - if it's not possible to get the stored value of class field

Copyright © 2000-2017 Apache Software Foundation. All Rights Reserved.