public class BooleanPerceptronClassifier extends Object implements Classifier<Boolean>
A perceptron (see http://en.wikipedia.org/wiki/Perceptron) based Boolean Classifier. The weights are calculated using TermsEnum.totalTermFreq() both on a per field and a per document basis, and then a corresponding FST is used for class assignment.

Constructor and Description |
---|
BooleanPerceptronClassifier(): Default constructor, no batch updates of FST, perceptron threshold is calculated via underlying index metrics during training. |
BooleanPerceptronClassifier(Double threshold, Integer batchSize): Create a BooleanPerceptronClassifier. |
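
The class description says the weights come from TermsEnum.totalTermFreq(), which reports the total number of occurrences of a term across all documents. As a rough, Lucene-free illustration of that quantity, the sketch below counts token occurrences over a small in-memory corpus. The class name TermFrequencySketch, the whitespace tokenization, and the idea of treating these counts as starting weights are assumptions made for illustration; the real classifier reads its statistics from the index through an Analyzer.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical stand-alone sketch; it does not use or reproduce Lucene code. */
final class TermFrequencySketch {

    /** Counts every occurrence of each whitespace-separated token across all documents. */
    static Map<String, Long> totalTermFreq(List<String> documents) {
        Map<String, Long> freq = new HashMap<>();
        for (String doc : documents) {
            for (String token : doc.toLowerCase().split("\\s+")) {
                if (!token.isEmpty()) {
                    // Assumption: one count per occurrence, summed over the whole corpus.
                    freq.merge(token, 1L, Long::sum);
                }
            }
        }
        return freq;
    }

    public static void main(String[] args) {
        Map<String, Long> freq = totalTermFreq(Arrays.asList("fast red car", "red red bike"));
        System.out.println(freq.get("red")); // prints 3
        System.out.println(freq.get("car")); // prints 1
    }
}
```

A per-document count would follow the same pattern with a single-element list, which is one way to picture the "per field and per document" distinction made above.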
Modifier and Type | Method and Description |
---|---|
ClassificationResult<Boolean> | assignClass(String text): Assign a class (with score) to the given text String. |
List<ClassificationResult<Boolean>> | getClasses(String text): Get all the classes (sorted by score, descending) assigned to the given text String. |
List<ClassificationResult<Boolean>> | getClasses(String text, int max): Get the first max classes (sorted by score, descending) assigned to the given text String. |
void | train(LeafReader leafReader, String[] textFieldNames, String classFieldName, Analyzer analyzer, Query query): Train the classifier using the underlying Lucene index. |
void | train(LeafReader leafReader, String textFieldName, String classFieldName, Analyzer analyzer): Train the classifier using the underlying Lucene index. |
void | train(LeafReader leafReader, String textFieldName, String classFieldName, Analyzer analyzer, Query query): Train the classifier using the underlying Lucene index. |
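
The getClasses(String text, int max) entry describes results sorted by score, descending, and cut to the first max elements. The sketch below shows that ordering and cut on plain Java objects. Result is a hypothetical stand-in for ClassificationResult<Boolean>, and TopClassesSketch and its top method are illustrative names, not Lucene API.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;

/** Hypothetical stand-alone sketch; not part of Lucene. */
final class TopClassesSketch {

    /** Hypothetical stand-in for ClassificationResult<Boolean>. */
    static final class Result {
        final boolean assignedClass;
        final double score;

        Result(boolean assignedClass, double score) {
            this.assignedClass = assignedClass;
            this.score = score;
        }
    }

    /** Returns at most max results, highest score first; the input list is not modified. */
    static List<Result> top(List<Result> results, int max) {
        List<Result> sorted = new ArrayList<>(results);
        sorted.sort(Comparator.comparingDouble((Result r) -> r.score).reversed());
        int limit = Math.max(0, Math.min(max, sorted.size()));
        return new ArrayList<>(sorted.subList(0, limit));
    }

    public static void main(String[] args) {
        List<Result> all = Arrays.asList(new Result(false, 0.3), new Result(true, 0.7));
        List<Result> best = top(all, 1);
        System.out.println(best.get(0).assignedClass + " " + best.get(0).score); // prints true 0.7
    }
}
```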
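public BooleanPerceptronClassifier(Double threshold, Integer batchSize)
Create a BooleanPerceptronClassifier.
Parameters:
threshold - the binary threshold for perceptron output evaluation

public BooleanPerceptronClassifier()
Default constructor, no batch updates of FST, perceptron threshold is calculated via underlying index metrics during training.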
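
To make the role of the threshold parameter concrete, here is a minimal, Lucene-free perceptron sketch: the output is the sum of the weights of the tokens in the text, and the assigned class is true when that output reaches the threshold. The class ThresholdPerceptronSketch, the in-memory weight map, the whitespace tokenization, and the fixed update step of 1.0 are all assumptions for illustration; the actual classifier keeps its weights in an FST and derives them from index statistics.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical stand-alone sketch; not part of Lucene. */
final class ThresholdPerceptronSketch {
    private final Map<String, Double> weights = new HashMap<>();
    private final double threshold;

    ThresholdPerceptronSketch(double threshold) {
        this.threshold = threshold;
    }

    /** Sum of the weights of the tokens in the text; unknown tokens contribute 0. */
    double output(String text) {
        double sum = 0.0;
        for (String token : text.toLowerCase().split("\\s+")) {
            sum += weights.getOrDefault(token, 0.0);
        }
        return sum;
    }

    /** Binary decision: true when the output reaches the threshold. */
    boolean classify(String text) {
        return output(text) >= threshold;
    }

    /** Error-driven update: weights change only when the prediction is wrong. */
    void train(String text, boolean label) {
        if (classify(text) == label) {
            return;
        }
        double delta = label ? 1.0 : -1.0; // assumed step size
        for (String token : text.toLowerCase().split("\\s+")) {
            weights.merge(token, delta, Double::sum);
        }
    }

    public static void main(String[] args) {
        ThresholdPerceptronSketch p = new ThresholdPerceptronSketch(1.0);
        p.train("cheap pills now", true);
        p.train("meeting notes now", false);
        System.out.println(p.classify("cheap pills"));   // prints true
        System.out.println(p.classify("meeting notes")); // prints false
    }
}
```

In this sketch a higher threshold makes a true assignment harder to reach, which is why choosing it from index metrics, as the default constructor does, or passing it explicitly changes the classifier's behavior.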
public ClassificationResult<Boolean> assignClass(String text) throws IOException
Assign a class (with score) to the given text String.
Specified by:
assignClass in interface Classifier<Boolean>
Parameters:
text - a String containing text to be classified
Returns:
ClassificationResult holding assigned class of type T and score
Throws:
IOException - If there is a low-level I/O error.

public void train(LeafReader leafReader, String textFieldName, String classFieldName, Analyzer analyzer) throws IOException
Train the classifier using the underlying Lucene index.
Specified by:
train in interface Classifier<Boolean>
Parameters:
leafReader - the reader to use to access the Lucene index
textFieldName - the name of the field used to compare documents
classFieldName - the name of the field containing the class assigned to documents
analyzer - the analyzer used to tokenize / filter the unseen text
Throws:
IOException - If there is a low-level I/O error.

public void train(LeafReader leafReader, String textFieldName, String classFieldName, Analyzer analyzer, Query query) throws IOException
Train the classifier using the underlying Lucene index.
Specified by:
train in interface Classifier<Boolean>
Parameters:
leafReader - the reader to use to access the Lucene index
textFieldName - the name of the field used to compare documents
classFieldName - the name of the field containing the class assigned to documents
analyzer - the analyzer used to tokenize / filter the unseen text
query - the query used to filter which documents are used for training
Throws:
IOException - If there is a low-level I/O error.

public void train(LeafReader leafReader, String[] textFieldNames, String classFieldName, Analyzer analyzer, Query query) throws IOException
Train the classifier using the underlying Lucene index.
Specified by:
train in interface Classifier<Boolean>
Parameters:
leafReader - the reader to use to access the Lucene index
textFieldNames - the names of the fields to be used to compare documents
classFieldName - the name of the field containing the class assigned to documents
analyzer - the analyzer used to tokenize / filter the unseen text
query - the query used to filter which documents are used for training
Throws:
IOException - If there is a low-level I/O error.

public List<ClassificationResult<Boolean>> getClasses(String text) throws IOException
Get all the classes (sorted by score, descending) assigned to the given text String.
Specified by:
getClasses in interface Classifier<Boolean>
Parameters:
text - a String containing text to be classified
Returns:
the list of ClassificationResult, the classes and scores. Returns null if the classifier can't make lists.
Throws:
IOException - If there is a low-level I/O error.

public List<ClassificationResult<Boolean>> getClasses(String text, int max) throws IOException
Get the first max classes (sorted by score, descending) assigned to the given text String.
Specified by:
getClasses in interface Classifier<Boolean>
Parameters:
text - a String containing text to be classified
max - the number of return list elements
Returns:
the list of ClassificationResult, the classes and scores, cut to at most max elements. Returns null if the classifier can't make lists.
Throws:
IOException - If there is a low-level I/O error.

Copyright © 2000-2016 Apache Software Foundation. All Rights Reserved.