Package org.apache.lucene.classification
Class BooleanPerceptronClassifier
java.lang.Object
org.apache.lucene.classification.BooleanPerceptronClassifier
- All Implemented Interfaces:
Classifier<Boolean>
A perceptron (see http://en.wikipedia.org/wiki/Perceptron) based Boolean Classifier. The weights are calculated using TermsEnum.totalTermFreq() both on a per field and a per document basis, and then a corresponding FST is used for class assignment.
- WARNING: This API is experimental and might change in incompatible ways in the next release.
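To make the idea of frequency-based perceptron weights concrete, here is a minimal, self-contained sketch in plain Java. It is not the Lucene implementation: the class name PerceptronSketch, the whitespace-style tokenization, the in-memory HashMap (instead of an FST), and the treatment of the bias as a decision threshold are all assumptions made for illustration.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical illustration only; not part of org.apache.lucene.classification.
class PerceptronSketch {
    // One weight per term; an in-memory map stands in for the FST (assumption).
    private final Map<String, Double> weights = new HashMap<>();
    // Assumption: the bias acts as the threshold separating the two classes.
    private final double bias;

    public PerceptronSketch(double bias) {
        this.bias = bias;
    }

    // Counts how often each lowercased token occurs in the text.
    public static Map<String, Integer> termFreqs(String text) {
        Map<String, Integer> tf = new HashMap<>();
        for (String token : text.toLowerCase().split("\\W+")) {
            if (!token.isEmpty()) {
                tf.merge(token, 1, Integer::sum);
            }
        }
        return tf;
    }

    // Weighted sum of term frequencies.
    public double score(String text) {
        double sum = 0.0;
        for (Map.Entry<String, Integer> e : termFreqs(text).entrySet()) {
            sum += weights.getOrDefault(e.getKey(), 0.0) * e.getValue();
        }
        return sum;
    }

    // Assigns true when the score reaches the bias threshold.
    public boolean classify(String text) {
        return score(text) >= bias;
    }

    // Classic perceptron update: adjust weights only on a misclassification.
    public void train(String text, boolean label) {
        if (classify(text) == label) {
            return;
        }
        double direction = label ? 1.0 : -1.0;
        for (Map.Entry<String, Integer> e : termFreqs(text).entrySet()) {
            weights.merge(e.getKey(), direction * e.getValue(), Double::sum);
        }
    }

    public static void main(String[] args) {
        PerceptronSketch p = new PerceptronSketch(1.0);
        p.train("cheap pills offer", true);
        p.train("meeting agenda notes", false);
        System.out.println(p.classify("cheap offer"));   // prints true
        System.out.println(p.classify("agenda notes"));  // prints false
    }
}
```

The sketch updates weights one document at a time; the batchSize constructor parameter of the real class suggests that updates there are applied per batch of documents.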
-
Constructor Summary
Constructor: BooleanPerceptronClassifier(IndexReader indexReader, Analyzer analyzer, Query query, Integer batchSize, Double bias, String classFieldName, String textFieldName)
Description: Creates a BooleanPerceptronClassifier.
-
Method Summary
Method: assignClass(String text)
Description: Assign a class (with score) to the given text String.
Method: getClasses(String text)
Description: Get all the classes (sorted by score, descending) assigned to the given text String.
Method: getClasses(String text, int max)
Description: Get the first max classes (sorted by score, descending) assigned to the given text String.
-
Constructor Details
-
BooleanPerceptronClassifier
public BooleanPerceptronClassifier(IndexReader indexReader, Analyzer analyzer, Query query, Integer batchSize, Double bias, String classFieldName, String textFieldName) throws IOException
Creates a BooleanPerceptronClassifier.
- Parameters:
indexReader - the reader on the index to be used for classification
analyzer - an Analyzer used to analyze unseen text
query - a Query used to filter the docs used for training the classifier, or null if all the indexed docs should be used
batchSize - the size of the batch of docs to use for updating the perceptron weights
bias - the bias used for class separation
classFieldName - the name of the field used as the output for the classifier
textFieldName - the name of the field used as input for the classifier
- Throws:
IOException - if the building of the underlying FST fails and/or the TermsEnum for the text field cannot be found
-
-
Method Details
-
assignClass
Description copied from interface: Classifier
Assign a class (with score) to the given text String.
- Specified by:
assignClass in interface Classifier<Boolean>
- Parameters:
text - a String containing text to be classified
- Returns:
a ClassificationResult holding assigned class of type T and score
- Throws:
IOException - If there is a low-level I/O error.
-
getClasses
Description copied from interface: Classifier
Get all the classes (sorted by score, descending) assigned to the given text String.
- Specified by:
getClasses in interface Classifier<Boolean>
- Parameters:
text - a String containing text to be classified
- Returns:
the whole list of ClassificationResult, the classes and scores. Returns null if the classifier can't make lists.
-
getClasses
Description copied from interface: Classifier
Get the first max classes (sorted by score, descending) assigned to the given text String.
- Specified by:
getClasses in interface Classifier<Boolean>
- Parameters:
text - a String containing text to be classified
max - the maximum number of elements in the returned list
- Returns:
the list of ClassificationResult (classes and scores), cut to at most max elements. Returns null if the classifier can't make lists.
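Because both getClasses variants may return null, callers may want to guard against that before iterating. The following sketch shows one way to do so while reproducing the documented "sorted by score, descending, cut to max" contract. ScoredLabel and TopResults are hypothetical stand-ins introduced here so the example runs without Lucene on the classpath; they are not Lucene types.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.Comparator;
import java.util.List;

// Hypothetical stand-in for a ClassificationResult<Boolean>: a class plus its score.
class ScoredLabel {
    final boolean label;
    final double score;

    ScoredLabel(boolean label, double score) {
        this.label = label;
        this.score = score;
    }
}

// Hypothetical helper; not part of Lucene.
class TopResults {
    // Returns at most max results, highest score first; treats null as "no list available".
    static List<ScoredLabel> firstMax(List<ScoredLabel> results, int max) {
        if (results == null) {
            return Collections.emptyList();
        }
        List<ScoredLabel> sorted = new ArrayList<>(results);
        sorted.sort(Comparator.comparingDouble((ScoredLabel r) -> r.score).reversed());
        int limit = Math.min(Math.max(max, 0), sorted.size());
        return new ArrayList<>(sorted.subList(0, limit));
    }

    public static void main(String[] args) {
        List<ScoredLabel> results = new ArrayList<>();
        results.add(new ScoredLabel(true, 0.2));
        results.add(new ScoredLabel(false, 0.9));
        List<ScoredLabel> top = firstMax(results, 1);
        System.out.println(top.get(0).label + " " + top.get(0).score); // prints false 0.9
        System.out.println(firstMax(null, 3).isEmpty());               // prints true
    }
}
```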
-