public class SimpleNaiveBayesClassifier extends Object implements Classifier<BytesRef>
A simple Lucene-based Naive Bayes classifier. See http://en.wikipedia.org/wiki/Naive_Bayes_classifier
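For orientation, the general Naive Bayes decision rule described in the linked article can be written as below. This is the textbook form, not taken from this class's source: c ranges over the values found in the class field, and w_1 ... w_n stand for the tokens of the input text. How this class estimates each probability from the index is not stated on this page.

```latex
\hat{c} = \arg\max_{c} \; P(c) \prod_{i=1}^{n} P(w_i \mid c)
```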
Modifier and Type | Field and Description |
---|---|
protected Analyzer | analyzer: Analyzer to be used for tokenizing unseen input text |
protected String | classFieldName: name of the field to be used as a class / category output |
protected IndexSearcher | indexSearcher: IndexSearcher to run searches on the index for retrieving frequencies |
protected LeafReader | leafReader: LeafReader used to access the Classifier's index |
protected Query | query: Query optionally used to filter the set of documents used for classification |
protected String[] | textFieldNames: names of the fields to be used as input text |
Constructor and Description |
---|
SimpleNaiveBayesClassifier(): Creates a new NaiveBayes classifier. |
Modifier and Type | Method and Description |
---|---|
ClassificationResult<BytesRef> | assignClass(String inputDocument): Assign a class (with score) to the given text String |
protected int | countDocsWithClass(): Count the number of documents in the index that have at least one value for the 'class' field |
List<ClassificationResult<BytesRef>> | getClasses(String text): Get all the classes (sorted by score, descending) assigned to the given text String. |
List<ClassificationResult<BytesRef>> | getClasses(String text, int max): Get the first max classes (sorted by score, descending) assigned to the given text String. |
protected String[] | tokenizeDoc(String doc): Tokenize a String on this classifier's text fields and analyzer |
void | train(LeafReader leafReader, String[] textFieldNames, String classFieldName, Analyzer analyzer, Query query): Train the classifier using the underlying Lucene index |
void | train(LeafReader leafReader, String textFieldName, String classFieldName, Analyzer analyzer): Train the classifier using the underlying Lucene index |
void | train(LeafReader leafReader, String textFieldName, String classFieldName, Analyzer analyzer, Query query): Train the classifier using the underlying Lucene index |
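To make the train / classify / ranked-list workflow above more concrete, here is a minimal plain-Java sketch of a Naive Bayes text classifier. It is an illustration under stated assumptions, not Lucene's implementation: the class `NaiveBayesSketch` and its methods `addTrainingDoc`, `topClasses`, and `bestClass` are hypothetical names, it keeps counts in in-memory maps instead of reading a Lucene index, it tokenizes on whitespace instead of using an Analyzer, and it uses add-one smoothing, which this page does not confirm for the real class.

```java
import java.util.AbstractMap;
import java.util.ArrayList;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.Set;

// Hypothetical illustration only; none of these names belong to Lucene.
class NaiveBayesSketch {
    private final Map<String, Integer> docsPerClass = new HashMap<>();
    private final Map<String, Map<String, Integer>> tokenCounts = new HashMap<>();
    private final Map<String, Integer> tokensPerClass = new HashMap<>();
    private final Set<String> vocabulary = new HashSet<>();
    private int totalDocs = 0;

    // Stand-in for an Analyzer: lower-case and split on whitespace.
    static String[] tokenize(String text) {
        String trimmed = text.trim();
        if (trimmed.isEmpty()) {
            return new String[0];
        }
        return trimmed.toLowerCase(Locale.ROOT).split("\\s+");
    }

    // Record one labeled document (plays the role of the indexed training data).
    void addTrainingDoc(String text, String label) {
        totalDocs++;
        docsPerClass.merge(label, 1, Integer::sum);
        Map<String, Integer> counts = tokenCounts.computeIfAbsent(label, k -> new HashMap<>());
        for (String token : tokenize(text)) {
            counts.merge(token, 1, Integer::sum);
            tokensPerClass.merge(label, 1, Integer::sum);
            vocabulary.add(token);
        }
    }

    // log P(c) + sum of log P(w_i | c), with add-one smoothing (an assumption).
    double logScore(String text, String label) {
        double score = Math.log((double) docsPerClass.get(label) / totalDocs);
        Map<String, Integer> counts = tokenCounts.get(label);
        int total = tokensPerClass.getOrDefault(label, 0);
        for (String token : tokenize(text)) {
            int count = counts.getOrDefault(token, 0);
            score += Math.log((count + 1.0) / (total + vocabulary.size()));
        }
        return score;
    }

    // Classes sorted by score, descending, cut to at most max elements.
    List<Map.Entry<String, Double>> topClasses(String text, int max) {
        List<Map.Entry<String, Double>> results = new ArrayList<>();
        for (String label : docsPerClass.keySet()) {
            results.add(new AbstractMap.SimpleEntry<>(label, logScore(text, label)));
        }
        results.sort((a, b) -> Double.compare(b.getValue(), a.getValue()));
        return new ArrayList<>(results.subList(0, Math.min(Math.max(max, 0), results.size())));
    }

    // Single best class; fails if no training data has been added yet.
    String bestClass(String text) {
        if (totalDocs == 0) {
            throw new IllegalStateException("add training documents before classifying");
        }
        return topClasses(text, 1).get(0).getKey();
    }
}
```

In this sketch, `bestClass` corresponds loosely to assignClass(String) and `topClasses` to getClasses(String, int): train first, then ask for either the single best class or a ranked, truncated list. Scores are log-probabilities, so they are negative and only meaningful relative to one another.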
protected LeafReader leafReader
LeafReader used to access the Classifier's index

protected String[] textFieldNames
names of the fields to be used as input text

protected String classFieldName
name of the field to be used as a class / category output

protected IndexSearcher indexSearcher
IndexSearcher to run searches on the index for retrieving frequencies

public SimpleNaiveBayesClassifier()
Creates a new NaiveBayes classifier. Note that you must call train() before you can classify any documents.

public void train(LeafReader leafReader, String textFieldName, String classFieldName, Analyzer analyzer) throws IOException
Train the classifier using the underlying Lucene index.
Specified by: train in interface Classifier<BytesRef>
Parameters:
leafReader - the reader to use to access the Lucene index
textFieldName - the name of the field used to compare documents
classFieldName - the name of the field containing the class assigned to documents
analyzer - the analyzer used to tokenize / filter the unseen text
Throws:
IOException - If there is a low-level I/O error.

public void train(LeafReader leafReader, String textFieldName, String classFieldName, Analyzer analyzer, Query query) throws IOException
Train the classifier using the underlying Lucene index.
Specified by: train in interface Classifier<BytesRef>
Parameters:
leafReader - the reader to use to access the Lucene index
textFieldName - the name of the field used to compare documents
classFieldName - the name of the field containing the class assigned to documents
analyzer - the analyzer used to tokenize / filter the unseen text
query - the query used to filter which documents are used for training
Throws:
IOException - If there is a low-level I/O error.

public void train(LeafReader leafReader, String[] textFieldNames, String classFieldName, Analyzer analyzer, Query query) throws IOException
Train the classifier using the underlying Lucene index.
Specified by: train in interface Classifier<BytesRef>
Parameters:
leafReader - the reader to use to access the Lucene index
textFieldNames - the names of the fields to be used to compare documents
classFieldName - the name of the field containing the class assigned to documents
analyzer - the analyzer used to tokenize / filter the unseen text
query - the query used to filter which documents are used for training
Throws:
IOException - If there is a low-level I/O error.

public ClassificationResult<BytesRef> assignClass(String inputDocument) throws IOException
Assign a class (with score) to the given text String.
Specified by: assignClass in interface Classifier<BytesRef>
Parameters:
inputDocument - a String containing text to be classified
Returns:
a ClassificationResult holding the assigned class of type T and the score
Throws:
IOException - If there is a low-level I/O error.

public List<ClassificationResult<BytesRef>> getClasses(String text) throws IOException
Get all the classes (sorted by score, descending) assigned to the given text String.
Specified by: getClasses in interface Classifier<BytesRef>
Parameters:
text - a String containing text to be classified
Returns:
a list of ClassificationResult, the classes and scores. Returns null if the classifier can't make lists.
Throws:
IOException - If there is a low-level I/O error.

public List<ClassificationResult<BytesRef>> getClasses(String text, int max) throws IOException
Get the first max classes (sorted by score, descending) assigned to the given text String.
Specified by: getClasses in interface Classifier<BytesRef>
Parameters:
text - a String containing text to be classified
max - the number of return list elements
Returns:
a list of ClassificationResult, the classes and scores, cut to at most max elements. Returns null if the classifier can't make lists.
Throws:
IOException - If there is a low-level I/O error.

protected int countDocsWithClass() throws IOException
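Count the number of documents in the index that have at least one value for the 'class' field.
Throws:
IOException - if accessing term vectors or searching fails

protected String[] tokenizeDoc(String doc) throws IOException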
Tokenize a String on this classifier's text fields and analyzer.
Parameters:
doc - the String representing an input text (to be classified)
Returns:
a String array of the resulting tokens
Throws:
IOException - if tokenization fails

Copyright © 2000-2016 Apache Software Foundation. All Rights Reserved.