SimpleNaiveBayesClassifier (Lucene 5.5.0 API)

java.lang.Object
- org.apache.lucene.classification.SimpleNaiveBayesClassifier

All Implemented Interfaces:

Classifier<BytesRef>

Direct Known Subclasses:

CachingNaiveBayesClassifier
```
public class SimpleNaiveBayesClassifier
extends Object
implements Classifier<BytesRef>
```
A simplistic Lucene-based Naive Bayes classifier; see http://en.wikipedia.org/wiki/Naive_Bayes_classifier

WARNING: This API is experimental and might change in incompatible ways in the next release.
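To make the underlying technique concrete, here is a minimal, Lucene-free sketch of textbook multinomial naive Bayes scoring: log prior plus the sum of log term likelihoods, with add-one (Laplace) smoothing. It is an illustration under that assumption only. SimpleNaiveBayesClassifier derives its own statistics from the index, and its exact formulas may differ. The class `NaiveBayesSketch` and its methods are hypothetical.

```java
import java.util.*;

// Hypothetical, in-memory illustration of multinomial naive Bayes.
// Not the implementation used by SimpleNaiveBayesClassifier.
final class NaiveBayesSketch {
  private final Map<String, Integer> docsPerClass = new HashMap<>();
  private final Map<String, Map<String, Integer>> termCounts = new HashMap<>();
  private final Map<String, Integer> totalTermsPerClass = new HashMap<>();
  private final Set<String> vocabulary = new HashSet<>();
  private int totalDocs = 0;

  // Records one training document: its class label and its already-tokenized terms.
  void add(String label, List<String> tokens) {
    totalDocs++;
    docsPerClass.merge(label, 1, Integer::sum);
    Map<String, Integer> counts = termCounts.computeIfAbsent(label, k -> new HashMap<>());
    for (String t : tokens) {
      counts.merge(t, 1, Integer::sum);
      totalTermsPerClass.merge(label, 1, Integer::sum);
      vocabulary.add(t);
    }
  }

  // log P(c) + sum of log P(t | c), smoothed with +1 per term over the vocabulary.
  // Assumes at least one document was added and that label was seen in training.
  double logScore(String label, List<String> tokens) {
    double score = Math.log((double) docsPerClass.getOrDefault(label, 0) / totalDocs);
    Map<String, Integer> counts = termCounts.getOrDefault(label, Collections.emptyMap());
    double denom = totalTermsPerClass.getOrDefault(label, 0) + vocabulary.size();
    for (String t : tokens) {
      score += Math.log((counts.getOrDefault(t, 0) + 1.0) / denom);
    }
    return score;
  }
}
```

Working in log space keeps the product of many small probabilities from underflowing; the class with the highest log score wins.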

Field Summary

Fields
Modifier and Type	Field and Description
`protected Analyzer`	`analyzer` `Analyzer` to be used for tokenizing unseen input text
`protected String`	`classFieldName` name of the field to be used as a class / category output
`protected IndexSearcher`	`indexSearcher` `IndexSearcher` to run searches on the index for retrieving frequencies
`protected LeafReader`	`leafReader` `LeafReader` used to access the `Classifier`'s index
`protected Query`	`query` `Query` used to optionally filter the set of documents used for classification
`protected String[]`	`textFieldNames` names of the fields to be used as input text

Constructor Summary

Constructors
Constructor and Description

SimpleNaiveBayesClassifier()
Creates a new NaiveBayes classifier.


Method Summary

Methods
Modifier and Type	Method and Description
`ClassificationResult<BytesRef>`	`assignClass(String inputDocument)` Assign a class (with score) to the given text String
`protected int`	`countDocsWithClass()` Count the number of documents in the index that have at least one value for the 'class' field
`List<ClassificationResult<BytesRef>>`	`getClasses(String text)` Get all the classes (sorted by score, descending) assigned to the given text String.
`List<ClassificationResult<BytesRef>>`	`getClasses(String text, int max)` Get the first `max` classes (sorted by score, descending) assigned to the given text String.
`protected String[]`	`tokenizeDoc(String doc)` Tokenize a `String` using this classifier's text fields and analyzer
`void`	`train(LeafReader leafReader, String[] textFieldNames, String classFieldName, Analyzer analyzer, Query query)` Train the classifier using the underlying Lucene index
`void`	`train(LeafReader leafReader, String textFieldName, String classFieldName, Analyzer analyzer)` Train the classifier using the underlying Lucene index
`void`	`train(LeafReader leafReader, String textFieldName, String classFieldName, Analyzer analyzer, Query query)` Train the classifier using the underlying Lucene index

Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait

- Field Detail
  - leafReader
```
protected LeafReader leafReader
```
    LeafReader used to access the Classifier's index
  - textFieldNames
```
protected String[] textFieldNames
```
    names of the fields to be used as input text
  - classFieldName
```
protected String classFieldName
```
    name of the field to be used as a class / category output
  - analyzer
```
protected Analyzer analyzer
```
    Analyzer to be used for tokenizing unseen input text
  - indexSearcher
```
protected IndexSearcher indexSearcher
```
    IndexSearcher to run searches on the index for retrieving frequencies
  - query
```
protected Query query
```
    Query used to optionally filter the set of documents used for classification
- Constructor Detail
  - SimpleNaiveBayesClassifier
```
public SimpleNaiveBayesClassifier()
```
    Creates a new NaiveBayes classifier. Note that you must call train() before you can classify any documents.
- Method Detail
  - train
```
public void train(LeafReader leafReader,
         String textFieldName,
         String classFieldName,
         Analyzer analyzer)
           throws IOException
```
    Train the classifier using the underlying Lucene index
    
    Specified by:
    
    train in interface Classifier<BytesRef>
    
    Parameters:
    leafReader - the reader to use to access the Lucene index
    textFieldName - the name of the field used to compare documents
    classFieldName - the name of the field containing the class assigned to documents
    analyzer - the analyzer used to tokenize / filter the unseen text
    
    Throws:
    
    IOException - If there is a low-level I/O error.
  - train
```
public void train(LeafReader leafReader,
         String textFieldName,
         String classFieldName,
         Analyzer analyzer,
         Query query)
           throws IOException
```
    Train the classifier using the underlying Lucene index
    
    Specified by:
    
    train in interface Classifier<BytesRef>
    
    Parameters:
    leafReader - the reader to use to access the Lucene index
    textFieldName - the name of the field used to compare documents
    classFieldName - the name of the field containing the class assigned to documents
    analyzer - the analyzer used to tokenize / filter the unseen text
    query - the query used to filter which documents are used for training
    
    Throws:
    
    IOException - If there is a low-level I/O error.
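    The query parameter narrows the training set. The sketch below models that idea without Lucene: a `java.util.function.Predicate` stands in for the Query, and a null filter means every document is used. `TrainingDoc` and `FilteredTraining` are hypothetical names, and counting documents per class is only one of the statistics a real trainer would gather.

```java
import java.util.*;
import java.util.function.Predicate;

// Hypothetical in-memory stand-in for an indexed document: a class label plus its text.
final class TrainingDoc {
  final String label;
  final String text;

  TrainingDoc(String label, String text) {
    this.label = label;
    this.text = text;
  }
}

final class FilteredTraining {
  // Counts documents per class, keeping only those accepted by the filter.
  // A null filter keeps every document, like training without a Query.
  static Map<String, Integer> docsPerClass(List<TrainingDoc> docs, Predicate<TrainingDoc> filter) {
    Map<String, Integer> counts = new TreeMap<>();
    for (TrainingDoc d : docs) {
      if (filter == null || filter.test(d)) {
        counts.merge(d.label, 1, Integer::sum);
      }
    }
    return counts;
  }
}
```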
  - train
```
public void train(LeafReader leafReader,
         String[] textFieldNames,
         String classFieldName,
         Analyzer analyzer,
         Query query)
           throws IOException
```
    Train the classifier using the underlying Lucene index
    
    Specified by:
    
    train in interface Classifier<BytesRef>
    
    Parameters:
    leafReader - the reader to use to access the Lucene index
    textFieldNames - the names of the fields to be used to compare documents
    classFieldName - the name of the field containing the class assigned to documents
    analyzer - the analyzer used to tokenize / filter the unseen text
    query - the query used to filter which documents are used for training
    
    Throws:
    
    IOException - If there is a low-level I/O error.
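    This overload accepts several text fields. As a rough illustration, assuming a document is modelled as a field-name-to-text map, the sketch gathers terms from each listed field in order and skips fields the document lacks. The lowercase whitespace split is a hypothetical stand-in for the Analyzer, and `MultiFieldTokens` is not a Lucene class.

```java
import java.util.*;

final class MultiFieldTokens {
  // Collects lowercase, whitespace-separated tokens from each named text field of a
  // document modelled as a field-name -> text map. Missing fields are skipped.
  static List<String> tokens(Map<String, String> doc, String[] textFieldNames) {
    List<String> out = new ArrayList<>();
    for (String field : textFieldNames) {
      String value = doc.get(field);
      if (value == null) {
        continue;
      }
      for (String t : value.toLowerCase(Locale.ROOT).split("\\s+")) {
        if (!t.isEmpty()) {
          out.add(t); // split can yield "" for leading whitespace
        }
      }
    }
    return out;
  }
}
```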
  - assignClass
```
public ClassificationResult<BytesRef> assignClass(String inputDocument)
                                           throws IOException
```
    Assign a class (with score) to the given text String
    
    Specified by:
    
    assignClass in interface Classifier<BytesRef>
    
    Parameters:
    inputDocument - a String containing text to be classified
    
    Returns:
    a ClassificationResult holding the assigned class (of type T) and its score
    
    Throws:
    
    IOException - If there is a low-level I/O error.
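    Conceptually, assigning a single class means picking the highest-scoring candidate. The sketch below shows that argmax step over a plain map of label to score. The `ArgMax` class is hypothetical, and how the library computes and normalizes its scores is not shown here.

```java
import java.util.*;

final class ArgMax {
  // Returns the label with the highest score, or null when there are no candidates.
  static String best(Map<String, Double> scores) {
    String bestLabel = null;
    double bestScore = Double.NEGATIVE_INFINITY;
    for (Map.Entry<String, Double> e : scores.entrySet()) {
      if (bestLabel == null || e.getValue() > bestScore) {
        bestLabel = e.getKey();
        bestScore = e.getValue();
      }
    }
    return bestLabel;
  }
}
```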
  - getClasses
```
public List<ClassificationResult<BytesRef>> getClasses(String text)
                                                throws IOException
```
    Get all the classes (sorted by score, descending) assigned to the given text String.
    
    Specified by:
    
    getClasses in interface Classifier<BytesRef>
    
    Parameters:
    text - a String containing text to be classified
    
    Returns:
    the full list of ClassificationResults (classes and scores). Returns null if the classifier cannot produce lists.
    
    Throws:
    
    IOException - If there is a low-level I/O error.
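    The ordering contract here is "sorted by score, descending". A minimal sketch of that ordering follows, using a hypothetical `ScoredClass` pair rather than Lucene's ClassificationResult so that it runs without Lucene on the classpath.

```java
import java.util.*;

// Hypothetical stand-in for a (class, score) pair; not Lucene's ClassificationResult.
final class ScoredClass {
  final String label;
  final double score;

  ScoredClass(String label, double score) {
    this.label = label;
    this.score = score;
  }
}

final class RankClasses {
  // Returns a new list sorted by score, highest first; the input list is left unchanged.
  static List<ScoredClass> sortedDescending(List<ScoredClass> results) {
    List<ScoredClass> copy = new ArrayList<>(results);
    copy.sort(Comparator.comparingDouble((ScoredClass r) -> r.score).reversed());
    return copy;
  }
}
```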
  - getClasses
```
public List<ClassificationResult<BytesRef>> getClasses(String text,
                                              int max)
                                                throws IOException
```
    Get the first max classes (sorted by score, descending) assigned to the given text String.
    
    Specified by:
    
    getClasses in interface Classifier<BytesRef>
    
    Parameters:
    text - a String containing text to be classified
    max - the maximum number of elements in the returned list
    
    Returns:
    the list of ClassificationResults (classes and scores), truncated to the first max elements. Returns null if the classifier cannot produce lists.
    
    Throws:
    
    IOException - If there is a low-level I/O error.
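    Keeping only the first max entries of an already sorted list can be sketched as below. This version clamps max to the list size and rejects negative values; those boundary choices are assumptions of the sketch, since the documentation above does not say how the library handles a max larger than the number of classes. `TopK.firstMax` is a hypothetical helper.

```java
import java.util.*;

final class TopK {
  // Returns a copy of the first `max` elements of a list that is already sorted
  // by descending score. Clamps `max` to the list size; negative values are rejected.
  static <T> List<T> firstMax(List<T> sortedDescending, int max) {
    if (max < 0) {
      throw new IllegalArgumentException("max must be >= 0");
    }
    int end = Math.min(max, sortedDescending.size());
    return new ArrayList<>(sortedDescending.subList(0, end));
  }
}
```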
  - countDocsWithClass
```
protected int countDocsWithClass()
                          throws IOException
```
    Count the number of documents in the index that have at least one value for the 'class' field
    
    Returns:
    the number of documents that have a value for the 'class' field
    
    Throws:
    
    IOException - if accessing term vectors or searching fails
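    The quantity being counted is easy to state as a brute-force scan, sketched below over documents modelled as field-name-to-values maps. This only illustrates the definition. The library presumably answers it from index statistics or a search rather than by iterating documents, and `ClassDocCount` is a hypothetical name.

```java
import java.util.*;

final class ClassDocCount {
  // Counts documents whose class field is present and holds at least one value.
  static int countDocsWithClass(List<Map<String, List<String>>> docs, String classFieldName) {
    int count = 0;
    for (Map<String, List<String>> doc : docs) {
      List<String> values = doc.get(classFieldName);
      if (values != null && !values.isEmpty()) {
        count++;
      }
    }
    return count;
  }
}
```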
  - tokenizeDoc
```
protected String[] tokenizeDoc(String doc)
                        throws IOException
```
    Tokenize a String using this classifier's text fields and analyzer
    
    Parameters:
    doc - the String representing an input text (to be classified)
    
    Returns:
    a String array of the resulting tokens
    
    Throws:
    
    IOException - if tokenization fails
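    One reading of "tokenize on this classifier's text fields" is that the same input text is analyzed once per configured text field, with the resulting tokens concatenated in field order. The sketch below shows that reading with a hypothetical field-aware analyzer function in place of Lucene's Analyzer and TokenStream; treat it as an assumption, not a description of the library's code.

```java
import java.util.*;
import java.util.function.BiFunction;

final class TokenizeSketch {
  // Applies a (hypothetical) field-aware analyzer, given as (fieldName, text) -> tokens,
  // to the same input once per text field and concatenates the results in field order.
  static String[] tokenizeDoc(String doc, String[] textFieldNames,
                              BiFunction<String, String, List<String>> analyzer) {
    List<String> result = new ArrayList<>();
    for (String field : textFieldNames) {
      result.addAll(analyzer.apply(field, doc));
    }
    return result.toArray(new String[0]);
  }
}
```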
