java.lang.Object

org.apache.lucene.classification.SimpleNaiveBayesClassifier

org.apache.lucene.classification.CachingNaiveBayesClassifier

All Implemented Interfaces: Classifier<BytesRef>

public class CachingNaiveBayesClassifier extends SimpleNaiveBayesClassifier

A simplistic Lucene-based Naive Bayes classifier with a caching feature; see http://en.wikipedia.org/wiki/Naive_Bayes_classifier

This is NOT an online classifier.

WARNING: This API is experimental and might change in incompatible ways in the next release.

Field Summary

Fields inherited from class org.apache.lucene.classification.SimpleNaiveBayesClassifier
analyzer, classFieldName, indexReader, indexSearcher, query, textFieldNames
Constructor Summary

Constructors

Constructor

Description

CachingNaiveBayesClassifier(IndexReader indexReader, Analyzer analyzer, Query query, String classFieldName, String... textFieldNames)

Creates a new Naive Bayes classifier with internal caching.
Method Summary

Modifier and Type

Method

Description

protected List<ClassificationResult<BytesRef>>

assignClassNormalizedList(String inputDocument)

Transforms the classification scores into the range between 0 and 1.

void

reInitCache(int minTermOccurrenceInCache, boolean justCachedTerms)

Builds the frame (skeleton) of the cache.

Methods inherited from class org.apache.lucene.classification.SimpleNaiveBayesClassifier
assignClass, countDocsWithClass, getClasses, getClasses, normClassificationResults, tokenize

Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait

Constructor Details
- CachingNaiveBayesClassifier
  
  public CachingNaiveBayesClassifier(IndexReader indexReader, Analyzer analyzer, Query query, String classFieldName, String... textFieldNames)
  
  Creates a new Naive Bayes classifier with internal caching. To reduce memory usage, call reInitCache(int, boolean).
  
  Parameters:
  
  indexReader - the reader on the index to be used for classification
  
  analyzer - an Analyzer used to analyze unseen text
  
  query - a Query used to optionally filter the docs used for training the classifier, or null if all the indexed docs should be used
  
  classFieldName - the name of the field used as the output for the classifier
  
  textFieldNames - the names of the fields used as the inputs for the classifier
Method Details
- assignClassNormalizedList
  
  protected List<ClassificationResult<BytesRef>> assignClassNormalizedList(String inputDocument) throws IOException
  
  Transforms the classification scores into the range between 0 and 1.
  
  Overrides:
  
  assignClassNormalizedList in class SimpleNaiveBayesClassifier
  
  Parameters:
  
  inputDocument - the input text as a String
  
  Returns:
  
  a List of ClassificationResult, one for each existing class
  
  Throws:
  
  IOException - if assigning probabilities fails
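
One common way to map per-class log scores into the range between 0 and 1 is log-sum-exp normalization. The sketch below is illustrative only and does not use Lucene: the class and method names (ScoreNormalizationSketch, normalize) and the sample scores are hypothetical, and it should not be read as the exact computation this method performs.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical, Lucene-free sketch of normalizing log scores into [0, 1].
class ScoreNormalizationSketch {

  // Returns one value per input score; the values lie in [0, 1] and sum to 1.
  static List<Double> normalize(List<Double> logScores) {
    List<Double> normalized = new ArrayList<>();
    if (logScores.isEmpty()) {
      return normalized;
    }
    // Find the largest log score so the exponentials below do not underflow.
    double max = Double.NEGATIVE_INFINITY;
    for (double score : logScores) {
      max = Math.max(max, score);
    }
    // log(sum(exp(score))) computed in a numerically stable way.
    double sum = 0.0;
    for (double score : logScores) {
      sum += Math.exp(score - max);
    }
    double logTotal = max + Math.log(sum);
    // Each normalized value is exp(score) / sum(exp(all scores)).
    for (double score : logScores) {
      normalized.add(Math.exp(score - logTotal));
    }
    return normalized;
  }

  public static void main(String[] args) {
    // Made-up log scores for three classes.
    System.out.println(normalize(List.of(-120.5, -118.2, -125.0)));
  }
}
```

The ordering of the classes is preserved by this transformation, so the class with the highest raw score also has the highest normalized value.
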
- reInitCache
  
  public void reInitCache(int minTermOccurrenceInCache, boolean justCachedTerms) throws IOException
  
  Builds the frame (skeleton) of the cache. The cache keeps a term's occurrence count in memory after the term has been searched once. Used properly, the cache can give a 2-100x speedup, but it can consume a lot of memory. To lower memory consumption, terms with a very low occurrence in the index can be filtered out via minTermOccurrenceInCache. The justCachedTerms parameter controls term searching: if it is true, only the terms in the cache skeleton are searched; if it is false, terms that are not in the cache are searched as well (but not cached).
  
  Parameters:
  
  minTermOccurrenceInCache - the occurrence threshold for caching a term; a higher value yields a smaller cache
  
  justCachedTerms - if true, low-occurrence terms (those not in the cache) are fully excluded from searching
  
  Throws:
  
  IOException - If there is a low-level I/O error.
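
The interplay of the two parameters can be sketched with a small, Lucene-free model. Everything here is hypothetical: the class TermCacheSketch, its fields, and the in-memory map standing in for the index are illustrative, and the "at least minTermOccurrenceInCache" comparison is an assumption rather than the documented threshold rule.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

// Hypothetical model of a term-occurrence cache; not Lucene's implementation.
class TermCacheSketch {
  // Stand-in for the index: term -> number of occurrences.
  private final Map<String, Integer> index;
  // The frame: terms that are eligible for caching.
  private final Set<String> frame = new HashSet<>();
  // Counts kept in memory once an eligible term has been searched.
  private final Map<String, Integer> counts = new HashMap<>();
  private boolean justCachedTerms;
  // Counts how often the (simulated) index was actually searched.
  int indexLookups = 0;

  TermCacheSketch(Map<String, Integer> index) {
    this.index = index;
  }

  // Rebuilds the frame; only sufficiently frequent terms become cacheable.
  void reInitCache(int minTermOccurrenceInCache, boolean justCachedTerms) {
    this.justCachedTerms = justCachedTerms;
    frame.clear();
    counts.clear();
    for (Map.Entry<String, Integer> entry : index.entrySet()) {
      // Assumption: a term qualifies when it occurs at least this often.
      if (entry.getValue() >= minTermOccurrenceInCache) {
        frame.add(entry.getKey());
      }
    }
  }

  int occurrences(String term) {
    if (frame.contains(term)) {
      // Searched once, then served from memory.
      return counts.computeIfAbsent(term, this::searchIndex);
    }
    if (justCachedTerms) {
      return 0; // Terms outside the frame are ignored entirely.
    }
    return searchIndex(term); // Searched every time, never cached.
  }

  private int searchIndex(String term) {
    indexLookups++;
    return index.getOrDefault(term, 0);
  }

  public static void main(String[] args) {
    TermCacheSketch sketch =
        new TermCacheSketch(Map.of("lucene", 40, "bayes", 12, "rare", 1));
    sketch.reInitCache(5, false);
    sketch.occurrences("lucene");
    sketch.occurrences("lucene"); // second call is answered from the cache
    sketch.occurrences("rare");   // below the threshold: searched, not cached
    System.out.println("index lookups: " + sketch.indexLookups);
  }
}
```

In this model, raising the threshold shrinks the frame and therefore the memory held by the cache, while setting justCachedTerms to true trades accuracy on rare terms for fewer index searches.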
