org.apache.lucene.classification.utils
Class DatasetSplitter

java.lang.Object
  extended by org.apache.lucene.classification.utils.DatasetSplitter

public class DatasetSplitter
extends Object

Utility class for creating training / test / cross validation indexes from the original index.


Constructor Summary
DatasetSplitter(double testRatio, double crossValidationRatio)
          Creates a DatasetSplitter from the sizes of the test and cross validation indexes, each given as a ratio of the original index
 
Method Summary
 void split(AtomicReader originalIndex, Directory trainingIndex, Directory testIndex, Directory crossValidationIndex, Analyzer analyzer, String... fieldNames)
          Splits the given index into three indexes, used for training, test and cross validation respectively
 
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
 

Constructor Detail

DatasetSplitter

public DatasetSplitter(double testRatio,
                       double crossValidationRatio)
Creates a DatasetSplitter from the sizes of the test and cross validation indexes, each given as a ratio of the original index

Parameters:
testRatio - the ratio of the original index to be used for the test index, as a double between 0.0 and 1.0
crossValidationRatio - the ratio of the original index to be used for the cross validation index, as a double between 0.0 and 1.0
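
As an illustration of how the two ratios relate to index sizes, the following is a minimal sketch in plain Java and not the library's own code. The SplitSizes class and its of method are hypothetical names introduced here. The sketch assumes that whatever is not assigned to the test or cross validation index goes to the training index, that fractional counts are rounded down, and that the two ratios should not sum to more than 1.0; none of these details is stated in this page.

```java
/** Hypothetical helper (not part of Lucene): turns the two ratios into document counts. */
final class SplitSizes {
    final int training;
    final int test;
    final int crossValidation;

    private SplitSizes(int training, int test, int crossValidation) {
        this.training = training;
        this.test = test;
        this.crossValidation = crossValidation;
    }

    /** Computes the three sizes for an index holding totalDocs documents. */
    static SplitSizes of(int totalDocs, double testRatio, double crossValidationRatio) {
        if (totalDocs < 0) {
            throw new IllegalArgumentException("totalDocs must not be negative");
        }
        checkRatio("testRatio", testRatio);
        checkRatio("crossValidationRatio", crossValidationRatio);
        // Assumption: the two ratios together may not exceed the whole index.
        if (testRatio + crossValidationRatio > 1.0) {
            throw new IllegalArgumentException("ratios must sum to at most 1.0");
        }
        // Assumption: fractional document counts are rounded down.
        int test = (int) Math.floor(totalDocs * testRatio);
        int crossValidation = (int) Math.floor(totalDocs * crossValidationRatio);
        // Assumption: the training index receives the remainder.
        return new SplitSizes(totalDocs - test - crossValidation, test, crossValidation);
    }

    private static void checkRatio(String name, double ratio) {
        // The negated form also rejects NaN.
        if (!(ratio >= 0.0 && ratio <= 1.0)) {
            throw new IllegalArgumentException(name + " must be between 0.0 and 1.0");
        }
    }
}
```

Under these assumptions, SplitSizes.of(1000, 0.2, 0.1) yields 200 test documents, 100 cross validation documents and 700 training documents.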
Method Detail

split

public void split(AtomicReader originalIndex,
                  Directory trainingIndex,
                  Directory testIndex,
                  Directory crossValidationIndex,
                  Analyzer analyzer,
                  String... fieldNames)
           throws IOException
Splits the given index into three indexes, used for training, test and cross validation respectively

Parameters:
originalIndex - an AtomicReader on the source index
trainingIndex - a Directory used to write the training index
testIndex - a Directory used to write the test index
crossValidationIndex - a Directory used to write the cross validation index
analyzer - the Analyzer used to create the new documents
fieldNames - names of the fields to copy into the new indexes, or null to copy all fields
Throws:
IOException - if any writing operation fails on any of the indexes
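
The general idea of a three-way split with optional field filtering can be sketched in plain Java without any Lucene types. This is an illustrative model only: documents are represented as maps from field name to value, the ThreeWaySplitSketch class is a hypothetical name, and the assignment order (test first, then cross validation, then training) is an assumption that is not taken from this page. Treating an empty fieldNames array the same as null is also a choice made for the sketch.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;

/** Hypothetical, in-memory model of a three-way split (not the Lucene implementation). */
final class ThreeWaySplitSketch {
    final List<Map<String, String>> training = new ArrayList<>();
    final List<Map<String, String>> test = new ArrayList<>();
    final List<Map<String, String>> crossValidation = new ArrayList<>();

    static ThreeWaySplitSketch split(List<Map<String, String>> docs,
                                     double testRatio,
                                     double crossValidationRatio,
                                     String... fieldNames) {
        ThreeWaySplitSketch out = new ThreeWaySplitSketch();
        // null (or, in this sketch, an empty array) means "copy every field".
        Set<String> keep = (fieldNames == null || fieldNames.length == 0)
                ? null
                : new HashSet<>(Arrays.asList(fieldNames));
        // Assumption: target sizes are rounded down to whole documents.
        int testCap = (int) (docs.size() * testRatio);
        int cvCap = (int) (docs.size() * crossValidationRatio);

        for (Map<String, String> doc : docs) {
            // Build the new document from the selected fields only.
            Map<String, String> copy = new LinkedHashMap<>();
            for (Map.Entry<String, String> field : doc.entrySet()) {
                if (keep == null || keep.contains(field.getKey())) {
                    copy.put(field.getKey(), field.getValue());
                }
            }
            // Assumption: fill test first, then cross validation, then training.
            if (out.test.size() < testCap) {
                out.test.add(copy);
            } else if (out.crossValidation.size() < cvCap) {
                out.crossValidation.add(copy);
            } else {
                out.training.add(copy);
            }
        }
        return out;
    }
}
```

In this model, passing (String[]) null as the last argument keeps every field, while passing one or more names keeps only those fields in the copied documents.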


Copyright © 2000-2013 Apache Software Foundation. All Rights Reserved.