org.apache.lucene.classification.utils
Class DatasetSplitter
java.lang.Object
org.apache.lucene.classification.utils.DatasetSplitter
public class DatasetSplitter
- extends Object
Utility class for creating training / test / cross validation indexes from the original index.
Constructor Summary
DatasetSplitter(double testRatio, double crossValidationRatio)
Create a DatasetSplitter by giving the sizes of the test and cross validation indexes, each as a ratio of the original index
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
DatasetSplitter
public DatasetSplitter(double testRatio,
double crossValidationRatio)
- Create a DatasetSplitter by giving the sizes of the test and cross validation indexes, each as a ratio of the original index
- Parameters:
testRatio - the ratio of the original index to be used for the test index, as a double between 0.0 and 1.0
crossValidationRatio - the ratio of the original index to be used for the cross validation index, as a double between 0.0 and 1.0
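The two ratios determine how many documents go to the test and cross validation indexes, and the rest is left for training. The following is a minimal, library-independent sketch of that arithmetic, not the class's actual implementation. The class `SplitSizes` and its methods are hypothetical names. The rule that the two ratios must not sum to more than 1.0 is an assumption, since the documentation above only states the range of each ratio. Rounding down is also an assumption.

```java
class SplitSizes {

    /** Rejects ratios outside the documented range of 0.0 to 1.0. */
    static void checkRatio(String name, double ratio) {
        if (Double.isNaN(ratio) || ratio < 0.0 || ratio > 1.0) {
            throw new IllegalArgumentException(
                name + " must be between 0.0 and 1.0, got " + ratio);
        }
    }

    /**
     * Returns the expected document counts as {training, test, crossValidation}
     * for an index holding totalDocs documents.
     */
    static int[] expectedSizes(int totalDocs, double testRatio, double crossValidationRatio) {
        checkRatio("testRatio", testRatio);
        checkRatio("crossValidationRatio", crossValidationRatio);
        // Assumption: the ratios must leave a non-negative share for training.
        if (testRatio + crossValidationRatio > 1.0) {
            throw new IllegalArgumentException("ratios must not sum to more than 1.0");
        }
        // Assumption: fractional document counts are rounded down.
        int test = (int) Math.floor(totalDocs * testRatio);
        int crossValidation = (int) Math.floor(totalDocs * crossValidationRatio);
        int training = totalDocs - test - crossValidation;
        return new int[] {training, test, crossValidation};
    }

    public static void main(String[] args) {
        int[] sizes = expectedSizes(1000, 0.25, 0.125);
        // prints "training=625 test=250 crossValidation=125"
        System.out.println("training=" + sizes[0]
            + " test=" + sizes[1]
            + " crossValidation=" + sizes[2]);
    }
}
```

With testRatio = 0.25 and crossValidationRatio = 0.125, an index of 1000 documents would leave 625 documents for training under these assumptions.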
split
public void split(AtomicReader originalIndex,
Directory trainingIndex,
Directory testIndex,
Directory crossValidationIndex,
Analyzer analyzer,
String... fieldNames)
throws IOException
- Split a given index into 3 indexes for training, test and cross validation tasks respectively
- Parameters:
originalIndex - an AtomicReader on the source index
trainingIndex - a Directory used to write the training index
testIndex - a Directory used to write the test index
crossValidationIndex - a Directory used to write the cross validation index
analyzer - Analyzer used to create the new docs
fieldNames - names of fields that need to be put in the new indexes or null if all should be used
- Throws:
IOException - if any writing operation fails on any of the indexes
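Because fieldNames is a varargs parameter, a literal null array must be passed with an explicit cast to select all fields. Omitting the argument passes an empty array, not null. The sketch below illustrates this Java behavior with a hypothetical helper, `fieldsToCopy`, which is not part of Lucene. How split itself treats an empty array is not stated above, so the empty-array branch here reflects only this sketch.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

class FieldSelection {

    /**
     * Hypothetical helper following the documented convention:
     * a null fieldNames array means "use all available fields".
     */
    static List<String> fieldsToCopy(List<String> available, String... fieldNames) {
        if (fieldNames == null) {
            return new ArrayList<>(available);
        }
        List<String> wanted = Arrays.asList(fieldNames);
        List<String> selected = new ArrayList<>();
        // Keep only the available fields that were explicitly requested.
        for (String field : available) {
            if (wanted.contains(field)) {
                selected.add(field);
            }
        }
        return selected;
    }

    public static void main(String[] args) {
        List<String> available = Arrays.asList("id", "text", "category");

        // Explicit field names: prints "[text, category]"
        System.out.println(fieldsToCopy(available, "text", "category"));

        // Cast needed so the compiler passes a null array: prints "[id, text, category]"
        System.out.println(fieldsToCopy(available, (String[]) null));

        // No argument yields an empty array, which this sketch treats as "no fields": prints "[]"
        System.out.println(fieldsToCopy(available));
    }
}
```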
Copyright © 2000-2013 Apache Software Foundation. All Rights Reserved.