Class DatasetSplitter
java.lang.Object
org.apache.lucene.classification.utils.DatasetSplitter
Utility class for creating training / test / cross validation indexes from the original index.
Constructor Summary
Constructor: DatasetSplitter(double testRatio, double crossValidationRatio)
Description: Creates a DatasetSplitter by giving the sizes of the test and cross-validation indexes.
Method Summary
Modifier and Type: void
Method: split(IndexReader originalIndex, Directory trainingIndex, Directory testIndex, Directory crossValidationIndex, Analyzer analyzer, boolean termVectors, String classFieldName, String... fieldNames)
Description: Splits a given index into three indexes for training, test and cross-validation tasks, respectively.
Constructor Details
DatasetSplitter
public DatasetSplitter(double testRatio, double crossValidationRatio)
Creates a DatasetSplitter by giving the sizes of the test and cross-validation indexes, each as a ratio of the original index.
Parameters:
testRatio - the ratio of the original index to be used for the test index, as a double between 0.0 and 1.0
crossValidationRatio - the ratio of the original index to be used for the cross-validation index, as a double between 0.0 and 1.0
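Because both arguments are fractions of the original index, one way to sanity-check a pair of ratios is to turn them into document counts. The JDK-only sketch below does that for an index of n documents. The class SplitPlan, the rounding down, and the capping of the cross-validation count are assumptions made for illustration and are not documented behavior of DatasetSplitter. Only the 0.0 to 1.0 range check comes from the parameter descriptions.

```java
/**
 * Hypothetical helper, not part of Lucene: converts the two ratios accepted by
 * the DatasetSplitter constructor into document counts for an index of n docs.
 */
final class SplitPlan {
    final int test;
    final int crossValidation;
    final int training;

    private SplitPlan(int test, int crossValidation, int training) {
        this.test = test;
        this.crossValidation = crossValidation;
        this.training = training;
    }

    static SplitPlan of(int n, double testRatio, double crossValidationRatio) {
        // Documented constraint: each ratio is a double between 0.0 and 1.0.
        checkRatio("testRatio", testRatio);
        checkRatio("crossValidationRatio", crossValidationRatio);
        // Assumption of this sketch: counts are rounded down.
        int test = (int) Math.floor(n * testRatio);
        // Assumption of this sketch: the cross-validation count is capped so that
        // test + cross validation never exceeds n.
        int cv = Math.min((int) Math.floor(n * crossValidationRatio), n - test);
        // Every document not held out for test or cross validation goes to training.
        return new SplitPlan(test, cv, n - test - cv);
    }

    private static void checkRatio(String name, double ratio) {
        // Written as a negated range check so that NaN is rejected too.
        if (!(ratio >= 0.0 && ratio <= 1.0)) {
            throw new IllegalArgumentException(name + " must be between 0.0 and 1.0, got " + ratio);
        }
    }

    public static void main(String[] args) {
        SplitPlan plan = SplitPlan.of(1000, 0.2, 0.1);
        // Prints the test / cross-validation / training document counts.
        System.out.println(plan.test + " / " + plan.crossValidation + " / " + plan.training);
    }
}
```

For 1,000 documents with testRatio 0.2 and crossValidationRatio 0.1, main prints 200 / 100 / 700. That leaves 70% of the documents for training.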
Method Details
split
public void split(IndexReader originalIndex, Directory trainingIndex, Directory testIndex, Directory crossValidationIndex, Analyzer analyzer, boolean termVectors, String classFieldName, String... fieldNames) throws IOException
Splits a given index into three indexes for training, test and cross-validation tasks, respectively.
Parameters:
originalIndex - an IndexReader on the source index
trainingIndex - a Directory used to write the training index
testIndex - a Directory used to write the test index
crossValidationIndex - a Directory used to write the cross-validation index
analyzer - the Analyzer used to create the new documents
termVectors - true if term vectors should be kept
classFieldName - name of the field used as the label for classification; this field must be indexed with sorted doc values
fieldNames - names of the fields to be put in the new indexes, or null if all fields should be used
Throws:
IOException - if any write operation fails on any of the indexes
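The parameter list does not say how documents are assigned to the three indexes. One natural way to honour both the ratios and the class field is a per-class (stratified) split, in which every label contributes roughly the same proportion of its documents to each output. The JDK-only sketch below models that idea on in-memory documents so it runs without Lucene. The class StratifiedSplitSketch, its methods, the stratification, and the rounding are illustrative assumptions, not a description of Lucene's implementation. The sketch also ignores analyzers, term vectors, and Directory I/O entirely. It does mirror the documented fieldNames rule, where null means every field is copied.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/**
 * Hypothetical in-memory model, not Lucene code: splits labelled documents
 * (field name -> value maps) into training, test and cross-validation sets,
 * applying the two ratios separately within each class.
 */
final class StratifiedSplitSketch {

    /** The three output "indexes", modelled as plain lists of documents. */
    static final class Split {
        final List<Map<String, String>> training = new ArrayList<>();
        final List<Map<String, String>> test = new ArrayList<>();
        final List<Map<String, String>> crossValidation = new ArrayList<>();
    }

    static Split split(List<Map<String, String>> docs, double testRatio,
                       double crossValidationRatio, String classFieldName, String... fieldNames) {
        // Group documents by the value of the class (label) field, keeping input order.
        Map<String, List<Map<String, String>>> byClass = new LinkedHashMap<>();
        for (Map<String, String> doc : docs) {
            byClass.computeIfAbsent(doc.get(classFieldName), label -> new ArrayList<>()).add(doc);
        }
        Split out = new Split();
        for (List<Map<String, String>> group : byClass.values()) {
            int size = group.size();
            // Per-class quotas, rounded down and capped so they never exceed the group size.
            int testQuota = (int) Math.floor(size * testRatio);
            int cvQuota = Math.min((int) Math.floor(size * crossValidationRatio), size - testQuota);
            for (int i = 0; i < size; i++) {
                Map<String, String> copy = selectFields(group.get(i), fieldNames);
                if (i < testQuota) {
                    out.test.add(copy);
                } else if (i < testQuota + cvQuota) {
                    out.crossValidation.add(copy);
                } else {
                    out.training.add(copy);
                }
            }
        }
        return out;
    }

    /** Copies only the named fields; null (or, as an assumption here, no names) keeps them all. */
    static Map<String, String> selectFields(Map<String, String> doc, String... fieldNames) {
        if (fieldNames == null || fieldNames.length == 0) {
            return new LinkedHashMap<>(doc);
        }
        Map<String, String> copy = new LinkedHashMap<>();
        for (String name : fieldNames) {
            if (doc.containsKey(name)) {
                copy.put(name, doc.get(name));
            }
        }
        return copy;
    }

    public static void main(String[] args) {
        // 20 documents, alternating between two labels.
        List<Map<String, String>> docs = new ArrayList<>();
        for (int i = 0; i < 20; i++) {
            Map<String, String> doc = new LinkedHashMap<>();
            doc.put("id", Integer.toString(i));
            doc.put("label", i % 2 == 0 ? "sports" : "politics");
            doc.put("body", "text of document " + i);
            docs.add(doc);
        }
        Split s = split(docs, 0.2, 0.1, "label", "label", "body");
        System.out.println(s.training.size() + " / " + s.test.size() + " / " + s.crossValidation.size());
    }
}
```

With 20 documents split evenly between two labels and ratios of 0.2 and 0.1, main prints 14 / 4 / 2. Each label sends 2 documents to test, 1 to cross validation, and 7 to training. Taking the first documents of each class keeps the sketch deterministic; shuffling each group first would be an alternative design.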