DatasetSplitter (Lucene 4.2.1 API)

org.apache.lucene.classification.utils
Class DatasetSplitter

java.lang.Object
  org.apache.lucene.classification.utils.DatasetSplitter

public class DatasetSplitter
extends Object

Utility class for creating training / test / cross validation indexes from the original index.

Constructor Summary
`DatasetSplitter(double testRatio, double crossValidationRatio)` Create a `DatasetSplitter` by giving the sizes of the test and cross validation indexes as ratios of the original index

Method Summary
`void`	`split(AtomicReader originalIndex, Directory trainingIndex, Directory testIndex, Directory crossValidationIndex, Analyzer analyzer, String... fieldNames)` Split a given index into 3 indexes for training, test and cross validation tasks respectively

Methods inherited from class java.lang.Object
`clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait`

Constructor Detail

DatasetSplitter

public DatasetSplitter(double testRatio,
                       double crossValidationRatio)

Create a DatasetSplitter by giving the sizes of the test and cross validation indexes as ratios of the original index

Parameters:
testRatio - the ratio of the original index to be used for the test index, as a double between 0.0 and 1.0
crossValidationRatio - the ratio of the original index to be used for the cross validation index, as a double between 0.0 and 1.0
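
To make the meaning of the two ratios concrete, the following plain-Java sketch shows how they could translate into document counts. `SplitSizes` is a hypothetical helper that is not part of Lucene. The rounding (floor) and the check that the two ratios sum to at most 1.0 are assumptions made for illustration; this page does not say how `DatasetSplitter` rounds or validates its arguments.

```java
// Hypothetical helper, NOT part of Lucene: it only illustrates the ratio arithmetic.
final class SplitSizes {
    final int training;
    final int test;
    final int crossValidation;

    private SplitSizes(int training, int test, int crossValidation) {
        this.training = training;
        this.test = test;
        this.crossValidation = crossValidation;
    }

    static SplitSizes of(int totalDocs, double testRatio, double crossValidationRatio) {
        if (totalDocs < 0) {
            throw new IllegalArgumentException("totalDocs must not be negative");
        }
        checkRatio("testRatio", testRatio);
        checkRatio("crossValidationRatio", crossValidationRatio);
        // Assumption: the two ratios together must not exceed the whole index.
        if (testRatio + crossValidationRatio > 1.0) {
            throw new IllegalArgumentException("ratios must sum to at most 1.0");
        }
        // Assumption: fractional document counts are rounded down.
        int test = (int) Math.floor(totalDocs * testRatio);
        int crossValidation = (int) Math.floor(totalDocs * crossValidationRatio);
        // Whatever is left over goes to the training set.
        return new SplitSizes(totalDocs - test - crossValidation, test, crossValidation);
    }

    private static void checkRatio(String name, double ratio) {
        // The negated form also rejects NaN.
        if (!(ratio >= 0.0 && ratio <= 1.0)) {
            throw new IllegalArgumentException(name + " must be between 0.0 and 1.0");
        }
    }

    public static void main(String[] args) {
        SplitSizes sizes = SplitSizes.of(100, 0.2, 0.1);
        System.out.println("training=" + sizes.training
                + " test=" + sizes.test
                + " crossValidation=" + sizes.crossValidation);
    }
}
```

Under these assumptions, an index of 100 documents with `testRatio` 0.2 and `crossValidationRatio` 0.1 yields 20 test documents, 10 cross validation documents, and 70 training documents.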

Method Detail

split

public void split(AtomicReader originalIndex,
                  Directory trainingIndex,
                  Directory testIndex,
                  Directory crossValidationIndex,
                  Analyzer analyzer,
                  String... fieldNames)
           throws IOException

Split a given index into 3 indexes for training, test and cross validation tasks respectively

Parameters:
originalIndex - an AtomicReader on the source index
trainingIndex - a Directory used to write the training index
testIndex - a Directory used to write the test index
crossValidationIndex - a Directory used to write the cross validation index
analyzer - Analyzer used to create the new docs
fieldNames - names of fields that need to be put in the new indexes or null if all should be used
Throws:
IOException - if any writing operation fails on any of the indexes
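
The `fieldNames` contract ("or null if all should be used") can be illustrated with a small plain-Java simulation. `FieldSelection` is a hypothetical class that models a document as a map of field name to value; it does not use Lucene and is not how `DatasetSplitter` is implemented. Treating an empty array the same as `null` is an assumption of this sketch, not something this page states.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical simulation, NOT Lucene code: a document is modeled as a field-name-to-value map.
final class FieldSelection {

    /** Returns a copy of doc restricted to fieldNames, or a full copy when fieldNames is null. */
    static Map<String, String> select(Map<String, String> doc, String... fieldNames) {
        // Assumption: an empty array is treated like null, i.e. "copy every field".
        if (fieldNames == null || fieldNames.length == 0) {
            return new LinkedHashMap<>(doc);
        }
        Map<String, String> copy = new LinkedHashMap<>();
        for (String name : fieldNames) {
            String value = doc.get(name);
            if (value != null) { // skip requested fields that the document lacks
                copy.put(name, value);
            }
        }
        return copy;
    }

    public static void main(String[] args) {
        Map<String, String> doc = new LinkedHashMap<>();
        doc.put("title", "Lucene in brief");
        doc.put("body", "An overview of indexing and search.");
        doc.put("category", "search");

        // Only the named fields are carried over.
        System.out.println(select(doc, "body", "category"));
        // Passing null to a String... parameter requires an explicit cast.
        System.out.println(select(doc, (String[]) null));
    }
}
```

Because `fieldNames` is a varargs parameter, calling the method with no field arguments passes an empty array rather than `null`. To pass `null` explicitly, cast it as `(String[]) null`, as in the second call above.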