The challenge is over, but a new challenge using the same datasets is ongoing; check it out!
The aim of the performance prediction challenge is to find methods that predict how accurately a given predictive model will perform on test data, on ALL five benchmark datasets. To make it easy to enter results for all five datasets, every task is a two-class classification problem. You can download the datasets from the table below:
Dataset | Size | Type | Features | Training Examples | Validation Examples | Test Examples |
---|---|---|---|---|---|---|
ADA | 0.6 MB | Dense | 48 | 4147 | 415 | 41471 |
GINA | 19.4 MB | Dense | 970 | 3153 | 315 | 31532 |
HIVA | 7.6 MB | Dense | 1617 | 3845 | 384 | 38449 |
NOVA | 2.3 MB | Sparse binary | 16969 | 1754 | 175 | 17537 |
SYLVA | 15.6 MB | Dense | 216 | 13086 | 1308 | 130858 |
At the start of the challenge, participants had access only to labeled training data and unlabeled validation and test data. Submissions were evaluated on validation data only. The validation labels were made available one month before the end of the challenge: *** DOWNLOAD THE VALIDATION SET LABELS ***. The final ranking will be based on test data results, to be revealed only when the challenge is over.
All the datasets share the same format and include 5 files in ASCII format:
The matrix data formats used are (in all cases, each line represents a pattern):
If you are a Matlab user, you can download sample code to read and check the data.
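For non-Matlab users, the matrix formats described above can be read with a few lines of code. The sketch below is a minimal illustration under two assumptions not spelled out here: dense files hold whitespace-separated feature values, and sparse binary files list the 1-based indices of the nonzero features, with one pattern per line in both cases.

```python
from io import StringIO

def load_dense(f):
    """Read a dense matrix: one pattern per line, whitespace-separated values."""
    return [[float(v) for v in line.split()] for line in f if line.strip()]

def load_sparse_binary(f, n_features):
    """Read a sparse binary matrix: each line lists the 1-based indices
    of the features that are 1 for that pattern (assumed format)."""
    X = []
    for line in f:
        if not line.strip():
            continue
        row = [0] * n_features
        for idx in line.split():
            row[int(idx) - 1] = 1  # convert 1-based index to 0-based position
        X.append(row)
    return X

# Toy stand-ins for a dense file (e.g. ADA) and a sparse binary file (e.g. NOVA).
dense = load_dense(StringIO("1 0 2.5\n0 1 3.0\n"))
sparse = load_sparse_binary(StringIO("1 3\n2\n"), n_features=4)
print(dense)   # [[1.0, 0.0, 2.5], [0.0, 1.0, 3.0]]
print(sparse)  # [[1, 0, 1, 0], [0, 1, 0, 0]]
```

In practice you would pass open file handles instead of `StringIO`, and for the larger datasets a sparse matrix structure (rather than dense row lists) keeps memory use manageable.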