WCCI Performance Prediction Challenge

The challenge is over, but a new challenge using the same datasets is ongoing. Check it out!

The Challenge

The aim of the performance prediction challenge is to find methods to predict how accurately a given predictive model will perform on test data, on ALL five benchmark datasets. To facilitate entering results for all five datasets, all tasks are two-class classification problems. You can download the datasets from the table below:

Dataset  Size     Type           Features  Training Examples  Validation Examples  Test Examples
ADA      0.6 MB   Dense          48        4147               415                  41471
GINA     19.4 MB  Dense          970       3153               315                  31532
HIVA     7.6 MB   Dense          1617      3845               384                  38449
NOVA     2.3 MB   Sparse binary  16969     1754               175                  17537
SYLVA    15.6 MB  Dense          216       13086              1308                 130858

At the start of the challenge, participants had access only to labeled training data and unlabeled validation and test data. Submissions were evaluated on validation data only. The validation labels have since been made available (one month before the end of the challenge). *** DOWNLOAD THE VALIDATION SET LABELS *** The final ranking will be based on test data results, to be revealed only when the challenge is over.
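One common way to predict test performance from labeled training data alone is k-fold cross-validation. The sketch below is only an illustration of that idea, not the method prescribed by the challenge. It assumes the two classes are labeled +1 and -1, uses a plain error rate as the score, and uses a toy nearest-centroid classifier; all function names are hypothetical.

```python
import random


def nearest_centroid_fit(X, y):
    """Return the mean feature vector of each class present in y."""
    centroids = {}
    for label in set(y):
        rows = [x for x, t in zip(X, y) if t == label]
        centroids[label] = [sum(col) / len(rows) for col in zip(*rows)]
    return centroids


def nearest_centroid_predict(centroids, x):
    """Predict the label whose centroid is closest to x (squared Euclidean)."""
    def sq_dist(c):
        return sum((a - b) ** 2 for a, b in zip(x, c))
    return min(centroids, key=lambda label: sq_dist(centroids[label]))


def cross_validated_error(X, y, k=5, seed=0):
    """Estimate the error rate by k-fold cross-validation.

    Each example is held out exactly once; the returned value is the
    fraction of held-out examples that were misclassified.
    """
    indices = list(range(len(X)))
    random.Random(seed).shuffle(indices)
    folds = [indices[i::k] for i in range(k)]  # k roughly equal folds
    errors = 0
    for held_out in folds:
        held = set(held_out)
        train_X = [X[i] for i in indices if i not in held]
        train_y = [y[i] for i in indices if i not in held]
        model = nearest_centroid_fit(train_X, train_y)
        errors += sum(
            nearest_centroid_predict(model, X[i]) != y[i] for i in held_out
        )
    return errors / len(X)


if __name__ == "__main__":
    # Toy data (assumption: labels are +1 / -1), two well-separated clusters.
    X = [[0, 0], [0, 1], [1, 0], [1, 1], [10, 10], [10, 11], [11, 10], [11, 11]]
    y = [-1, -1, -1, -1, 1, 1, 1, 1]
    print("predicted error rate:", cross_validated_error(X, y, k=4))
```

The cross-validated estimate plays the role of the performance prediction; once the true labels are available, it can be compared against the error actually measured on held-out data.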

Dataset Formats

All the data sets are in the same format and include 5 files in ASCII format:

The matrix data formats used are (in all cases, each line represents a pattern):

If you are a Matlab user, you can download some sample code to read and check the data.
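For readers not using Matlab, the following Python sketch shows one way to read and sanity-check a dense matrix in which each line represents a pattern. It assumes the values on a line are separated by whitespace, which the text above does not confirm; the function names and the file path in the usage comment are hypothetical. It does not cover the sparse binary format.

```python
def parse_dense_matrix(lines):
    """Parse text lines into a list of float rows, one row per pattern.

    Assumption: values on a line are whitespace-separated. Blank lines
    are skipped.
    """
    rows = []
    for line in lines:
        fields = line.split()
        if not fields:
            continue
        rows.append([float(value) for value in fields])
    return rows


def check_shape(rows, n_patterns, n_features):
    """Return True if the matrix has the expected number of rows and columns."""
    return len(rows) == n_patterns and all(len(r) == n_features for r in rows)


if __name__ == "__main__":
    sample = ["1 0 3.5\n", "2 2 2\n"]
    matrix = parse_dense_matrix(sample)
    print(matrix, check_shape(matrix, n_patterns=2, n_features=3))
    # With a real file (hypothetical path), the counts from the table above
    # could be used as the expected shape, e.g. 4147 patterns x 48 features:
    # with open("path/to/training_matrix.txt") as f:
    #     print(check_shape(parse_dense_matrix(f), 4147, 48))
```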