HOW NUMERAI WORKS

NUMERAI is a weekly data science competition for data scientists of any background.

Predictions submitted by users steer Numerai's hedge fund.

Rewards are paid in cryptocurrency.

What follows is an overview of how to get started submitting predictions and earning payouts.

TOURNAMENT STRUCTURE

Numerai rounds start every week and are open for one week, during which submissions may be made. Staking Numeraire (Numerai's crypto-token) on submissions is only open during the first two days of the round.

Every week users: download the latest data, model it, upload their predictions, then stake on them.

Predictions are tested against the stock market for three weeks following the close of the round before payouts and results are published.

Numeraire staked on successful predictions is returned along with an additional cryptocurrency reward. Stakes on unsuccessful predictions are destroyed. Only staked predictions control the capital in Numerai's hedge fund, so unstaked submissions are not rewarded.

Tournaments are open for one week (staking Numeraire is only open the first two days) and payouts occur three weeks after close

DATA

A data zip is published at the start of every weekly tournament. The download contains:

Training data: a CSV file containing features and their binary target values with which to train your machine learning models.

Tournament data: a CSV file containing features and metadata, which you will use to generate your own binary target estimates to submit. This data is clean and regularized.

Example models: two example classifier scripts, Python and R, to get you started. They produce predictions that are ready for submission in a tournament. Try running one of the scripts and uploading the CSV it outputs.

Example predictions: an example of the format in which predictions should be uploaded.

Download Data

The current round's data

An example of training data

Numerai’s data is obfuscated. The feature columns contain information required to predict the target variable but Numerai does not reveal what the features mean or what the target variable means. The era column corresponds to unspecified periods of time. The challenge of Numerai’s data is not just to get strong performance predicting target variables but to have that performance apply consistently across eras.

There are four data types:

train: the training data, which includes targets for your model to train on.
validation: also contains targets, allowing you to test your model locally.
tournament: does not contain targets but is used internally by Numerai to validate your model for trading purposes.
live: the primary way your model is evaluated on Numerai. It corresponds to new live data for which no one has the targets.

All payouts on Numerai are based on performance on the live data.

MAKING PREDICTIONS

Try generating predictions with the example models before making your own. Predictions should be CSV files with two columns: id and probability. The probability column is the probability estimated by your model of the observation being of class 1.
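As a minimal sketch of the upload format described above, the following Python writes a two-column CSV with an id and a probability per row. The helper name write_predictions, the sample ids, and the five-decimal formatting are illustrative assumptions, not part of Numerai's tooling.

```python
import csv
import io


def write_predictions(rows, out):
    """Write (id, probability) pairs as a two-column CSV with a header row."""
    writer = csv.writer(out, lineterminator="\n")
    writer.writerow(["id", "probability"])
    for row_id, prob in rows:
        # A probability of class 1 must lie in [0, 1].
        if not 0.0 <= prob <= 1.0:
            raise ValueError(f"probability out of range for {row_id}: {prob}")
        writer.writerow([row_id, f"{prob:.5f}"])


# Hypothetical ids and model outputs, for illustration only.
sample = [("row_a", 0.51234), ("row_b", 0.48)]

buffer = io.StringIO()
write_predictions(sample, buffer)
csv_text = buffer.getvalue()
print(csv_text)
```

To write to disk instead, pass a file opened with open("predictions.csv", "w", newline="") in place of the in-memory buffer.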

Once your predictions have been uploaded they are scored and added to the round's public leaderboard. You may make as many submissions as you like until the close of the round, though each submission will replace the last.

Example predictions

SCORING

Submissions are scored on four metrics upon upload, immediately visible on the site.

Logarithmic loss (log loss) is a measure of a model's accuracy; lower is better. The log loss listed in an ongoing tournament is measured against the validation set. Once a round has resolved, a model's log loss against the live market data is listed. A good model is one whose live log loss beats (is lower than) the benchmark. Currently, this benchmark is 0.693.
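The benchmark of 0.693 matches ln 2, which is the log loss of predicting 0.5 for every observation. The sketch below assumes the standard binary log loss formula; Numerai's exact implementation is not given here, and the clipping constant eps is an assumption added to avoid taking log(0).

```python
import math


def log_loss(targets, probabilities, eps=1e-15):
    """Mean binary log loss; lower is better."""
    total = 0.0
    for y, p in zip(targets, probabilities):
        # Clip so that log(0) is never evaluated.
        p = min(max(p, eps), 1.0 - eps)
        total += -(y * math.log(p) + (1 - y) * math.log(1.0 - p))
    return total / len(targets)


# Made-up targets and predictions, for illustration only.
targets = [1, 0, 1, 0]
uninformed = log_loss(targets, [0.5, 0.5, 0.5, 0.5])  # equals ln 2
informed = log_loss(targets, [0.6, 0.4, 0.6, 0.4])    # leans the right way

print(round(uninformed, 3), informed < uninformed)
```

A model only beats the benchmark if, on average, its probabilities lean toward the correct class more than a constant 0.5 would.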

Consistency measures the percentage of eras in which a model achieves a log loss better than the benchmark. Numerai wants models that work well consistently across eras. Only models with consistency above 58% are considered consistent.
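One way to read the consistency definition is sketched below: group rows by era, compute the log loss per era, and report the share of eras that beat the benchmark. The row layout (era, target, probability), the function names, and the sample values are assumptions for illustration; Numerai's own calculation may differ in detail.

```python
import math
from collections import defaultdict

BENCHMARK = 0.693  # benchmark log loss quoted in this document


def log_loss(pairs, eps=1e-15):
    """Mean binary log loss over (target, probability) pairs."""
    total = 0.0
    for y, p in pairs:
        p = min(max(p, eps), 1.0 - eps)
        total += -(y * math.log(p) + (1 - y) * math.log(1.0 - p))
    return total / len(pairs)


def consistency(rows, benchmark=BENCHMARK):
    """Percentage of eras whose log loss is below the benchmark."""
    by_era = defaultdict(list)
    for era, target, prob in rows:
        by_era[era].append((target, prob))
    passed = sum(1 for pairs in by_era.values() if log_loss(pairs) < benchmark)
    return 100.0 * passed / len(by_era)


# Made-up rows: (era, target, probability).
rows = [
    ("era1", 1, 0.60), ("era1", 0, 0.40),  # beats the benchmark
    ("era2", 1, 0.40), ("era2", 0, 0.60),  # worse than the benchmark
    ("era3", 1, 0.70), ("era3", 0, 0.45),  # beats the benchmark
    ("era4", 1, 0.55), ("era4", 0, 0.50),  # beats the benchmark
]

score = consistency(rows)
print(score, score > 58.0)
```

Here three of the four eras beat the benchmark, so the model would count as consistent under the 58% threshold.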

Originality is a measure of whether a set of predictions is uncorrelated with predictions already submitted. Numerai wants to encourage new models over duplicate submissions.

Concordance is a measure of whether predictions on the validation set, test set, and live set appear to be generated by the same model. A data scientist who submits perfect answers on the validation set is unlikely to achieve concordance.

Staking Numeraire

Having data scientists stake Numeraire on predictions is the mechanism that allows Numerai to overcome overfitting. Read about the reason and mechanics of Numeraire in our whitepaper.

Staking is only open during the first two days of a round. After this Numerai begins trading the staked predictions while the modeled data is fresh.

Predictions made during the rest of the tournament are not used in Numerai's trading, but serve to let users test their skill and earn Numeraire, which is needed to stake. Read more about how to earn Numeraire below.

Once your submission has concordance and consistency of 58% or above, you may stake on it.

Staking requires a data scientist to choose the amount of Numeraire to stake, s, and an accompanying confidence value, c. Confidence is defined as the amount of Numeraire a data scientist is willing to stake to win 1 USD. Thus, the amount of Numeraire staked divided by the confidence, s/c, is the maximum USD a submission will earn.
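The relationship between stake, confidence, and maximum payout can be checked with a few lines of Python. The function name and the sample figures are illustrative assumptions.

```python
def max_payout_usd(stake_nmr, confidence):
    """Maximum USD a stake can earn: s / c.

    confidence is the amount of Numeraire staked per 1 USD of potential winnings.
    """
    if stake_nmr <= 0 or confidence <= 0:
        raise ValueError("stake and confidence must be positive")
    return stake_nmr / confidence


# Staking 10 NMR at confidence 0.5 (0.5 NMR risked per USD) caps winnings at 20 USD.
low_confidence = max_payout_usd(10.0, 0.5)
# The same 10 NMR at confidence 2.0 (2 NMR risked per USD) caps winnings at 5 USD.
high_confidence = max_payout_usd(10.0, 2.0)

print(low_confidence, high_confidence)
```

A higher confidence lowers the maximum payout for the same stake, in exchange for an earlier place in the payout order.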

Because staking happens on the blockchain, it is irreversible. A new stake can, however, be made if it has the same or higher confidence and stake amount.

Once staked, a user may no longer upload new predictions in that round.

Scoring Stakes

Stakes are resolved and paid three weeks after the conclusion of a round. Each round currently has a staking prize pool of $6000 + 2000 NMR, which we aim to increase over time.

Data scientists are awarded s/c dollars in descending order of confidence (and by time staked if tied) until the prize pool is depleted. Once the prize pool is depleted, data scientists no longer earn rewards and their stakes are returned.

Stakes on submissions within the confidence window that have worse-than-benchmark live log loss are destroyed. All other stakes are returned.
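The resolution rules above can be sketched as a small simulation. It rests on several assumptions that this page does not spell out: a stake counts as "within the confidence window" if the prize pool is not yet depleted when it is reached, only winning stakes draw from the pool, the last winner is capped at whatever remains, and only the USD part of the pool is modeled. The Stake class, the resolve function, and all sample values are hypothetical.

```python
from dataclasses import dataclass

BENCHMARK = 0.693  # benchmark log loss quoted in this document


@dataclass
class Stake:
    user: str
    amount_nmr: float    # s
    confidence: float    # c
    staked_at: int       # smaller value means staked earlier (tie-breaker)
    live_logloss: float


def resolve(stakes, pool_usd, benchmark=BENCHMARK):
    """Return {user: (outcome, usd_payout)} under the assumptions stated above."""
    results = {}
    remaining = pool_usd
    # Descending confidence, then earliest stake first.
    ordered = sorted(stakes, key=lambda s: (-s.confidence, s.staked_at))
    for s in ordered:
        if remaining <= 0:
            # Pool depleted: no reward, stake returned.
            results[s.user] = ("returned", 0.0)
        elif s.live_logloss < benchmark:
            payout = min(s.amount_nmr / s.confidence, remaining)
            remaining -= payout
            results[s.user] = ("rewarded", payout)
        else:
            # Inside the window with worse-than-benchmark log loss.
            results[s.user] = ("destroyed", 0.0)
    return results


stakes = [
    Stake("alice", 20.0, 0.5, staked_at=1, live_logloss=0.690),
    Stake("bob", 30.0, 0.5, staked_at=2, live_logloss=0.700),
    Stake("carol", 15.0, 0.25, staked_at=3, live_logloss=0.692),
    Stake("dave", 10.0, 0.125, staked_at=4, live_logloss=0.710),
]

outcomes = resolve(stakes, pool_usd=100.0)
for user, (outcome, usd) in outcomes.items():
    print(user, outcome, usd)
```

In this run alice and carol exhaust the 100 USD pool, bob's stake is destroyed because he is inside the window with a losing model, and dave's stake is returned even though his model lost, because the pool was already empty when his stake was reached.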

Example of resolved staking outcomes

Numeraire is awarded in proportion to the USD amount. USD rewards are paid out in ether (ETH), Ethereum’s native crypto-token. This can be withdrawn to an exchange or wallet like Coinbase to be converted to an old-world currency of your choosing.

How to get Numeraire

Numerai awarded 1,000,000 Numeraire to its best data scientists when the token was launched in 2017 and continues to award Numeraire to performant staked models. If you need Numeraire to stake, there are a few ways to get it:

Exchanges: third-party crypto-token exchanges that trade Numeraire for other tokens like Bitcoin or Ether.
Reputation (coming soon): Numerai will award Numeraire to users who consistently upload good models or meet other measures of useful contribution, regardless of whether they have staked or not. Details will be announced soon.

Page updated 13 March 2018