Can you predict the future?
G-Research has launched what we hope will be an interesting and fun forecasting task to provide you with a flavour of one of the types of problem we work on. The best submission will win a prize of $30,000. Good luck!
We’d like you to predict y (an element of the return of a financial data series) from a number of features we’ll supply you with. You can download a training data set from our website (train.csv.zip) with y, weights and the features we’d like you to investigate.
Your task is to provide the missing response (y) for the test set (test.csv.zip).
Your submission will be evaluated according to the weighted mean squared error (wMSE). The wMSE weights the squared difference between y-hat and y in each row by w, where w is the weight for that row, y-hat is your prediction and y is the actual value for that point. Please submit a prediction that we can parse to a double for every row.
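The exact formula is not reproduced in the text here, so the following is only a minimal sketch of how such a score could be computed locally. It assumes the weighted squared errors are normalised by the sum of the weights, which is a common convention but is not confirmed by this page; the function name `wmse` is also hypothetical.

```python
import numpy as np


def wmse(y_true, y_pred, weights):
    """Weighted mean squared error.

    Assumption: the weighted squared errors are divided by the total
    weight. The official scoring formula may normalise differently.
    """
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    weights = np.asarray(weights, dtype=float)
    if not (y_true.shape == y_pred.shape == weights.shape):
        raise ValueError("y_true, y_pred and weights must have the same shape")
    return float(np.sum(weights * (y_true - y_pred) ** 2) / np.sum(weights))
```

Under this assumption, multiplying every weight by the same constant leaves the score unchanged, so only the relative sizes of the weights matter.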
For each submission, you will receive a public score. This score is calculated using a portion of the test set and is intended for informational purposes only. You may then choose one of your submissions as your final submission. At the end of the competition, your final submission will receive a private score, which is calculated using the remainder of the test set. The model with the lowest private wMSE that obeys the rules below will be declared the winner.
We’d like this to be as pure and fair a data science contest as possible, so we require that participants:
- Base submissions only on the data provided, not on any additional data sources.
- Submit only one final entry. Any attempts to make multiple final entries (such as under different user names) will result in disqualification.
- Are not employed in a quantitative finance position with access to financial tick data.
- Submit a maximum of 3 models per day for test evaluation. Participants are allowed an additional 10 bonus submissions at the beginning of the contest to help get started.
- Make their submissions and select their final entry by the closing date of 23:59 UTC on 15th April 2018.
To verify the first rule (that submissions are based only on the data provided), we require any potential prize winners to submit the code for their model and to be prepared to justify, by e-mail, any sections we think may suggest breaches. You will remain the owner of the IP in your code. The terms and conditions must also be satisfied.
Advice and resources
Building a good model will require selecting and combining the features provided. You may also wish to explore transformations and interactions. The columns are:
- Index: A unique value that labels each row in the data set
- Day: The day of the year
- Market: A label that indicates which exchange the instrument is traded on
- Stock: A unique label for each instrument
- x0...x3E: Predictors that relate to the observed behaviour of the instrument on the day in question. The features labelled 'x3A', 'x3B', etc. are strongly related and might be expected to be highly correlated.
- x4...x6: Predictors that describe the ‘typical’ behaviour that we would expect of the instrument on that day.
- Weight: An importance weight we have generated, used in the wMSE scoring function. Like y, this isn't present in the test data.
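As an illustration of the transformations and interactions mentioned above, here is a minimal sketch using pandas. The data is synthetic and only the column names follow the description; the exact spellings in the real files (for example `x3A` to `x3E`), and the derived names `x3_mean`, `x0_log` and `x0_x4`, are assumptions made for this example.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)
n = 500

# Synthetic stand-in for the training data; values are random, not real.
base = rng.normal(size=n)
df = pd.DataFrame({
    "x0": rng.normal(size=n),
    "x4": rng.normal(size=n),
    # x3A..x3E share a common component so that they come out correlated,
    # mimicking the relationship described for the real features.
    **{f"x3{s}": base + 0.1 * rng.normal(size=n) for s in "ABCDE"},
})

x3_cols = [c for c in df.columns if c.startswith("x3")]

# Pairwise correlations within the x3 group.
x3_corr = df[x3_cols].corr()

# One possible way to collapse a highly correlated group: the row-wise mean.
df["x3_mean"] = df[x3_cols].mean(axis=1)

# A simple sign-preserving log transformation and a simple interaction term.
df["x0_log"] = np.sign(df["x0"]) * np.log1p(df["x0"].abs())
df["x0_x4"] = df["x0"] * df["x4"]
```

Whether averaging the x3 group helps or discards useful signal is something to check against a held-out score rather than assume.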
We’ve provided an example script in Python that takes you through building and assessing a basic model. You’re free to use any language you like, but Python has a large number of libraries that you may find helpful.
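The provided example script is not reproduced here. As a rough idea of what building and assessing a basic model can look like, the sketch below fits a weighted linear regression with NumPy on synthetic data and compares it with a predict-zero baseline. The helper names, the hold-out split by day, and the normalisation of the wMSE by total weight are all assumptions made for illustration.

```python
import numpy as np


def fit_weighted_linear(X, y, w):
    """Weighted least squares with an intercept; returns the coefficients."""
    X1 = np.column_stack([np.ones(len(X)), X])
    sw = np.sqrt(w)
    # Scaling rows by sqrt(w) turns weighted least squares into ordinary
    # least squares on the scaled system.
    coef, *_ = np.linalg.lstsq(X1 * sw[:, None], y * sw, rcond=None)
    return coef


def predict_linear(coef, X):
    return coef[0] + X @ coef[1:]


def wmse(y, y_hat, w):
    # Assumed normalisation: divide by the total weight.
    return float(np.sum(w * (y - y_hat) ** 2) / np.sum(w))


# Synthetic stand-in for the training data (not the real competition data).
rng = np.random.default_rng(1)
n = 1000
day = rng.integers(1, 101, size=n)
X = rng.normal(size=(n, 3))
w = rng.uniform(0.5, 2.0, size=n)
y = 0.5 * X[:, 0] - 0.2 * X[:, 1] + 0.1 * rng.normal(size=n)

# Hold out the later days for validation instead of a random row split.
train = day <= 80
valid = ~train

coef = fit_weighted_linear(X[train], y[train], w[train])
model_score = wmse(y[valid], predict_linear(coef, X[valid]), w[valid])
baseline_score = wmse(y[valid], np.zeros(valid.sum()), w[valid])
```

Comparing against a trivial baseline such as predicting zero for every row gives a quick sanity check that a model has learned something before a daily submission is spent on it.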