Joe Ellsworth, CTO, trading strategist and principal research scientist at Bayes Analytic, DBA

Saturday, November 22, 2014

OK, here is an idea. What if I set up a web service where somebody could post a training data set? They can remove the symbol name and change the bar dateTimes to a different year or a different month, as long as the relationships between the bars remain constant. They do need to be 1 minute bars in sequence to test this algorithm, and I need about 60K real-life bars for training.
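A minimal sketch of how a tester might anonymize bars before posting them. It assumes each bar is a dict with hypothetical `symbol` and `datetime` keys and a `YYYY-MM-DD HH:MM` timestamp format; none of these names come from an actual service. Every timestamp is shifted by the same offset, so the spacing between bars is unchanged.

```python
from datetime import datetime, timedelta

def anonymize_bars(rows, shift_days=364, fmt="%Y-%m-%d %H:%M"):
    """Drop the symbol and shift every timestamp by one constant offset.

    The column names and the timestamp format are assumptions made
    for this illustration only.
    """
    offset = timedelta(days=shift_days)
    cleaned = []
    for row in rows:
        shifted = datetime.strptime(row["datetime"], fmt) - offset
        # Copy every field except the identifying symbol.
        bar = {key: value for key, value in row.items() if key != "symbol"}
        bar["datetime"] = shifted.strftime(fmt)
        cleaned.append(bar)
    return cleaned
```

Shifting by a multiple of 7 days (364 here) also keeps each bar on the same day of the week, which may or may not matter to the model.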

I would build a test model for them and give them back a model id. This algorithm is predicting 19 minutes into the future, but if I do not know which symbol or the exact time frames the bars were originally sourced from, it would be tough for me to cheat. I do want real market data, as simulated market data doesn't reliably produce the same results.

I run my internal splits with either 90% for training and 10% for testing or 80% for training and 20% for testing. I am currently working with 2014 for most of the testing but it has worked just as well for 2013 and 2012 where I have the 1 minute data. I don't have 1 minute data farther back to test.
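As an illustration of the split, here is a small sketch that cuts a sequence of bars into training and test portions. It assumes the split is chronological (oldest bars for training, newest for testing), which the post does not state explicitly; the function name is made up for this example.

```python
def chronological_split(bars, train_percent=90):
    """Split bars in order: the first train_percent% train, the rest test.

    Integer arithmetic is used so the cut point is exact.
    Assumption: the split keeps the bars in time order and does not shuffle.
    """
    if not 0 < train_percent < 100:
        raise ValueError("train_percent must be between 1 and 99")
    cut = len(bars) * train_percent // 100
    return bars[:cut], bars[cut:]

# Example: 65,000 bars at 90/10 leaves 6,500 bars for testing.
train, test = chronological_split(list(range(65000)), train_percent=90)
```

Keeping the test bars strictly after the training bars is one way to make sure the model never sees data from the period it is being scored on.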

They can then call the web service with the model-id, posting 18 bars to add data to the existing set and get a prediction for those 18 bars. Since I am predicting 20 minutes into the future, I will return predictions for which bars will rise during that 20 minute period. It should be impossible for the software to look into the future. If they repeat this cycle, sending the next 19 bars each time, they can record my predictions and determine the precision and recall rates. My current 90/10 split has about 6500 test rows, so they would have to call the service about 340 times to test a similar amount of data in a way that I could not possibly cheat.
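The cycle described above can be sketched from the tester's side as follows. The `predict` argument is a stand-in for the call to the web service (hypothetical here), the batch size of 19 is taken from the "next 19 bars" figure, and the tester is assumed to work out the actual rise/no-rise labels later from their own data.

```python
def batches(bars, size):
    """Yield consecutive, non-overlapping slices of bars."""
    for start in range(0, len(bars), size):
        yield bars[start:start + size]

def walk_forward(bars, predict, batch_size=19):
    """Send bars in order, one batch per call, and record the predictions.

    predict is any callable that takes a batch of bars and returns one
    True/False "will rise" prediction per bar (a placeholder for the service).
    """
    predictions = []
    for batch in batches(bars, batch_size):
        result = list(predict(batch))
        if len(result) != len(batch):
            raise ValueError("expected one prediction per bar")
        predictions.extend(result)
    return predictions

def precision_recall(predicted, actual):
    """Precision and recall for the 'rise' class."""
    pairs = list(zip(predicted, actual))
    true_pos = sum(1 for p, a in pairs if p and a)
    false_pos = sum(1 for p, a in pairs if p and not a)
    false_neg = sum(1 for p, a in pairs if not p and a)
    precision = true_pos / (true_pos + false_pos) if true_pos + false_pos else 0.0
    recall = true_pos / (true_pos + false_neg) if true_pos + false_neg else 0.0
    return precision, recall
```

With 6500 test rows and 19 bars per call, this loop makes 343 calls, in line with the rough figure of 340. Because the tester only releases each batch after recording the previous predictions, the service never holds the bars it is being scored against.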

I could run it in a mode that updates the model after each post, which is closest to how we would use it for live trading, or I can leave the model alone and just predict forward. One is more accurate while the other is faster.

Now the real question is why would other engineers do the work to help me test in this way? What benefit would they get? What benefit would they want? Will this kind of test help close the sale for the investors I want to attract? Or will it be too technical for them to understand?

They could even strip the bar dateTime off, as long as the rows remain in the correct sequence in the CSV file. I would have to know to expect this or it would break my CSV parser. It seems like this should not violate their data contracts if they have removed the identifying symbol and dates from the data. I would promise to delete it after the test anyway.
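One way to keep the parser from breaking is to treat the time column as optional and rely on row order instead. The sketch below assumes a header row with hypothetical column names (`datetime`, `open`, `high`, `low`, `close`); the real file layout is not specified in this post.

```python
import csv
import io

PRICE_COLUMNS = ("open", "high", "low", "close")  # assumed column names

def read_bars(csv_text):
    """Parse bars from CSV text, with or without a datetime column.

    Each bar gets a 'seq' number from its row position, so the order
    is preserved even when the timestamps have been stripped.
    """
    reader = csv.DictReader(io.StringIO(csv_text))
    has_time = "datetime" in (reader.fieldnames or [])
    bars = []
    for seq, row in enumerate(reader):
        bar = {"seq": seq, "datetime": row["datetime"] if has_time else None}
        for name in PRICE_COLUMNS:
            bar[name] = float(row[name])
        bars.append(bar)
    return bars
```

The same function then accepts both kinds of file, and downstream code can use `seq` wherever it would otherwise have used the timestamp for ordering.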

I could even offer Python or Node.js code to read the CSV, make the posts and accumulate the results, so all they have to provide is their CSV file. But then I could possibly cheat and post data through on a back channel (unless they audited my code).
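A client of that kind could be small enough to audit at a glance. The sketch below is only an illustration: the URL layout, the JSON payload and the `predictions` response field are all hypothetical, since no such service exists yet. The `send` parameter lets an auditor swap in their own transport and see that the only bytes leaving the machine are the bars in the current batch.

```python
import json
import urllib.request

def build_prediction_request(base_url, model_id, bars):
    """Build (but do not send) a POST request for one batch of bars.

    The '/models/<id>/predict' path and the {"bars": [...]} body are
    made-up conventions for this example.
    """
    url = f"{base_url}/models/{model_id}/predict"
    body = json.dumps({"bars": bars}).encode("utf-8")
    headers = {"Content-Type": "application/json"}
    return urllib.request.Request(url, data=body, headers=headers, method="POST")

def send_over_http(request):
    """Send the request and return the assumed 'predictions' list."""
    with urllib.request.urlopen(request) as response:
        return json.load(response)["predictions"]

def run_test(bars, model_id, base_url, batch_size=19, send=send_over_http):
    """Post bars in sequence, batch by batch, and accumulate predictions."""
    results = []
    for start in range(0, len(bars), batch_size):
        batch = bars[start:start + batch_size]
        results.extend(send(build_prediction_request(base_url, model_id, batch)))
    return results
```

Since each request body is built only from the batch being sent, there is nowhere in this code for future bars to slip through.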

3 comments on article "Best way for independent confirmation?"

Bharath Rao, Entrepreneur

Friday, November 21, 2014

Joe,

You are right. It's a very good idea to be suspicious of success rates like this. We are a research firm too. You can write to me at bharath@alphamatters.com and let me know your ideas about how we can collaborate.

I am mostly interested in ideas to allow fellow quant professionals to validate the predictive accuracy of others without sharing the underlying secret sauce. There has to be a way to set up test harnesses that would make it difficult for honest professionals to accidentally cheat. Do you have any ideas?
