
Archived Discussions

Recent member discussions

The Algorithmic Traders' Association prides itself on providing a forum for the publication and dissemination of its members' white papers, research, reflections, works in progress, and other contributions. Please note that archive searches and some of our members' publications are reserved for members only, so please log in or sign up to gain the most from our members' contributions.

Ways to mitigate curvefitting in trading algorithm development


 Yoshiharu "Josh" Sato, C++ Algo Quant Developer

 Friday, October 31, 2014

Dear All,

Over the past few years I've been developing FX trading algorithms using my custom backtesting system (https://sites.google.com/site/yoshi2233/backtester) and am now looking for ways to mitigate (if not remove) curve-fitting in backtesting. Other than keeping a distinct in-sample / out-of-sample data separation and optimizing algorithms via walk-forward analysis, what methodologies can I use toward this goal?

Thanks, Josh



8 comments on article "Ways to mitigate curvefitting in trading algorithm development"


 Jim Witkam, Owner Altreva, agent-based forecasting models

 Thursday, November 6, 2014



If I may still add something to the discussion: in my view, to avoid overfitting entirely, one should not optimize an algorithm on in-sample data at all. Instead, a system should "evolve" as it moves through time, processing in-sample data the same way as out-of-sample data and, for that matter, all future data (so in-sample and out-of-sample become irrelevant concepts).
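A minimal sketch of that idea (not Altreva's actual method; the exponentially weighted fast/slow signal is only a placeholder assumption): the backtest walks strictly forward, the model adapts one bar at a time, and there is no separate in-sample optimization pass, so every bar is effectively out-of-sample at the moment the decision is made.

```python
# Sketch: a strictly sequential backtest in which the model "evolves" bar by bar.
import numpy as np

def evolve_through_time(prices, fast=0.2, slow=0.05):
    """Walk forward one bar at a time; never look ahead, never re-fit on a
    fixed in-sample window."""
    fast_ema = slow_ema = prices[0]
    pnl = []
    for t in range(1, len(prices)):
        # Decision uses only state built from bars 0..t-1.
        position = 1.0 if fast_ema > slow_ema else -1.0
        pnl.append(position * (prices[t] - prices[t - 1]))
        # Now the model evolves by absorbing bar t.
        fast_ema += fast * (prices[t] - fast_ema)
        slow_ema += slow * (prices[t] - slow_ema)
    return np.array(pnl)

prices = np.cumsum(np.random.normal(0, 1, 2000)) + 100.0
print("Total PnL:", evolve_through_time(prices).sum())
```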



 Stefan Simik, Quant / Trading Systems Developer

 Friday, November 7, 2014



1. General idea

A big risk of overfitting arises when there are too many core variables that influence the whole logic of the trading strategy. This is often related to a lack of knowledge about the details of how the strategy should work, combined with trying to find whatever combination works best. That is a really risky approach, because one can end up with results one does not understand: there is not much logical control, just a great statistical result.

A much safer approach is to deeply understand the trading strategy and the settings that should work; then you only tweak values to find those that are more optimal than others.

You need to know the rationale behind each parameter of the strategy and its value, and how and why it influences the result.

2. How to check that we are not overfitting

a) Use settings that produce more trades (the strategy will probably have a lower overall profit factor/efficiency), but it should still be profitable enough.

b) Test the strategy over a longer period of time, to prevent overfitting to one specific period, and check whether the overall results suffered a big drawdown somewhere in that older history; a minimal sketch of such a per-period check follows this list.

(Generally, I recommend knowing the behaviour of an intraday strategy over the last 6-8 years.)
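One way to make point (b) concrete, sketched below under the assumption that trades are available as (timestamp, PnL) pairs (the format is illustrative, not any particular platform's output): compute the profit factor year by year and see whether a single period is carrying the whole result.

```python
# Sketch: split a long trade history into yearly buckets and check that no
# single period carries the whole result.
from collections import defaultdict
from datetime import datetime

def profit_factor(pnls):
    gains = sum(p for p in pnls if p > 0)
    losses = -sum(p for p in pnls if p < 0)
    return float("inf") if losses == 0 else gains / losses

def per_year_profit_factor(trades):
    """trades: iterable of (datetime, pnl) pairs."""
    by_year = defaultdict(list)
    for ts, pnl in trades:
        by_year[ts.year].append(pnl)
    return {year: profit_factor(pnls) for year, pnls in sorted(by_year.items())}

# Example with synthetic trades: a strategy that only looks good in one year is suspect.
trades = [(datetime(2008 + i % 7, 1 + i % 12, 1), pnl)
          for i, pnl in enumerate([120, -80, 50, -60, 200, -30, 90] * 40)]
for year, pf in per_year_profit_factor(trades).items():
    print(year, round(pf, 2))
```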



 Peter McNaboe, Financial Professional: Derivatives, Risk & Wealth Management, Trading Technologies; Seeking Opportunity in Related Fields

 Friday, November 7, 2014



Vol shock protections used to be an important part of automating market making in the software I have used in the past. When your own theoretical curves, generated with proprietary software, are responsible for making markets, it is important to have certain assumptions in place. For example, you might define four zones, each specifying the extent to which at-the-money volatility, or volatility on a strike-by-strike basis, is allowed to change before you kill all bids and offers. Each successive zone covers a larger range of vol differentials, so zone 4 would be the riskiest: you would not kill all your bids and offers until volatility changed by the highest range, which you would pre-specify.
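A minimal sketch of that zone idea (the zone boundaries, vol units, and the kill_quotes hook are placeholder assumptions, not any vendor's API): compare the current at-the-money vol with the vol your resting quotes were priced at, and pull the quotes once the move exceeds the limit of the active zone.

```python
# Sketch: zone-based vol shock protection for resting quotes.
# Zone 1 is the most conservative; zone 4 tolerates the largest vol move.
VOL_SHOCK_ZONES = {1: 0.005, 2: 0.010, 3: 0.020, 4: 0.040}  # absolute vol change

def should_kill_quotes(quoted_vol, current_vol, active_zone):
    """True once the vol move exceeds the pre-specified limit for the zone."""
    return abs(current_vol - quoted_vol) >= VOL_SHOCK_ZONES[active_zone]

def on_vol_update(quoted_vol, current_vol, active_zone, kill_quotes):
    if should_kill_quotes(quoted_vol, current_vol, active_zone):
        kill_quotes()  # cancel all resting bids and offers

# Example: quotes priced at 18% ATM vol, market now at 19.5%, zone 2 active.
on_vol_update(0.180, 0.195, active_zone=2,
              kill_quotes=lambda: print("Killing all bids and offers"))
```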



 Stefan Simik, Quant / Trading Systems Developer

 Sunday, November 9, 2014



One other idea came to mind related to overfitting backtest results.

When using optimization to find the combination of input variables that produces the best backtest results, it is important:

1) To check whether many similar settings (plus or minus, within some reasonable interval) produce similar backtest results. If not, our input parameters are clearly over-optimized, and there is a real probability they will not work in the future as expected.

2) When choosing the right parameter from a wide range of good values, always choose the one surrounded by the WIDEST range of acceptable values of that parameter (a minimal sketch of such a plateau check follows below).

Said the other way around: if you choose the narrowest, most isolated value of the input parameter (from the interval of, say, the 30% best-working values) that happens to produce the best results, there is a significant risk that real results will underperform the backtest in the future, or will be different than expected.

Market behaviour is constantly evolving and changing.

It can depend on the type of strategy and the idea behind it, but generally speaking, if the strategy is prepared for a wider range of conditions, it will be more stable in the future.
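A minimal sketch of that plateau check (backtest_metric is a placeholder for whatever statistic your own backtester reports; the toy metric and thresholds are assumptions): scan the neighbourhood of each candidate value and prefer the value whose neighbours also perform acceptably, rather than the single sharpest peak.

```python
# Sketch: prefer a parameter sitting on a wide plateau of acceptable results.
import numpy as np

def plateau_score(candidate, backtest_metric, radius=3, step=1, min_acceptable=1.2):
    """Count how many neighbouring settings around `candidate` stay acceptable."""
    neighbours = [candidate + k * step for k in range(-radius, radius + 1)]
    return sum(backtest_metric(v) >= min_acceptable for v in neighbours)

def pick_robust_parameter(candidates, backtest_metric, **kwargs):
    # Prefer the widest plateau; break ties by the metric itself.
    return max(candidates,
               key=lambda v: (plateau_score(v, backtest_metric, **kwargs),
                              backtest_metric(v)))

# Toy metric: a narrow spike at 20 and a broad plateau around 50.
metric = lambda v: 3.0 if v == 20 else 1.5 * np.exp(-((v - 50) / 15.0) ** 2)
print(pick_robust_parameter(range(10, 80), metric))  # chosen from the plateau, not the spike
```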



 James Hirschorn, Ph.D., Owner, Quantitative Analyst and Developer at Quantitative Technologies

 Wednesday, November 12, 2014



@Josh: Have you looked at the book An Introduction to Statistical Learning? (It's available at http://www-bcf.usc.edu/~gareth/ISL/) One of the main themes in statistical learning seems to be finding the balance between bias and variance (too much bias = underfitting, too much variance = overfitting). I haven't finished reading it myself, but I suspect the methods from statistical learning are relevant to you.
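A minimal illustration of that bias/variance balance (the data-generating function and the polynomial fits are arbitrary assumptions for the demo, not anything taken from the book): compare training error with error on held-out data as model capacity grows.

```python
# Sketch: underfitting vs. overfitting as polynomial degree increases.
import numpy as np

rng = np.random.default_rng(0)
x = np.sort(rng.uniform(-1, 1, 60))
y = np.sin(3 * x) + rng.normal(0, 0.3, x.size)        # true signal + noise
x_tr, y_tr, x_te, y_te = x[::2], y[::2], x[1::2], y[1::2]

for degree in (1, 4, 15):
    coeffs = np.polyfit(x_tr, y_tr, degree)
    mse = lambda xx, yy: np.mean((np.polyval(coeffs, xx) - yy) ** 2)
    # Degree 1 underfits (high bias), degree 15 overfits (high variance):
    # training error keeps falling while held-out error eventually rises.
    print(f"degree {degree:2d}: train MSE {mse(x_tr, y_tr):.3f}, test MSE {mse(x_te, y_te):.3f}")
```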



@Jim: I also know of a mathematician/trader who does not believe in optimization and is supposedly very profitable. I'm not sure exactly what he meant, since optimization is obviously of great importance in statistics, but I assume he was referring to optimization in quantitative trading in general.



 Graeme Smith, Investment Manager at The Tourists Portfolio

 Saturday, December 27, 2014



James. Great book. I'd recommend it to anyone and everyone. It's data science/statistically oriented, and based on the language R, but I'd rate it as a must-read resource.



 Valerii Salov, Director, Quant Risk Management at CME Group

 Saturday, December 27, 2014



I have found this discussion very similar to another one, where I added a comment that is relevant to this thread as well. Here is a copy:

Ken Duke: "How do you prevent curve fitting?"

The following techniques are applied to prevent overfitting: Bayesian optimization, regularization, cross-validation, early stopping, pruning.

Their penetration into trading and the development of trading systems, and the investigation of their applicability, is accelerating. At the same time, markets are a rich source of information for developing these and new methods.

Best Regards,

Valerii
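One technique on Valerii's list, cross-validation, is usually adapted for time series by keeping the splits chronological. A minimal sketch under illustrative assumptions (synthetic returns and a simple trailing-mean rule stand in for a real strategy): score each candidate parameter on several disjoint chronological blocks and pick the one that holds up across them, not the one that shines on a single period.

```python
# Sketch: chronological cross-validation for choosing a strategy parameter.
import numpy as np

def strategy_pnl(returns, lookback):
    """Long when the trailing mean return is positive, flat otherwise."""
    pnl = np.zeros_like(returns)
    for t in range(lookback, len(returns)):
        if returns[t - lookback:t].mean() > 0:
            pnl[t] = returns[t]
    return pnl

def chronological_cv(returns, lookbacks, n_blocks=5):
    """Average each lookback's PnL over disjoint chronological blocks,
    so no single period chooses the parameter."""
    block = len(returns) // n_blocks
    scores = {lb: [] for lb in lookbacks}
    for k in range(n_blocks):
        segment = returns[k * block:(k + 1) * block]
        for lb in lookbacks:
            scores[lb].append(strategy_pnl(segment, lb).sum())
    return {lb: float(np.mean(s)) for lb, s in scores.items()}

returns = np.random.default_rng(1).normal(0.0002, 0.01, 3000)
print(chronological_cv(returns, lookbacks=[5, 20, 60]))
```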



 Pablo Torre, Data Solutions Manager @FractalSoft Data Analysis

 Saturday, December 27, 2014



Overfitting happens when your model fits accidental regularities that exist in the data (this is true of any dataset). When you fit your model, there is no way to tell which regularities are real and which are accidental, so it fits both kinds.

Fitting any model is about finding a balance between a model that is not powerful enough to fit the data (high bias / underfitting) and one that is too powerful (high variance / overfitting).

Some ideas on how to prevent overfitting, from Geoffrey Hinton's MOOC (lecture 9):

1. Get more data. If possible, this is often the easiest way to reduce overfitting, since a model that is too powerful for 10,000 data points is relatively less powerful when fitting 20,000 data points. This is not always possible, though, since computing power (training time) and obtaining the data can become bottlenecks.

2. Use a model that has the right capacity. A less powerful model should fit the real regularities better, assuming the accidental regularities are weaker. Some ways to reduce a model's capacity are early stopping, weight decay, and adding noise to the learning process. Cross-validation is a way to measure how well the model is fitting, especially for models like neural networks that grow stronger with each epoch of training.

3. Use bagging: train the model on different subsets of the data and then average the results (random forests work this way), or train different types of models and average the results; see the sketch after this list.

4. Bayesian: learn many different weight vectors and then average them.
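A minimal sketch of point 3 using scikit-learn's BaggingRegressor (the synthetic features and target are assumptions for illustration; the same pattern applies to whatever features a trading model would use): the same base learner is trained on bootstrap resamples and the predictions are averaged, which reduces variance.

```python
# Sketch: bagging a decision tree vs. a single fully grown tree.
import numpy as np
from sklearn.ensemble import BaggingRegressor
from sklearn.tree import DecisionTreeRegressor

rng = np.random.default_rng(42)
X = rng.normal(size=(1000, 5))                            # e.g. lagged returns / indicators
y = X[:, 0] - 0.5 * X[:, 1] + rng.normal(0, 0.5, 1000)    # noisy target

X_train, X_test = X[:800], X[800:]
y_train, y_test = y[:800], y[800:]

single = DecisionTreeRegressor(random_state=0).fit(X_train, y_train)
bagged = BaggingRegressor(DecisionTreeRegressor(), n_estimators=50,
                          random_state=0).fit(X_train, y_train)

# Averaging over bootstrap resamples reduces variance, so the bagged ensemble
# usually generalizes better than a single fully grown tree.
print("single tree R^2:", round(single.score(X_test, y_test), 3))
print("bagged trees R^2:", round(bagged.score(X_test, y_test), 3))
```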

