Tuesday, November 19, 2024

The Algorithmic Traders' Association prides itself on providing a forum for the publication and dissemination of its members' white papers, research, reflections, works in progress, and other contributions. Please note that archive searches and some of our members' publications are reserved for members only, so please log in or sign up to gain the most from our members' contributions.

How to guard against market data errors?

 Rob Terpilowski, Software Architect

 Friday, December 19, 2014

I have an automated trading strategy that uses Interactive Brokers to monitor and execute trades in 2 separate IB accounts. The strategy uses volume as one of its inputs and places trades if a minimum volume threshold is met. This week the strategy began executing trades in one account, but not the other. Upon closer inspection I found that IB is reporting volume data incorrectly in one of the accounts: it was reporting volume significantly higher than actual volume, so the volume threshold appeared to be reached when in reality it wasn't. IB is currently debugging why the account is getting incorrect data. Since trades were executed based on this erroneous data, my question is how one can defend against this type of error. I realize a sanity check on the input data is possible, but the volume numbers, while erroneous, were still realistic, so in theory the data would still have passed such a check.


20 comments on article "How to guard against market data errors?"

 James Goode, Consultant Programmer

 Saturday, December 20, 2014



How extensive is this error? Is it just one erroneous value, or is there a series of incorrect values reported over a period of say 10 minutes or more?

I assume you confirmed the incorrect data from independent sources.

How time critical is the data? Is it possible to pause for a second or two to check data from alternate sources? Or would that damage the strategy performance? Last time I used IB it seemed that the fastest data was sub-second, but not much faster (1/3 of a second).


 Alex Krishtop, trader, researcher, consultant in forex and futures

 Saturday, December 20, 2014



First off, what do you use to access their market data? TWS, Gateway, FIX API, whatever? What is the architecture of your programmatic solution? Why do you need to connect to two data sources from the same vendor?

What's the market? FX? What are trading sizes in both accounts then? FX liquidity and volume at IB may be seriously different for different accounts depending on a number of factors.

What are the account types — individual, advisor, institutional? Kind of data subscription?

Unfortunately there's just too little input data to advise anything meaningful. If you elaborate on these points, I'd be happy to help if I can.


 Rob Terpilowski, Software Architect

 Sunday, December 21, 2014



The software trades the US equity markets, in 2 separate individual accounts, with 2 instances of the IB Gateway running on my desktop. Each gateway is talking to its own instance of my automated trading application, and so each app is subscribing to IB's market data, hence the 2 market data subscriptions.

The application will buy at the close if the volume for a security is above a certain threshold, so speed is not necessarily critical, and I could in theory try to verify the volume data from a 2nd source if I had to.

What I've been seeing is that the volume numbers the application reports from the 2 accounts diverge as the day goes on, until by the end of the day one account shows volume that is about 25% higher than the other.

I was curious if this was potentially an issue with the accounts' data feeds, or with the IB Gateway itself, so yesterday I fired up 2 instances of TWS, one for each account, and pulled up quotes for QQQ. Sure enough, each instance was reporting different volume numbers, with one of the accounts showing about 20% higher volume.

I was able to chat with someone from tech support, and learned that IB has 2 methodologies they use to report volume, "Native" and "Calculated" defined below:

Native volume - Does not update with every tick, but will include delayed transactions, busts, late-reported trades and combos.

Calculated volume - Updates with every tick, but may not include delayed transactions, busts, late-reported trades and combos.

In TWS you can toggle the volume column to use Native or Calculated, but in the Gateway there is no such mechanism, and the IB support analyst I was chatting with was at a loss as to why both gateway instances weren't reporting the same volume, and has escalated the issue to their developers. The problem first surfaced about a week ago. I haven't made any changes in years to my application, and my Gateway/TWS version hasn't been updated for about 4 months, so it appears to be an issue on IB's server side, but we'll see what their response is after they've had a chance to research this.


 Gary E., Owner, Erdman Computer Consulting, Inc. and Investment Management Consultant

 Sunday, December 21, 2014



I get my 5-minute quote data from TD Ameritrade (I used to get data from PC Quote or MarketSmart - that data was real bad but it made me write the logic needed to make sure I only have "clean" data). Volume received is always cumulative volume for the day.

I only use volume to determine if a stock is currently in what I call an "extreme" trading mode.

If I receive a quote with the same volume as a previous quote received the same day, then I know the quote is bad and I reject the quote. This may be what's happening with you. This particular problem has been around the industry for at least the last 10 years. The source of this problem is with the data source itself and everyone uses the same data source (Comstock?). These "bad" volume records will repeat throughout the day periodically and then that will be it for several weeks/months for that symbol.

If a quote comes in with zero volume it is rejected.

If volume is less than the previous quote's volume, then the quote is rejected.
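
The three rejection rules above (zero volume, repeated volume, decreasing volume) can be sketched roughly as follows. This is only an illustration in Python, not Gary's actual code: the class name `CumulativeVolumeFilter` and its methods are made up for the example, and it assumes one filter instance per symbol that is reset at the start of each trading day.

```python
from typing import Optional


class CumulativeVolumeFilter:
    """Hypothetical per-symbol filter for cumulative daily volume.

    Assumption: the feed reports cumulative volume for the day, so every
    valid quote should carry a strictly larger volume than the last one.
    """

    def __init__(self) -> None:
        self.last_volume: Optional[int] = None

    def accept(self, volume: int) -> bool:
        # Rule: a quote with zero (or negative) volume is rejected.
        if volume <= 0:
            return False
        # Rules: a quote with the same volume as, or less volume than,
        # the previously accepted quote is rejected.
        if self.last_volume is not None and volume <= self.last_volume:
            return False
        self.last_volume = volume
        return True

    def reset(self) -> None:
        # Call at the start of each trading day.
        self.last_volume = None


volume_filter = CumulativeVolumeFilter()
decisions = [volume_filter.accept(v) for v in [0, 1000, 1000, 900, 1500]]
print(decisions)
```

Note that a filter like this only catches structurally impossible values. It would not have caught the original problem in this thread, where the inflated volume was still increasing and realistic.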


 Colin "Soup" Campbell, Trader at IFundTraders

 Sunday, December 21, 2014



There is only one way to be sure. Multiple data sources. Multiple brokers may have the same data source with errors, so you have to have multiple independent data sources and a voting system. Exact reporting is unlikely, so you need a comparison with a tolerance built into a voting algo. Not cheap, so you either need it, or you don't.
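
One possible shape for such a voting scheme with a tolerance is sketched below in Python. It is an assumption-laden illustration, not Colin's implementation: the function name `vote_on_volume`, the 5% default tolerance, and the choice of the median as the reference point are all invented for the example.

```python
from statistics import median
from typing import Optional, Sequence


def vote_on_volume(
    readings: Sequence[float],
    tolerance: float = 0.05,   # assumed: max relative distance from the median
    min_agree: int = 2,        # assumed: how many sources must agree
) -> Optional[float]:
    """Return a consensus volume, or None if too few sources agree."""
    if len(readings) < min_agree:
        return None
    reference = median(readings)
    if reference <= 0:
        return None
    # Keep only the readings that fall within the tolerance band.
    agreeing = [r for r in readings if abs(r - reference) / reference <= tolerance]
    if len(agreeing) < min_agree:
        return None  # no consensus: treat the data point as invalid
    return sum(agreeing) / len(agreeing)


# Three sources, one of them inflated: the outlier is outvoted.
print(vote_on_volume([34.1e6, 34.3e6, 42.1e6]))
# Two sources that disagree: no way to tell which is right.
print(vote_on_volume([34.1e6, 42.1e6]))
```

With only two sources a disagreement cannot be resolved, which is why the second call yields no consensus value at all.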


 Alex Krishtop, trader, researcher, consultant in forex and futures

 Monday, December 22, 2014



Rob, if volume data is so critical then I believe you want to go for a direct data feed from the exchange, as no retail vendor will deliver you complete information in real time (I suppose many of them can't deliver it even in historical data). As to the particular issue with IB: do these 2 accounts have the same subscription to market data? And maybe a stupid question: what are the settings in each instance of IB Gateway, especially those marked in this screen: http://edgesense.net/showcase/ib-gateway-settings.png ?


 Bill Zhimin Yang, Quantitative Researcher at Taikang Asset Management

 Monday, December 22, 2014



Not an advertisement, but I read a recommendation of Quantquote for data on Caltech's website. I was going to buy some historical data there but haven't yet, so this is just a reference, not from my own experience (disclaimer).


 Robert Simons, Associate & Senior International Market Strategist

 Monday, December 22, 2014



As Colin said, and I think we all agree on this: "Multiple sources" and therefore "Multiple data". That is precisely what I've been using and trading with for the last 15 years, for the same reasons you've described in your article and then some. It doesn't fix the issue at hand, but at least I'm not just another dumb ass staring at a screen taking the data for granted.


 Rob Terpilowski, Software Architect

 Monday, December 22, 2014



Alex, I've verified that both accounts have the same market data subscriptions. I'm running the gateways on 2 separate API sockets and both are set to a 30 second timeout with no Master API client ID set.

Thanks for the link Bill, I'll likely be in the market for historical intraday data, so this may work out well.

Robert/Colin, what do you do when the difference in your data sources exceeds your tolerance?


 Alex Krishtop, trader, researcher, consultant in forex and futures

 Monday, December 22, 2014



Rob, have you tried running them on different computers? Does only one particular account report erroneous volume information, or does it happen randomly from one run to another?


 Rob Terpilowski, Software Architect

 Monday, December 22, 2014



Alex,

Tried it on another machine, but still no joy. The one account consistently reports the "native" volume which is considerably higher than the "Calculated" volume.

For example today QQQ:

IB native volume: 42.1M

IB calculated volume: 34.1M

Still haven't received any word from the IB devs regarding what the issue may be.


 Jonathan Kinlay, Quantitative Research and Trading | Leading Expert in Quantitative Algorithmic Trading Strategies

 Tuesday, December 23, 2014



I am not at all surprised by this. IB's market data feed is notoriously unreliable, with very poor granularity. The obvious solution is to get a better data feed. Amongst retail platforms, I have found Tradestation's market data to be more consistent and higher quality than IB's, for example. If you want to stick to IB, could you simply set up the two accounts as sub-accounts and allocate trades between them, rather than treating them as separate, independent accounts?


 Robert Carver, Proprietary systematic trader, writer and freelance researcher.

 Tuesday, December 23, 2014



I don't use volume data except to decide when to roll futures contracts, and I use the same basic checks as Gary. I do however use prices, and it's true that the IB feed of these can sometimes be flaky, for no particular reason.

Having said that, the error count isn't much worse than I saw using 'professional' data, e.g. BB or Reuters feeds, though it is definitely a bit worse. What IB is particularly bad at is explaining and fixing problems; their SLA isn't as high as when you are paying the likes of BB several million bucks a year for data. Not the same issue, but for example sometimes the wrong contract expiry comes back for a fill, which means I get a break. I've pointed out the problem several times but it still happens.

I take a very fatalistic attitude to this: data will sometimes be bad, and it's better to handle bad data robustly than to try and find perfect data.

I don't believe multiple sources is the answer because (a) to automate collection from multiple sources you need at least 3 to resolve disagreements, (b) you will sometimes see the same error in multiple sources because the original data from the exchange is corrupted, (c) the extra cost and complexity doesn't justify the benefits.

If speed is not hugely important there is a lot to be said for the approach of filter and fallback to manual, which is what I use for prices. First check for zero and negative prices, which IB is fond of producing. A check to see if the move is more than x sigma more than the usual will catch whatever is left. Yes you will be a bit slow reacting to genuinely large moves, like a flash crash, but you might not have wanted to trade on those anyway.
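
A rough Python sketch of this filter-and-fallback idea follows. It is one reading of the description above, not Robert's code: the function name `check_price`, the default of 4 sigma, and the use of the standard deviation of recent price changes as "the usual" move are all assumptions made for illustration.

```python
from statistics import pstdev
from typing import Sequence


def check_price(new_price: float, history: Sequence[float], max_sigma: float = 4.0) -> str:
    """Classify a new price as 'ok', 'reject', or 'manual' (needs a human look)."""
    # First check: zero and negative prices are simply rejected.
    if new_price <= 0:
        return "reject"
    # Assumption: with too little history we cannot judge the move, so defer to a human.
    if len(history) < 3:
        return "manual"
    # Assumed definition of the "usual" move: std dev of recent price changes.
    changes = [later - earlier for earlier, later in zip(history, history[1:])]
    sigma = pstdev(changes)
    move = abs(new_price - history[-1])
    if sigma == 0:
        return "ok" if move == 0 else "manual"
    # Second check: a move larger than max_sigma * sigma falls back to manual review.
    return "manual" if move > max_sigma * sigma else "ok"


recent = [100.0, 100.5, 100.2, 100.8, 100.6]
print(check_price(100.9, recent))  # small move
print(check_price(110.0, recent))  # unusually large move
print(check_price(-1.0, recent))   # impossible price
```

The "manual" outcome is the fallback: the system stops trusting the price automatically and waits for someone to confirm it, which is why a genuinely large move is handled slowly.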

Finally, building up the rest of our systems so that one bad price doesn't kill them, e.g. by using median smooths, is worth doing. All this will be difficult in a high frequency environment, but as the OP said, that isn't what we're dealing with here.
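
As an illustration of why a median smooth tolerates a single bad price, here is a minimal rolling-median sketch in Python. The class name `MedianSmoother` and the window size are hypothetical choices for the example, not something specified in the post.

```python
from collections import deque
from statistics import median


class MedianSmoother:
    """Hypothetical rolling median over the last `window` prices."""

    def __init__(self, window: int = 3) -> None:
        # deque with maxlen drops the oldest price automatically.
        self.values = deque(maxlen=window)

    def update(self, price: float) -> float:
        self.values.append(price)
        return median(self.values)


smoother = MedianSmoother(window=3)
# The third price (0.0) is a bad tick; it never shows up in the smoothed series.
smoothed = [smoother.update(p) for p in [100.0, 101.0, 0.0, 102.0, 103.0]]
print(smoothed)
```

A moving average over the same window would be dragged down sharply by the bad tick, whereas the median ignores it as long as bad values are a minority of the window.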


 Jonathan Kinlay, Quantitative Research and Trading | Leading Expert in Quantitative Algorithmic Trading Strategies

 Tuesday, December 23, 2014



IB's MDF is a lot worse than that. If you look at their intraday data you will see that they were significantly off the NBBO in many of the top names for most of 2012. (Don't know why this didn't turn into a major issue for them). They also send their data in 250 millisec data packets, rather than tick by tick, which can lead to all kinds of inconsistencies and false signals. If you are doing anything that is at all latency sensitive I strongly advise you to avoid using their data feed.

There are lots of good things I can say about the IB platform, including their reasonable (for retail) execution algos and commission rates. They also have a managed account system which allows you to allocate trades from one master account across several sub-accounts. That way you can avoid discrepancies in execution between the accounts. I think there is a limit of 15 sub-accounts.


 James Goode, Consultant Programmer

 Tuesday, December 23, 2014



@Rob As you are using the API in both program instances, you could put some code in to cross-check the volume numbers and to choose the 'best / preferred value' before trades. Given your comment that the volumes start to diverge from the commencement of trading, this could be one approach if you wish to stay with IB.

With this you could also give yourself a warning if IB were to alter their algorithms to correct the discrepancy (as a result of your complaint). Data providers have a habit of altering their algorithms, which does affect trades relying on their original algorithm, and they don't always warn traders.
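
A cross-check along these lines might look like the Python sketch below. Everything in it is an assumption for illustration: the names `CrossCheck` and `cross_check`, the 5% threshold, and in particular the choice of the lower reading as the "preferred" value (on the reasoning that a strategy gated by a minimum volume threshold is safer under-counting than over-counting). It does not touch the IB API; the two volumes are assumed to have been read from the two program instances already.

```python
from dataclasses import dataclass


@dataclass
class CrossCheck:
    preferred: float      # value to trade on (assumed: the lower reading)
    relative_gap: float   # (high - low) / low
    diverged: bool        # True when the gap exceeds the allowed threshold


def cross_check(volume_a: float, volume_b: float, max_gap: float = 0.05) -> CrossCheck:
    """Compare the volume seen by two program instances for the same symbol."""
    low, high = sorted((volume_a, volume_b))
    if low <= 0:
        raise ValueError("volumes must be positive")
    gap = (high - low) / low
    return CrossCheck(preferred=low, relative_gap=gap, diverged=gap > max_gap)


# QQQ figures quoted earlier in this thread: native 42.1M vs calculated 34.1M.
result = cross_check(42.1e6, 34.1e6)
print(f"gap = {result.relative_gap:.1%}, diverged = {result.diverged}")
```

Logging `diverged` each day would also give the warning described above: if the flag flips from True to False, the provider has probably changed something on their side.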


 Colin "Soup" Campbell, Trader at IFundTraders

 Tuesday, December 23, 2014



@Rob - when you cannot determine a valid data point, i.e. the sources don't agree within your limits, there is no other choice than to mark all of the data points invalid. Because volume is cumulative and time sensitive, many of your individual data points self-correct for differences in sample time. A little thought in designing your voting system could reduce your false invalid labels.


 Colin "Soup" Campbell, Trader at IFundTraders

 Tuesday, December 23, 2014



@Rob - don't forget that different routes mean different equipment, and different equipment means different arrival times because of the buffering requirements of the data streams. If you tighten your limits too much, you might end up reporting the buffering of the internet.


 Rob Terpilowski, Software Architect

 Friday, December 26, 2014



Jonathan, I'll take a look at the managed account structure and see what (if any) additional effort would be required in order to get my app executing trades in that environment.

@Robert, for this particular strategy and the way it trades, the filter and fallback to manual mode would probably be the best way to go for dealing with discrepancies when they may arise.


 Jonathan Kinlay, Quantitative Research and Trading | Leading Expert in Quantitative Algorithmic Trading Strategies

 Saturday, December 27, 2014



See this on IB linked accounts: https://www.interactivebrokers.com/en/?f=%2Fen%2Fsoftware%2Fpdfhighlights%2FPDF-LinkedAccounts.php%3Fib_entity%3Dllc


 John Devron, Computer Software Professional

 Sunday, December 28, 2014



Hi Rob,

I solved my datafeed quality problems by comparing feeds from two different vendors.

I found that no filter will be without flaw because the range of possible valid values and patterns is so diverse. Redundant data feeds worked perfectly for me, without flaw.

For the second feed I just used an account from a different broker.

John

TRADING FUTURES AND OPTIONS INVOLVES SUBSTANTIAL RISK OF LOSS AND IS NOT SUITABLE FOR ALL INVESTORS
Copyright 2018 Algorithmic Traders Association