
Archived Discussions

Recent member discussions

The Algorithmic Traders' Association prides itself on providing a forum for the publication and dissemination of its members' white papers, research, reflections, works in progress, and other contributions. Please note that archive searches and some of our members' publications are reserved for members only, so please log in or sign up to gain the most from our members' contributions.

Have Algorithm . . . Have Requirements . . . Need Glue!


 William Schamp, President/Quantitative Analyst - Beacon Logic LLC

 Saturday, September 26, 2015

It's taken me years to perfect an algorithm that needs no optimization and doesn't degrade over time. We've back-tested the algorithm using over 15 years of historic data. We've forward-tested it in a live retail environment for nearly 18 months. We converted it from a retail language to a more commercial language. We've set up our servers to be co-located at the exchange. We've linked the broker/server/FIX chain with no issues. We require data that is pure, clean, and 10 levels deep on both bid and ask. On the days we were plugged in completely, we ran over 95% trade accuracy. The issue: our data provider's data changed and went from clean to crappy. We are now looking for another provider. I would also like some feedback on why you think my solutions provider is not more excited about being proactive with an algorithm proven to exceed 93% trade accuracy and an over-85% daily win rate (17+ winning days a month). Any suggestions would be greatly appreciated.



57 comments on article "Have Algorithm . . . Have Requirements . . . Need Glue!"


 Mark Brown mark@markbrown.com, Global Quantitative Financial Research, International Institutional Trading, Algorithmic Modeling.

 Saturday, September 26, 2015



Fifteen years of data is nothing if the system produced 20 trades on monthly bars. However, 15 years of 1-minute data where a system produced 5,000 trades and was 95% accurate would be something. So there is much more information needed before anyone can arrive at an opinion. m
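
Mark's point about sample size can be made concrete: the uncertainty around a measured win rate shrinks with the number of trades. A minimal C# sketch (not from the thread; it uses the standard normal approximation to the binomial, and the numbers are illustrative):

    using System;

    class WinRateConfidence
    {
        // Approximate 95% confidence interval for a win rate, using the
        // normal approximation to the binomial distribution.
        static (double Low, double High) Interval(int wins, int trades)
        {
            double p = (double)wins / trades;
            double halfWidth = 1.96 * Math.Sqrt(p * (1 - p) / trades);
            return (Math.Max(0.0, p - halfWidth), Math.Min(1.0, p + halfWidth));
        }

        static void Main()
        {
            // 19 wins in 20 trades vs. 4,750 wins in 5,000 trades: both are
            // "95% accurate", but the confidence intervals differ enormously.
            foreach (var (wins, trades) in new[] { (19, 20), (4750, 5000) })
            {
                var (low, high) = Interval(wins, trades);
                Console.WriteLine($"{wins}/{trades}: {low:P1} to {high:P1}");
            }
        }
    }

With 20 trades the interval spans roughly 85% to 100%; with 5,000 trades it is barely over one percentage point wide, which is the substance of Mark's objection.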



 Borut Skok, Market Analyst

 Saturday, September 26, 2015



Nowadays 15 years of historical data is useless. For trading gold I use 4,000 bars (5-minute time frame; 14 days), and that is enough.



 William Schamp, President/Quantitative Analyst - Beacon Logic LLC

 Saturday, September 26, 2015



@Mark - Where did you get 20 trades on monthly bars? I never mentioned the number of trades it takes, but last week it took 28 trades and 27 of them were profitable. The week before that it took 36 trades and 35 of them were profitable. Average net profit was over $25 per trade per contract, and that is scalable to over 200 contracts in the current trading environment. That is one trading environment, and I can duplicate those results in over 200 more. Does this help clarify why I'm asking these questions?



 William Schamp, President/Quantitative Analyst - Beacon Logic LLC

 Saturday, September 26, 2015



@Borut - 15 years is fine if you are talking about between 3,600 and 6,000 bars, which I am.



 Borut Skok, Market Analyst

 Saturday, September 26, 2015



@William: The problem is the comparability of these bars. Do they really represent the same process?



 William Schamp, President/Quantitative Analyst - Beacon Logic LLC

 Saturday, September 26, 2015



@Borut - Yes sir!



 Noga Confino, Advisor to the board at Consenz

 Sunday, September 27, 2015



I think there is a lot of emphasis on the back-testing, but the forward testing / live testing is equally important, if not more so. You did not mention the results you achieved in the 18 months of retail testing (including the account size you traded). At times a good system can fail to perform because of execution issues, so the theory may not play out the same in practice.

Both of these will be important in trying to understand why your solutions provider is not reacting as you expect.

Separately, the claim that the algorithm "needs no optimization and doesn't degrade over time" is hard to establish until you have gone live and lived through a few market disasters, so that remains to be proven.



 William Schamp, President/Quantitative Analyst - Beacon Logic LLC

 Sunday, September 27, 2015



You can apply an algorithm to historic data as if it were running in real time. We have done this. My algorithm is reactive, not predictive, so market disasters mean nothing to it. We have tested it through every problematic period of the last 15 years and it sailed through with flying colors. When the market dropped 1,000 points a few weeks ago, we made money all the way down and all the way back up.

I appreciate the comments from all of you, but I am really looking for a solutions provider to step in here and offer some solutions. Maybe there aren't any represented here.



 Jonathan Kinlay, Quantitative Research and Trading | Leading Expert in Quantitative Algorithmic Trading Strategies

 Monday, September 28, 2015



William, you haven't mentioned whether we are discussing equities, futures or F/X, which is obviously a factor. In my experience data consistency is as important as data quality. For example, it is possible to design systems that work using IB's data feed - even though it is very poor, it is at least consistent. You will probably need to re-tool your algorithm to work with a different market data provider, even if the data quality is good.

Which market data providers have you considered so far?



 Alex Krishtop, trader, researcher, consultant in forex and futures

 Monday, September 28, 2015



William, if you don't mind, let's start with definitions of several terms you used in your post as self-evident but which are not, at least for most of us.

1. "Retail language"

2. "Commercial language" (and please elaborate on the relationship between languages and the quality of an algorithm/trading idea)

3. "Pure"

4. "Clean" (both related to data)

5. "Trade accuracy"

6. "Clean" vs. "crappy" (again, related to data)

7. "Solutions provider" (what does it do?)

In my opinion you did quite a strange thing: you tested the algorithm live in one environment (which you referred to as "retail"), then moved it to another environment, and it stopped working. I am not quite sure why you did that, so your elaboration on the subject will definitely be helpful as well.

In any case, if your algo requires 10 levels of bid and ask in "clean" and "pure" form, and you say you tested it over 18 years' worth of data, do you mean that you managed to purchase 18 years' worth of level 2 data, and that you actually believe this data is indeed "clean" and "pure"?

And let me reiterate Jonathan's remark that any information about the market in question is essential for any further discussion.



 Jose Antonio Lopez Portillo, Independent Trader

 Monday, September 28, 2015



Try this guy: p.zielbauer@infinityfutures.com — his name is Patrick.

Hope it helps.



 William Schamp, President/Quantitative Analyst - Beacon Logic LLC

 Monday, September 28, 2015



Hi Alex,

>Retail Language - A language, other than C#, C/C++, or Java, that is used in retail software packages; in my case, EasyLanguage. We converted our algo from EasyLanguage to C# to make it more compatible with more complex direct-market applications and the providers that use them.

>Commercial Languages - See above

>Pure - Feeds that include every individual tick and transaction as it occurs.

>Clean - The CME MDP 3.0 changes broadcast the ticks bundled, supposedly to make them more efficient to transmit. Data scientists say this is bull and that they are doing it simply because they are too lazy to clean up the process. Some data feeds also include "ghost" trades at the beginning and end of sessions as placeholders; these too are unnecessary. We have developed a way to compare data very efficiently, and you would be surprised at the clutter in some of the feeds. If you don't need deep information (many levels of bid and ask data), DTN IQ is literally the best data for the price. Sorry to say, they do not offer more than depth-of-market information.

>Trade accuracy - Comparing the number of winning trades to the number of losing trades on a daily, weekly, monthly, and yearly basis.

>Clean vs. crappy - see Clean above

>Solutions provider - A company that offers a full-service solution for a complete implementation of direct-market automated trading execution: full interfaces to (many) brokers, trading platforms, co-located server environments, front- and back-office reporting, a variety of vetted data providers offering different levels of market depth, and access to every viable exchange worldwide. I'm currently working with one, but have already kicked two others aside for their inability to offer complete solutions.

The algorithm was tested in a "retail" environment because that is where we began. That particular "retail" product doesn't offer bid/ask tracking (10 levels deep) of what is actually happening in live markets; it only offers "last price traded" tracking, which isn't as precise or as helpful to complex algorithms such as ours. Moving it to a more precise environment didn't make it stop working. On the contrary, it proved that it worked perfectly, even better than in the retail environment. What halted the accuracy was the change in the data that made it less pure.

I didn't say 18 years; I said 15 years of historic data and over 18 months of live trading. The initial testing was done in the retail environment because, initially, that is all we had. We then had to test the algorithm in a more precise live environment to ensure that what we assumed would work actually would. This is why we finally tested it in a live market for over 18 months, using a pure data stream giving us 10 levels of market depth.

If anyone thinks that randomly bundling the data to make it more efficient to transport helps in the analysis of that data, they are delusional. We've compared data streams from two separate high-end providers offering 10 levels of market depth and, in just one highly traded live market analyzed after they began bundling, found over 2,000 extra records each day beyond the actual data as it occurred. The result is the exact opposite of what bundling was supposed to provide, but if you don't do the research you would never know that.
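
For readers curious how the kind of feed comparison described above can be automated, here is a minimal C# sketch that counts records present in one provider's capture but not in the other's. The Tick record and the choice of key (timestamp, price, size) are illustrative assumptions, not William's actual method:

    using System;
    using System.Collections.Generic;
    using System.Linq;

    // Illustrative record type; a real capture would also carry exchange
    // sequence numbers, aggressor flags, etc.
    record Tick(long TimestampNanos, decimal Price, int Size);

    class FeedDiff
    {
        // Count records that appear in one feed but not the other, treating
        // repeated identical records as a multiset.
        static (int OnlyInA, int OnlyInB) CountExtras(IEnumerable<Tick> feedA,
                                                      IEnumerable<Tick> feedB)
        {
            var counts = new Dictionary<Tick, int>();
            foreach (var t in feedA)
                counts[t] = counts.GetValueOrDefault(t) + 1;
            foreach (var t in feedB)
                counts[t] = counts.GetValueOrDefault(t) - 1;

            int onlyInA = counts.Values.Where(c => c > 0).Sum();
            int onlyInB = -counts.Values.Where(c => c < 0).Sum();
            return (onlyInA, onlyInB);
        }

        static void Main()
        {
            var feedA = new[] { new Tick(1, 100.25m, 2), new Tick(2, 100.25m, 1) };
            var feedB = new[] { new Tick(1, 100.25m, 2) };
            var (extraA, extraB) = CountExtras(feedA, feedB);
            Console.WriteLine($"Extra records in A: {extraA}, extra in B: {extraB}");
        }
    }

In practice the key might also include sequence numbers so that legitimately identical ticks are not conflated.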



 William Schamp, President/Quantitative Analyst - Beacon Logic LLC

 Monday, September 28, 2015



@Jonathan - This particular version of my algorithm works in the futures market, but I have another version of the same algorithm that works in equities.

FX is a completely different animal. I'm currently working with a company that is building a database of every interbank feed (combined) and every trade (except hidden trading) taken in the pairs. Only once we have a few years of that data can we create an algorithm as accurate as what we have built currently.

On the subject of "poor data" I have to emphatically disagree with your premise that one must conform to the quality of the data you feed your algorithm. I'll make a comparison for you. I go out and buy a Bugatti Grand Sport Vitesse because I require every bit of the performance it promises. It requires premium ethanol-free gas, but the only gas in my area is regular or ethanol-enhanced. Do I put poor-quality gas in it and destroy my million-dollar investment, go buy a low-performance cookie-cutter car, or find a way to import the high-quality gas my high-performance car requires? I say the answer is simple if I must have the performance of the Bugatti.

The other side of the coin is your suggestion to retool the algorithm to work with the poor data. Yes, that is an option, but not one I will ever choose as a primary option. Broadcast TV signals are constantly improving in quality, so why are the data providers going the opposite direction? It's as if the data providers are going from HD back to analog to save money instead of finding ways to transport the high-quality data more efficiently. To me this is typical short-sighted corporate thinking.



 Borut Skok, Market Analyst

 Monday, September 28, 2015



@William: building the kind of database you mentioned is completely unnecessary. Much lower time frames have very similar patterns and proportions between levels.



 William Schamp, President/Quantitative Analyst - Beacon Logic LLC

 Monday, September 28, 2015



@Borut - I apologize but I don't understand your comment.



 Borut Skok, Market Analyst

 Monday, September 28, 2015



It doesn't matter how old the data are.



 William Schamp, President/Quantitative Analyst - Beacon Logic LLC

 Monday, September 28, 2015



@Borut - Correct, the age of the data doesn't make any difference; it is the purity of the data that is of most concern.



 Jonathan Kinlay, Quantitative Research and Trading | Leading Expert in Quantitative Algorithmic Trading Strategies

 Monday, September 28, 2015



"Yes that is an option but not one I will ever choose as a primary option. " Obviously not.

Which solution providers have you used / discarded? No point in recommending a firm you have already considered.



 William Schamp, President/Quantitative Analyst - Beacon Logic LLC

 Monday, September 28, 2015



@Jonathan - We have tested four so far. We were using Spryware until September 20th, when they changed their data. We are currently working with and testing Activ. The two companies before that, Bloomberg and CQG, offer really poor data by our standards. In CQG's defense, they offer the best execution environments.



 Alex Krishtop, trader, researcher, consultant in forex and futures

 Monday, September 28, 2015



William, it seems you are missing the very point about market data and its quality. There is no such thing as "true", "clean", "right", or whatever you want to call it, data. Even if you look at time and sales you will find that not all trades are included, and not necessarily because of the laziness of the data vendor and/or the exchange itself, but because of the structure of the market. Therefore you basically always have two options: either you use a strategy which is not so data-sensitive, or you stick to a single trading environment. A long-term investment strategy is an obvious example of the former, while a high-frequency arbitrage strategy using a number of integrators is a good example of the latter. In other words, either you trade price movements for which finer-grained information becomes irrelevant, or you exploit the very inefficiencies you can find in the data provided by a particular provider.

All in all, what I really don't understand is this: if IQFeed is good for you, why not just use it? The whole context of your question seems unclear in this case, especially if you say that you effectively don't need DOM data at all to be successful. No retail environment can transmit full T&S; if your strategy is sensitive to it, then how did you manage to get good results in a retail environment? If it isn't, then what is the problem with the data you're currently using?

Honestly, it sounds like you're trying to find an explanation for the lack of performance in anything but the strategy logic. I understand that I sound harsh (as always), and of course I don't mean anything personal; it's just my general impression of the whole discussion.



 Tibor Komoroczy, CEO & Founder, Skunkworks LLC

 Monday, September 28, 2015



Bloomberg B-PIPE.



 Josh Freivogel, Quantitative Researcher and Trader

 Monday, September 28, 2015



It seems you've done some great work, but may I ask one more thing? Can you give me two or three examples of "more complex direct market applications and the providers that use them?"



 Scott Boulette, Algorithmic Trading

 Monday, September 28, 2015



@William, your solutions provider likely has no way of determining if you are the one guy in a thousand who says he has those stats and really does.

As to the data, I know a number of people who are decently happy with DTN IQFeed. In the end, DMA is going to be your only way to know you are dealing with the best data available, and that still leaves you with an issue if you are trading FX.



 Borut Skok, Market Analyst

 Monday, September 28, 2015



I agree with Alex that there is no such thing as "clean" data in and of itself. Data are what they are. The real question is hidden elsewhere: what kind of data do I need for my algorithm (system)? Because no algorithm stands apart from its data, I don't think an algorithm and a data feed can exist as two separate things that only need glue. An algorithm is always the result of using a certain type of data. The point is that the glue in this case is the accommodation of the algorithm to the data, and going in the opposite direction is very risky.



 Josh Freivogel, Quantitative Researcher and Trader

 Monday, September 28, 2015



Why not buy a DMA platform from a shop with no good algorithm, or enter into a deal with one of the many prop shops that have already built the infrastructure you need? I know of two of the former, and you only need to search job postings to find examples of the latter.

In my experience, any algorithm requiring more than a small amount of data (certainly not market depth) quickly chokes a retail platform like Bloomberg or CQG, unless you're running analytics outside the platform. At my last company we used CQG's API to execute after getting data directly from the exchange, and it compared similarly to TT and Bloomberg; it was not a clear winner for our needs. Feel free to PM me for the names and contacts of platform owners.



 William Schamp, President/Quantitative Analyst - Beacon Logic LLC

 Monday, September 28, 2015



@Alex - I'm not missing any point. Every trade, every trade volume, and every stock or commodity price in every market is captured and archived by every exchange on the planet. If you do not think this happens, you are wrong; it is a verifiable fact. Take dark pools out of the equation and you can still get an accurate representation of what is happening in any general market at any given moment in time.

Did you not understand the analogy I gave Jonathan regarding the high-performance Bugatti?

I will explain, again, why DTN IQFeed's data doesn't work for my algorithm running in a direct-market application. DTN IQFeed's data doesn't give us quotes 10 levels deep. The specific market the algorithm runs in fills orders FIFO (first in, first out). Depth gives us an accurate read, when a trade is placed, on where we will be filled based on order priority. By the way, I never stated we "don't need DOM data at all to be successful". Let me again lay out the process that got me to where I am, to help you understand my issue.

1. I got a good result in a retail environment using a retail trading program that calculates results based on "last price traded", fed by DTN IQFeed. "Last price traded" is not an accurate representation of how the market works, because when an order is placed you are put in a DOM queue and must wait until all orders before yours are filled. This is why having access to 10 levels of market depth helps ensure your algorithm is operating correctly.

2. I contracted a company to take my algorithm (written in EasyLanguage), convert it into C#, and run it on co-located servers at the exchange, connected via FIX to my broker. The results were awesome. Awesome means good, great, outstanding, wonderful, etc.

3. The problem occurred when my data provider changed their data on the evening of September 20th and started to include over 2,000 extra records each day. We have verified this with them and with the exchange! If the algorithm is data-sensitive, this erroneous data renders it ineffective.

I repeat: if this algorithm were a typical one generating even 20% returns a year, it would be no big deal, but the poor data took it from 95% accuracy to useless. Garbage in . . . garbage out.

My algorithm has never "lacked performance" in an environment with clean data and solid market connections. It has only fallen apart when the data streams were compromised. And yes, we have the ability to compare the data from any provider against the exchange data for accuracy.

Every strategy is data-sensitive; ask any data scientist. Common sense is truly a flower that doesn't grow in some people's gardens. Long-term investing traditionally relies on predictive models and fundamental analysis, both of which are highly fallible and subject to great idiosyncrasies; as I previously stated, I employ neither in my algorithm. HFT looks for unique markers to capitalize on, which again is not relevant to my algorithm. Accurate and precise price movement is all I'm concerned with, and no information is useless unless it fails to reflect what actually occurred in the market.

I do not understand why every conversation we have becomes a personal attack by you on whatever point or question I bring up. Each time I post a question that you don't fully comprehend, you attack it using the limited information you have on the subject. It must be amazing to know absolutely everything about every subject you've never researched or studied. Please stop interjecting unrelated and irrelevant information into my conversations. If you have a question, ask it and I will always politely answer it.



 William Schamp, President/Quantitative Analyst - Beacon Logic LLC

 Monday, September 28, 2015



@Tibor - We are looking at Bloomberg B-PIPE and the jury is still out. Thanks.



 William Schamp, President/Quantitative Analyst - Beacon Logic LLC

 Monday, September 28, 2015



@Alex - I never said I was getting ahead of anyone in the queue. Having access to quotes 10 levels deep gave me confidence that my algorithm would work in live markets. Testing using "last price traded" data is a worthless comparison to live market trading. My algorithm doesn't need to "butt ahead in the line"; it just needs to see how long the line is. The live market test was over 8 months, but the results over 15 years matched within 1 percentage point, and that was matching trade for trade. My server is co-located at the exchange, but the difference in cost and programming between CME data and 3rd-party data is vast. At some point the goal is to have a CME seat.



 Scott Boulette, Algorithmic Trading

 Monday, September 28, 2015



@Alex - I cannot speak for William, but I use the full depth of book to estimate my position in the order-book queue. This in turn helps the algo determine whether to pull the order or not.

If, for example, I estimate I am at position 102 on a price level with 534 contracts, I know that I have a decent chance of getting filled and back out on the other side just from the normal back and forth of trades. However, if there were 1,003 on the other side, I would cancel regardless, because my exit order would be at the back of that queue (position 1,004).

This is only useful in relatively high-frequency trading, and normally only when the profit target is 1 tick.

As to your last question (again, not speaking for William): in the futures markets you can have a co-located server while still having to go through the broker's risk check. To the best of my knowledge, only those dealing directly with the clearing firm can completely bypass those risk checks.
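
A minimal C# sketch of the queue-position bookkeeping Scott describes: start from the size displayed at your price when the order is placed, then decrement as volume trades at that price. The handling of cancellations (assuming they come from orders ahead of you) is a simplifying assumption, not Scott's or William's actual logic:

    using System;

    // Tracks an estimate of how many contracts sit ahead of a resting limit
    // order at a single price level.
    class QueuePositionEstimator
    {
        private int ahead;

        // When the order is placed, everything already displayed at that
        // price is assumed to be ahead of us.
        public QueuePositionEstimator(int displayedSizeAtEntry)
        {
            ahead = displayedSizeAtEntry;
        }

        // Trades at our price consume the front of the queue.
        public void OnTradeAtOurPrice(int tradedSize)
        {
            ahead = Math.Max(0, ahead - tradedSize);
        }

        // Simplifying assumption: cancellations are treated as coming from
        // orders ahead of us; a conservative version would ignore them.
        public void OnCancelAtOurPrice(int cancelledSize)
        {
            ahead = Math.Max(0, ahead - cancelledSize);
        }

        public int ContractsAhead => ahead;
    }

    class QueueDemo
    {
        static void Main()
        {
            // Roughly Scott's example: we join a price level showing 534 contracts.
            var q = new QueuePositionEstimator(534);
            q.OnTradeAtOurPrice(300);   // 300 contracts trade at our price
            q.OnCancelAtOurPrice(50);   // 50 contracts are cancelled at our price
            Console.WriteLine($"Estimated contracts ahead: {q.ContractsAhead}");
        }
    }

A conservative variant would ignore cancellations entirely, which overestimates the queue ahead and so errs on the side of pulling the order.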



 Josh Freivogel, Quantitative Researcher and Trader

 Monday, September 28, 2015



In my opinion, it's time to partner up if you're hesitant or unable to get a seat (and pay for it with exchange-fee rebates at scale) and go 100% in-house DMA. There may be IP risks, but 0% of a 95% winner is still 0%, and it sounds like the consensus here is that the products and services you need do not exist off the shelf. If you arrive at a solution that works for you, I would be thrilled to hear about it. In any event, I wish you nothing but success.



 Alex Krishtop, trader, researcher, consultant in forex and futures

 Monday, September 28, 2015



Scott, I understand the order-book reconstruction process in general, even though I don't use it; based on the explanation William provided, I concluded that it was critical for him to get execution ahead of others. Possibly I am wrong.

What I still can't understand is the necessity of such a complex infrastructure, given that the model apparently works even without DOM data at all. This puzzles me, especially alongside William's remark that the strategy is not high frequency.

As to your last remark: of course you can do it, but I can't understand the reason for locating your server at the exchange and then increasing latency by running your orders through the broker's risk management. That was the point of my question.



 Jonathan Kinlay, Quantitative Research and Trading | Leading Expert in Quantitative Algorithmic Trading Strategies

 Monday, September 28, 2015



William,

We too have used SpryWare in the past and found their data to be of excellent quality. I was not aware that they had changed their methodology.

My next recommendation was going to be Activ, but you need to see if their data normalization procedures suit your model. You may need to recalibrate.

I agree with your comments about Bloomberg.



 William Schamp, President/Quantitative Analyst - Beacon Logic LLC

 Monday, September 28, 2015



@Scott - You are correct on both counts and these are a couple of the issues we are currently dealing with.



 Scott Boulette, Algorithmic Trading

 Monday, September 28, 2015



@Alex, all but the largest operations have to go through some sort of risk check these days. I do agree with you on much of what you said; I am a bit confused myself.

@Mark, I do the exact same thing; nothing will reveal hidden flaws like live trading.



 William Schamp, President/Quantitative Analyst - Beacon Logic LLC

 Monday, September 28, 2015



@Alex - Faster execution means better fills. Scott's explanation is spot on with what my algo is looking at.



 William Schamp, President/Quantitative Analyst - Beacon Logic LLC

 Monday, September 28, 2015



@Josh - Correct, there isn't an off-the-shelf solution. We now know what will work and just need to duplicate that process a couple of times with other providers for safety. My frustration in the beginning was having representatives from retail software companies continue to tell me their software would do what I now know absolutely doesn't exist in their off-the-shelf packages.

It was frustrating in the beginning to give a firm a detailed blueprint of exactly what we needed, have them tell us "no problem", pay them a tidy sum, and only hear six months later that they couldn't finish the project. This happened twice. The third and current firm is doing all of the work up front because they see the immense value in it. They also made it work flawlessly . . . until the data puked. We heard back from them today and they have a solid solution for us, but I'm still looking for another alternative.



 William Schamp, President/Quantitative Analyst - Beacon Logic LLC

 Monday, September 28, 2015



@Jonathan - Because we are using a fixed target we can enter passively.



 Andrey Gorshkov, Algorithmic Trader, C++ Developer

 Tuesday, September 29, 2015



William, are you sure you haven't run into overfitting? Did you try adding noise to the data and to the parameters?
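
Andrey's suggestion is a standard robustness check: perturb the inputs and see whether the reported accuracy survives. A minimal C# sketch of price-noise injection, assuming a generic backtest function passed in as a delegate; the delegate, the fake backtest, and the noise scale are all illustrative:

    using System;
    using System.Linq;

    class NoiseRobustness
    {
        // Re-run a backtest on several noise-perturbed copies of the price
        // series; a genuinely robust edge should not collapse.
        static double[] WinRatesUnderNoise(double[] prices,
                                           Func<double[], double> backtest,
                                           double noiseTicks,
                                           double tickSize,
                                           int runs,
                                           int seed = 42)
        {
            var rng = new Random(seed);
            return Enumerable.Range(0, runs)
                .Select(_ => backtest(prices
                    .Select(p => p + (rng.NextDouble() * 2 - 1) * noiseTicks * tickSize)
                    .ToArray()))
                .ToArray();
        }

        static void Main()
        {
            // Toy stand-in for a real backtest: maps a price series to a
            // fake "win rate" so the sketch runs end to end.
            double FakeBacktest(double[] px) => 0.95 - Math.Abs(px[0] - 100.0) * 0.01;

            var prices = Enumerable.Repeat(100.0, 1000).ToArray();
            var rates = WinRatesUnderNoise(prices, FakeBacktest, noiseTicks: 1, tickSize: 0.25, runs: 5);
            Console.WriteLine(string.Join(", ", rates.Select(r => r.ToString("P1"))));
        }
    }

If the win rate collapses under noise of a tick or less, the result is more likely an artifact of the particular data than a durable edge.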



 William Schamp, President/Quantitative Analyst - Beacon Logic LLC

 Tuesday, September 29, 2015



@Andrey - No, because we aren't optimizing the trade parameters in any way. There is no prediction in the algorithm, just reaction.



 Josh Freivogel, Quantitative Researcher and Trader

 Tuesday, September 29, 2015



Good to hear you're optimistic about having a solution in the near future. Are you looking for firms that will work with you on bypassing risk checks? I can suggest a few to look at based on prior relationships.



 William Schamp, President/Quantitative Analyst - Beacon Logic LLC

 Tuesday, September 29, 2015



@Josh - That might be useful in the future but not right now. I appreciate the offer. Send a connection request and I'll add you. It looks like we are on track to have a solution by week's end, plus at least one alternative data source.



 Jakub T., QA Specialist at CRIF

 Wednesday, September 30, 2015



1. Too early to be proud...

2. No technical background makes this whole discussion idle.

3. No details = no real mentoring = a waste of my time reading it.



 William Schamp, President/Quantitative Analyst - Beacon Logic LLC

 Wednesday, September 30, 2015



@Jakub - This discussion wasn't meant to mentor or to be informative. I was asking questions to get possible solutions. I got no solutions but still do not consider this discussion a waste of time. Being shown that new pathways exist where you thought none did is never a waste of time.



 Borut Skok, Market Analyst

 Wednesday, September 30, 2015



@William: "@Jonathan - Because we are using a fixed target we can enter passively."

Why do you not enter actively if the target is known, and how do you know what the target is if speed is not at the foundation of your system?



 William Schamp, President/Quantitative Analyst - Beacon Logic LLC

 Thursday, October 1, 2015



@Borut - Sorry, I misspoke. We enter actively. That way the target is not set until after our order is filled.



 Beau Wolinsky, President and CEO of KC Capital Management and the Kansas City Stock Exchange

 Friday, October 2, 2015



Bill, Thomson Reuters Elektron data is the gold standard but very expensive. Please look into it. I've been looking forward to the finished algorithm for a long time, and with TR Elektron data you'll get over this hump. I know it!



 William Schamp, President/Quantitative Analyst - Beacon Logic LLC

 Friday, October 2, 2015



@Beau - The MDP 3.0 data environment is the problem. It doesn't affect the average trader or trading house, but anyone needing precise granularity in their analysis is pretty much screwed. Unbundled data from the exchange now has 40% fewer ticks (transactions) than actual trading. Anyone wanting tick information is hating life. If you find a 3rd-party provider that unbundles the data, good luck finding someone inside their IT department who can tell you how the placeholders or delivery fields have changed. We talked to six different guys (including the CIO) at one firm and each version of their "story" was different. The CME and the 3rd-party providers did an A1 job of hosing up this transition, but I for one expected nothing less.

I'm waiting on a call back now from a specialist at TR about Elektron, but I'm not expecting anything different as a response . . . if I get a call back at all. Thanks for the heads-up though.

I was contacted by someone the other day who might be able to help us with a custom solution. I think, in the end, that might be the best option given our needs.



 Scott Boulette, Algorithmic Trading

 Friday, October 2, 2015



@William - MDP 3.0 still has the raw passive trade sizes, but you have to have access to the raw FIX messages or a provider that will split them out. The primary difference is that trades are summarized from the aggressor's perspective, not from the liquidity provider's perspective.

If you need trade volume, it is exactly the same, even trade volume by price in a multilevel price sweep. What you don't know (unless you get the detail mentioned above) is the number of individual passive orders the aggressor took out. You might look at the order count field, but since it isn't important in my trading I can't say definitively.

I am curious if there is a non-proprietary, short answer as to what knowing the individual passive trades gives you. With the preponderance of algos shredding their orders and the average trade size in the ES at something around 2.5 contracts, I have found little informational value in all that because it has become so obfuscated.



 William Schamp, President/Quantitative Analyst - Beacon Logic LLC

 Friday, October 2, 2015



What we've learned so far . . .

· The CME is publishing bundled/aggregated trade data only.

· However, all the data required to unbundle those trades is included in the CME bundled/aggregated data.

· If a data source or user needs unbundled data, they have to unbundle it themselves.

· Each data source has developed its own algorithm to unbundle the data. As such, there is no standard method of unbundling, so sets of unbundled data from different sources will never be identical.

· SpryWare is delivering only the bundled CME data at this time.

· SpryWare has offered to investigate what it would take to provide users with unbundled data directly in a SpryWare data feed.

· Our solutions provider could theoretically provide us with unbundled data derived from the bundled SpryWare feed as it is delivered today, but they would need to build their own unbundling algorithm. That would be a major undertaking for them, requiring a lot of resources and time.

· DTNIQ has acknowledged a slight issue with their unbundled data; they have implemented a fix and are presently testing it before release. In DTNIQ's opinion (and ours), once the change is released they will be providing the purest and best unbundled data available.

· DTNIQ is a supplier of the bundled CME data to CQG. DTNIQ does not know how CQG is unbundling their data, but in our humble opinion it is not accurate.

· It will be impossible to compare unbundled data from different sources directly, due to the different independent algorithms each employs. We are left to validate any unbundled data we receive in the future by some means other than direct comparison to other data sources (one such check is sketched below). What a mess!!!
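
One practical way to cross-check differently unbundled feeds, as the last bullet suggests, is to compare them at a level that unbundling cannot change, for example total traded volume per price per time bucket. A minimal C# sketch under that assumption; the Trade record and the one-second bucket are illustrative:

    using System;
    using System.Collections.Generic;
    using System.Linq;

    // Illustrative trade record; a real feed would carry more fields.
    record Trade(long TimestampNanos, decimal Price, int Size);

    class AggregateValidator
    {
        // Sum traded volume per (time bucket, price). Two feeds that unbundle
        // differently should still agree on these totals.
        static Dictionary<(long Bucket, decimal Price), long> VolumeByBucket(
            IEnumerable<Trade> feed, long bucketNanos)
        {
            return feed
                .GroupBy(t => (Bucket: t.TimestampNanos / bucketNanos, t.Price))
                .ToDictionary(g => g.Key, g => g.Sum(t => (long)t.Size));
        }

        static void Main()
        {
            long oneSecond = 1_000_000_000L;
            // One aggregated print vs. the same volume split into two fills.
            var bundled = new[] { new Trade(10, 100.25m, 5) };
            var unbundled = new[] { new Trade(10, 100.25m, 2), new Trade(12, 100.25m, 3) };

            var a = VolumeByBucket(bundled, oneSecond);
            var b = VolumeByBucket(unbundled, oneSecond);
            bool match = a.Count == b.Count &&
                         a.All(kv => b.TryGetValue(kv.Key, out var v) && v == kv.Value);
            Console.WriteLine($"Feeds agree on volume by price by second: {match}");
        }
    }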



 Scott Boulette, Algorithmic Trading

 Friday, October 2, 2015



@William, yes, I agree that DTNIQ has an issue; I have compared my DMA feed with DTNIQ and it isn't really the same (close, though), primarily because it appears they more or less reverted to the old way of reporting data, making it difficult or even impossible to gain anything from the new protocol.

However, the data is structured differently than what the providers are implying (if I understand what you wrote). I have the raw FIX, so this isn't speculation on my part, not to mention the CME publishes the message structure.

You get match events that may be one trade or potentially thousands. The event is a combination of the aggressor trade plus any stops (held at the CME) it triggers. You get one message for each price level traded, and within that message you get the individual trades just as before, with a flag that tells you this is the end of the trades portion. You then get the new book updates and, at some point, a flag indicating this is the end of the entire event. This is atomic and deterministic; nothing comes in between or out of order.

What is very complicated is the corner case of spread trades, or any sort of trade with individual legs, where there are enough sub-messages that the maximum message length is reached and the event therefore spans multiple messages. This is where the data providers are having fits. Since I have DMA and don't really care about the edge case, I can react to the end-of-trades flag, the end-of-book-updates flag (attached to one of the book updates), or a book update itself to know the event is over.

The issue for the data providers is the cases where the totality of the trade event spans more than one message. In their defense, this is a nightmare to deal with, because you never really know when the part you care about is over until everything is over, and if you are getting a subset of the total message from a provider, you as an individual may never see the end-of-event flag.

Even the major players in FIX message-handling software aren't supporting much of the unbundling; it is too much work to justify for the few firms that need it (those who need that performance/granularity have long since written their own FIX engines). So in the end, the data is still the data, as long as you are willing to accept it from the aggressor side and to sort out the end-of-event flag in the rare cases you don't get it from the provider.

And this concludes my novella on the subject :)
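
For readers trying to follow the event structure Scott describes, here is a minimal C# sketch of accumulating one match event until an end-of-event marker arrives. The message types and flag names are invented for illustration and are not the actual MDP 3.0 / FIX tags:

    using System;
    using System.Collections.Generic;

    // Illustrative stand-ins for decoded feed messages; real MDP 3.0 messages
    // carry exchange-defined tags, not these names.
    abstract record FeedMessage;
    record TradeMsg(decimal Price, int Size, bool LastTradeInEvent) : FeedMessage;
    record BookUpdateMsg(decimal Price, int NewSize, bool EndOfEvent) : FeedMessage;

    class MatchEventAccumulator
    {
        private readonly List<TradeMsg> trades = new();
        private readonly List<BookUpdateMsg> bookUpdates = new();

        // Feed messages in arrival order; returns the completed event (trades
        // plus book updates) when the end-of-event flag is seen, otherwise null.
        public (List<TradeMsg> Trades, List<BookUpdateMsg> Updates)? OnMessage(FeedMessage msg)
        {
            switch (msg)
            {
                case TradeMsg t:
                    trades.Add(t);
                    break;
                case BookUpdateMsg b:
                    bookUpdates.Add(b);
                    if (b.EndOfEvent)
                    {
                        var done = (new List<TradeMsg>(trades), new List<BookUpdateMsg>(bookUpdates));
                        trades.Clear();
                        bookUpdates.Clear();
                        return done;
                    }
                    break;
            }
            return null;
        }
    }

    class MatchEventDemo
    {
        static void Main()
        {
            var acc = new MatchEventAccumulator();
            var messages = new FeedMessage[]
            {
                new TradeMsg(100.25m, 3, LastTradeInEvent: false),
                new TradeMsg(100.25m, 2, LastTradeInEvent: true),
                new BookUpdateMsg(100.25m, 40, EndOfEvent: true),
            };

            foreach (var m in messages)
            {
                var completed = acc.OnMessage(m);
                if (completed.HasValue)
                {
                    var (tr, upd) = completed.Value;
                    Console.WriteLine($"Event complete: {tr.Count} trades, {upd.Count} book updates");
                }
            }
        }
    }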



 William Schamp, President/Quantitative Analyst - Beacon Logic LLC

 Monday, October 5, 2015



@Scott - It's just a mess that the CME thought aggregating/bundling the data made it easier and more cost-effective to distribute. All it did was complicate the process and negate any usefulness retail charting programs will ever have again for running high-performance, highly granular algorithms. Uh oh . . . I hear Russian footsteps in the hallway over that comment. My point is that without, as you said, access to a custom FIX engine, running a highly granular algorithm is worthless, or at best as inconsistent as the rest of the algorithms running out there. So the question now becomes . . . who knows where I can get my own FIX engine built at a reasonable cost?



 Scott Boulette, Algorithmic Trading

 Monday, October 5, 2015



@William, lol, I can answer that one for you - you are nowhere! You are talking about a league that few can afford.



 William Schamp, President/Quantitative Analyst - Beacon Logic LLC

 Tuesday, October 6, 2015



@Scott - No worries and thanks for your help on this. We are working on our end on a permanent solution.



 Brad L., Systems Engineer; electronic systems, radio and wireless communications, financial market analysis.

 Wednesday, October 7, 2015



It has been my experience that bundling processes do change occasionally, and not always with warning. There are many events, unplanned from the perspective of the data provider, that can affect data integrity.

I compare market data sources to airliner flight management. While the technology exists for airplanes to largely fly themselves, we still insist on pilots up front because events pop up for situations which the designers of the flight automation system never anticipated. It's the same with market data sources. The user must always be on guard for anomalies.



 William Schamp, President/Quantitative Analyst - Beacon Logic LLC

 Wednesday, October 7, 2015



@Brad - The MDP 3.0 data coming from the CME is aggregated, or bundled. The data will never be the same again, and everything you thought you knew about it is now wrong . . . PERIOD! Every 3rd-party provider that takes the CME data and tries to unbundle it will do it differently, because none of them work together. The only unbundled data that will match will be data shared between 3rd-party providers.

Yes, bundling processes change, but not at the level the CME has changed them. From this point on, you use the bundled data or you use nothing at all. You must find a way to make the new data work; we are. Not all data users require the granularity we do, so I can only imagine others will have less of an issue than we will. On the other hand, I consider the data granularity we require an asset, not a liability. I predict the next few months are going to be a fun ride in the markets.



 private private,

 Wednesday, October 7, 2015



Just a couple of questions for a curious mind, if I may.

Is a direct connection to the exchange an option for you, or is it too cost-prohibitive (as you say you are already co-located)? Wouldn't this provide the tick-level data you require, unfiltered, if 3rd-party providers are unable or unwilling to?



 William Schamp, President/Quantitative Analyst - Beacon Logic LLC

 Thursday, October 8, 2015



@Nick - I already have DMA data. The initial problem was that the granularity I utilize wasn't in a recognizable format. The CME changed the format and never told the 3rd-party providers what those changes were. Typical corporate lack of communication. The granularity I need is deeper than tick data as well. We are in the process of securing a second feed as a backup, and it too is a DMA feed. By the way, filtering isn't the problem; the aggregation of the data is.

We are working through it and are confident that when this all shakes out we will have a stronger algorithm because of it. The deeper we go, the more we see what everyone else is missing.

