
Archived Discussions

Recent member discussions

The Algorithmic Traders' Association prides itself on providing a forum for the publication and dissemination of its members' white papers, research, reflections, works in progress, and other contributions. Please note that archive searches and some of our members' publications are reserved for members only, so please log in or sign up to gain the most from our members' contributions.

Structured Databases vs Unstructured Databases in Trading


 Muhammad A., Independent Day Trader at Equity Day-Trader

 Wednesday, May 18, 2016

For the last five or so years, NoSQL (or, as some prefer to call them, SQL+) databases have become very popular. They can do tasks that traditional databases couldn't do efficiently, and many people are using them to analyze market sentiment. What I am trying to explore is this: it seems to me these new databases are needed when we are exploring data like emails and website comments, data that can include any type of information: images, sound bites, and maybe the entire history of a person's or entity's interactions with the entity we are researching. That's what the term "unstructured" is trying to describe: data that can be anything and everything. But when you look at market data, it is structured: a structured time series of price and volume. So does NoSQL database analysis have an advantage over traditional database analysis? If yes, how?



20 comments on article "Structured Databases vs Unstructured Databases in Trading"


 Jyoti Kumar, CEO & Founder at SAFE ANALYTICS PVT LTD(SafeTrade.in)

 Wednesday, May 18, 2016



Your thinking is in the right direction. Actually, your options are not restricted to NoSQL vs. SQL (RDBMS). There is one more variation on the RDBMS, and that is the in-memory database (Oracle TimesTen, Microsoft SQL CE, etc.). These databases have some limitations, but they are very fast compared to regular RDBMS databases. For your scenario, the associated limitations (64 GB maximum size, and the chance of inconsistency in case of a system failure) should not matter.

If you already have an existing application with a regular database, then before moving to something else you could try moving the database onto a solid-state drive (or even a RAM drive). This strategy will give you a significant jump in performance almost immediately and without code changes.



 Oscar Cartaya, Private Investor

 Saturday, May 21, 2016



Interesting stuff; I will have to look into NoSQL databases and in-memory databases. I think too much structural rigidity forces architectures onto the databases that may prove unwieldy and may not accommodate outlying or unexpected data.



 Graeme Smith, Investment Manager at Uncorrelated Alpha Management

 Sunday, May 22, 2016



is already available, but AWS with a terabyte of memory should be around within a year. In-memory databases from Oracle or Microsoft offer all the advantages of SQL over NoSQL (flexibility) and should be lightning fast (I haven't used an in-memory database myself yet).



 Graeme Smith, Investment Manager at Uncorrelated Alpha Management

 Sunday, May 22, 2016



For financial/text/sentiment applications, I think an SQL database on FusionIO or in-memory is easily fast enough to store the real-time data you describe. And the structured/unstructured distinction is a bit of a detour; most databases will store unstructured data. The main issue will be how to turn that data into useful information. The bottleneck will be processing "I had a really bad experience at the xx store today" or "xx to beat estimates today" into a sentiment indicator and a tradable signal.
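
To make that bottleneck concrete, here is a minimal sketch of the processing step, assuming a crude keyword lexicon (the word lists and scoring rule are invented for illustration and are nothing like a production sentiment model):

```python
# Hypothetical lexicon-based scorer: turn raw text into a crude sentiment number.
POSITIVE = {"beat", "great", "upgrade", "strong"}
NEGATIVE = {"bad", "miss", "downgrade", "weak"}

def sentiment_score(text: str) -> float:
    """Return a score in [-1, 1]: +1 if only positive words hit, -1 if only negative."""
    words = [w.strip(".,!?").lower() for w in text.split()]
    pos = sum(w in POSITIVE for w in words)
    neg = sum(w in NEGATIVE for w in words)
    total = pos + neg
    return 0.0 if total == 0 else (pos - neg) / total

print(sentiment_score("I had a really bad experience at the xx store today"))  # -1.0
print(sentiment_score("xx to beat estimates today"))                           # 1.0
```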



 Jyoti Kumar, CEO & Founder at SAFE ANALYTICS PVT LTD(SafeTrade.in)

 Sunday, May 22, 2016



It is really not a case of preferring one over the other. RDBMS and NoSQL have very clear and separate use cases. When you need consistency and transactional behavior, you have to go for an RDBMS, as in most banking operations (except offline reporting services, etc.). Most e-commerce websites use NoSQL for the catalog and even for the cart system, because it makes their systems fast (though not always accurate). However, once the final order starts, all information related to payment and orders is handled by an RDBMS. This is why you may face the situation where the catalog shows an item as available but the final ordering system rejects it.
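
A rough sketch of that split, assuming Redis stands in for the NoSQL catalog and SQLite for the transactional order store (the key and table names are made up for illustration):

```python
import sqlite3
import redis  # pip install redis; assumes a local Redis server

r = redis.Redis()
db = sqlite3.connect("orders.db")
db.execute("CREATE TABLE IF NOT EXISTS orders (item TEXT, qty INTEGER)")

def place_order(item: str, qty: int) -> bool:
    # Fast, possibly stale availability check from the NoSQL catalog.
    available = int(r.get(f"stock:{item}") or 0)
    if available < qty:
        return False
    # The order itself goes through an ACID transaction in the RDBMS,
    # which is where a stale catalog read can still get rejected.
    with db:
        db.execute("INSERT INTO orders (item, qty) VALUES (?, ?)", (item, qty))
    return True
```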



 Muhammad A., Independent Day Trader at Equity Day-Trader

 Sunday, May 22, 2016



With NoSQL you search a person's name, for example, and you can get their entire history: purchases, likes, comments, interactions, pictures, etc. This would require lots of table joins in SQL and wouldn't be fast.
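
As a toy illustration of the difference, assuming a document store on one side and a normalized relational schema on the other (all names below are made up):

```python
# Document-store style: one lookup returns the whole nested history,
# e.g. collection.find_one({"name": "Jane Doe"}) in a MongoDB-style API.
person_doc = {
    "name": "Jane Doe",
    "purchases": [{"item": "hammer", "price": 19.99}],
    "likes": ["power tools"],
    "comments": [{"text": "great store", "ts": 1463900000}],
}

# Relational style: the same question needs several joins across tables.
SQL_EQUIVALENT = """
SELECT p.name, pu.item, l.tag, c.text
FROM people p
LEFT JOIN purchases pu ON pu.person_id = p.id
LEFT JOIN likes     l  ON l.person_id  = p.id
LEFT JOIN comments  c  ON c.person_id  = p.id
WHERE p.name = 'Jane Doe';
"""
```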



 Muhammad A., Independent Day Trader at Equity Day-Trader

 Sunday, May 22, 2016



You guys have no idea how much time I spent trying to answer this question. I had lots of discussions with so many so-called experts, and some of those discussions got heated and very confused.



 Andrew Kovalev, Principal Quant Researcher at Motif Capital Management

 Sunday, May 22, 2016



Neither of them will be very useful until it is very clear what analysis you actually want to perform. Two occasionally competing goals of a data storage model are 1) the ability to store everything quickly and 2) the ability to retrieve anything in reasonable time. You rarely can have it both ways. You can probably use NoSQL (or flat files) to capture the incoming data stream, which would be un-indexed and barely usable for any analysis, or you can index all your data aggressively for every possible analysis, but you won't be able to build those indices on the fly. Let's take your analogy of customer orders further: good luck running an analysis that needs to know all purchases of <5 lb hammers in the week preceding Father's Day, or all purchases of power tools made between 4 and 5 pm by any guy named "John" anywhere in the western hemisphere in the past two decades. You need to anticipate the data access needs of your analysis.
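
As a small illustration of anticipating access patterns, the query below is only cheap because a matching index was planned and built ahead of time; the table and column names are hypothetical:

```python
import sqlite3

# Hypothetical purchases table; without the (category, ts) index the
# "week before Father's Day" query has to scan the whole table.
db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE purchases (buyer TEXT, category TEXT, ts INTEGER)")
db.execute("CREATE INDEX idx_cat_ts ON purchases (category, ts)")

rows = db.execute(
    "SELECT buyer, ts FROM purchases WHERE category = ? AND ts BETWEEN ? AND ?",
    ("power tools", 1465171200, 1465776000),  # an example week in June 2016
).fetchall()
print(rows)
```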



 Muhammad A., Independent Day Trader at Equity Day-Trader

 Sunday, May 22, 2016



Considering traditional market data, SQL-type analysis is more appropriate than NoSQL analysis. However, NoSQL analysis is being used in untraditional ways, such as analyzing pictures to assess crowd volume in shopping centers and parking lots, or Twitter sentiment analysis.



 Yaakov Borstein, Chief Data Scientist at Mediashakers

 Monday, May 23, 2016



Both have in-memory offerings, and SQL has very light versions, like the one used in many phone apps.



 Yaakov Borstein, Chief Data Scientist at Mediashakers

 Monday, May 23, 2016



My experience is that you keep the transaction records in SQL, but everything else lives in NoSQL. There is absolutely no loss of information or accuracy when using NoSQL vs. SQL, and there are transaction pipelines that can be used in NoSQL.
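
For example, Redis exposes MULTI/EXEC transactions; a minimal sketch using the redis-py client (the key names are illustrative and a local Redis server is assumed):

```python
import redis  # pip install redis; assumes a local Redis server

r = redis.Redis()

# Queue two writes and apply them atomically; neither is visible until EXEC.
pipe = r.pipeline(transaction=True)
pipe.hset("order:1001", mapping={"item": "hammer", "qty": "2"})
pipe.lpush("orders:pending", "order:1001")
pipe.execute()
```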



 Yaakov Borstein, Chief Data Scientist at Mediashakers

 Monday, May 23, 2016



I have built an ETF moni



 Yaakov Borstein, Chief Data Scientist at Mediashakers

 Monday, May 23, 2016



monitoring system in Redis that holds all major ETFs and all their components, plus tons of other info. Building a graph-like structure to pull out associations and find relationships is immensely easy once you get it right. And finally, if you build your hashes and lists correctly and play around with the setup, you can crush huge volumes of data into a very small space with no loss in performance.
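
A minimal sketch of that kind of layout, written in Python with redis-py rather than R (tickers, values, and key names are invented): one hash per ETF for metadata, one set per ETF for components, and a reverse set per stock so associations can be walked in both directions.

```python
import redis  # assumes a local Redis server

r = redis.Redis(decode_responses=True)

# One hash per ETF for metadata, one set per ETF for its components.
r.hset("etf:SPY", mapping={"name": "S&P 500 ETF", "aum": "500B"})
r.sadd("etf:SPY:components", "AAPL", "MSFT")
r.sadd("etf:QQQ:components", "AAPL", "NVDA")

# Reverse sets let you walk the graph from a stock back to its ETFs.
for ticker in ("AAPL", "MSFT", "NVDA"):
    for etf in ("SPY", "QQQ"):
        if r.sismember(f"etf:{etf}:components", ticker):
            r.sadd(f"stock:{ticker}:etfs", etf)

# Graph-style queries then become single set operations.
print(r.smembers("stock:AAPL:etfs"))                         # ETFs holding AAPL
print(r.sinter("etf:SPY:components", "etf:QQQ:components"))  # overlap between two ETFs
```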



 Yaakov Borstein, Chief Data Scientist at Mediashakers

 Monday, May 23, 2016



(Sorry, one last point.) High-frequency trading may very well require specifically tailored databases that are SQL-based, but my guess is that you could still implement the same, if not better, in NoSQL if you structure and deploy it correctly.



 Yaakov Borstein, Chief Data Scientist at Mediashakers

 Monday, May 23, 2016



The current system I'm writing for a startup fund has multiple queues so that multiple processes can access the data without hindrance. It's all in-memory. One process (in R for now) just reads the stream of quotes and depth and pushes it to Redis ordered sets with a unix timestamp (also good for playback). I do a small amount of O(1) "running / windowed" stats in the data collection process and push the stats to another queue/list. Another process looks at that queue and decides whether an order is needed; it does a lot of number crunching, so it is independent, and if a trigger is hit, it pushes an order to a third queue which is wired up to the execution API, also independent. You can use R Shiny to tap into all the queues and processes without any performance hit. It's completely scalable. The algo part (the one that looks at the second queue) is a complete black box with a simple interface, so the client doesn't need to share proprietary info if they choose not to (it would be nice, though!).
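
A stripped-down sketch of that queue layout, written in Python with redis-py rather than R (the symbols, key names, and trigger rule are invented; the system described above keeps each stage in its own process):

```python
import json
import time

import redis  # assumes a local Redis server

r = redis.Redis(decode_responses=True)

def on_quote(symbol: str, price: float) -> None:
    ts = time.time()
    # Stage 1: append the raw tick to an ordered set keyed by unix time (good for playback).
    r.zadd(f"ticks:{symbol}", {json.dumps({"p": price, "t": ts}): ts})
    # Stage 1b: push a small stat record onto the next queue for the algo process.
    r.lpush("stats", json.dumps({"sym": symbol, "p": price, "t": ts}))

def signal_worker() -> None:
    # Stage 2: an independent process pops stats and decides whether an order is needed.
    while True:
        _, raw = r.brpop("stats")
        stat = json.loads(raw)
        if stat["p"] < 100.0:  # placeholder trigger
            # Stage 3: hand the order to a third queue wired up to the execution API.
            r.lpush("orders", json.dumps({"sym": stat["sym"], "side": "BUY"}))
```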



 Yaakov Borstein, Chief Data Scientist at Mediashakers

 Monday, May 23, 2016



For a reference on O(1) running and windowed stats, see the blogs by John D. Cook and, in general, the literature on online streaming stats. It's a domain that is exploding with interest today. There is also a really nice library in Julia called OnlineStats.jl for this kind of work.
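
One classic example of such an O(1) update is Welford's running mean and variance, where each new observation costs a constant amount of work regardless of how much history has been seen; a minimal sketch in Python:

```python
class RunningStats:
    """Welford's online algorithm: constant-time mean/variance updates per observation."""

    def __init__(self) -> None:
        self.n = 0
        self.mean = 0.0
        self.m2 = 0.0  # running sum of squared deviations from the mean

    def update(self, x: float) -> None:
        self.n += 1
        delta = x - self.mean
        self.mean += delta / self.n
        self.m2 += delta * (x - self.mean)

    @property
    def variance(self) -> float:
        return self.m2 / (self.n - 1) if self.n > 1 else 0.0

stats = RunningStats()
for price in (101.2, 100.9, 101.5, 100.7):
    stats.update(price)
print(stats.mean, stats.variance)
```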



 Yaakov Borstein, Chief Data Scientist at Mediashakers

 Monday, May 23, 2016



Also, look at https://www.youtube.com/watch?v=Cjr__VcmzJY , where the author (of R's rredis and doRedis packages) demos spiffy, almost zero-cost parallel computing using R. Very nice example.



 Muhammad A., Independent Day Trader at Equity Day-Trader

 Monday, May 23, 2016



very informative, thanks!



 Ram Shankar, Financial Technologist / Architect

 Monday, May 23, 2016



In trading systems, frequency, latency, and performance are three major factors. So a NoSQL database would complement and be well suited to developing trading systems compared to a structured/relational database.



 Yaakov Borstein, Chief Data Scientist at Mediashakers

 Monday, May 23, 2016



@Ram... yes, frequency, latency, and performance are critical factors, but don't forget cost, scalability, maintenance, and ease of deployment. NoSQL solutions have become popular in recent years in many domains and are very much alive in finance/trading, because they address all of those factors and often stand out as winners for many use cases. Muhammad's use case as described in this post can easily be included in that set. In fact, it's probably the straightest, simplest, and lowest-cost path he can take in his effort to generate value for himself, which in the end is what all this is about.

