Storing ~700 Tb in SQL Db

dynamodbnosql

I am a newbie to Db's on a whole, and I came across a tutorial, wherein we needed to store ~700 Tb of data over a few years. It was mentioned that scaling SQL for such limits of data is not the best approach, and hence in the tutorial they went ahead with NoSQL. I have a few doubts regarding this:

  1. Is it possible to store such Tb's in a single machine, as per my understanding, horizontal scaling is not possible in SQL. Please correct this if I am wrong.
  2. What will be the latency comparison of SQL vs NoSQL. Because of horizontal scaling, will a noSQL db like DynamoDB give lower latency than a SQL Db.

Best Answer

Is it possible to store such Tb's in a single machine

Definitely. Though that machine becomes a large single-point-of-failure.

Building a petabyte scale storage system that is efficient and fault tolerant can itself be a challenge, doing it with cost efficiency too even more so. I assume that being a "newbie as DBs" also means you are not particularly experienced with storage solutions either, so you don't want to be trying to put together a custom build.

A not un-common arrangement is to have a couple of clustered machines running your chosen DBMS (SQL Server etc.) using an "off the shelf" SAN based storage system. I did a quick calc with the first search result for "petabyte SAN" and a ~700Tb arrangement came in at around £40,000 and that is just the storage unit and drives to populate it: add onto that racks to hold it all, decent spec machines for the clustered database servers, the enterprise licensing that is likely to be needed, ... This level of storage is never going to be cheap. You also need to factor in the ongoing costs: electricity to power it all and the required air conditioning too, replacing drives when they fail, and also the man-power to monitor and maintain it all!

You would almost certainly be better off looking at managed solutions, perhaps with one of the bigger cloud providers, and let them worry about much of the scaling, redundancy, reliability, power, etc. problems for you. For this scale you are unlikely to find an "off the shelf" price so you'll need to talk to them directly. Be warned: this amount of storage, particularly for rapid access, with good resiliency (in terms of both data security and service availability) is not going to be remotely cheap, there is no way it can be.

as per my understanding, horizontal scaling is not possible in SQL. Please correct this if I am wrong.

It is possible, though whether it is practical compared to other options depends on your data and access patterns.

What will be the latency comparison of SQL vs NoSQL.

Because of horizontal scaling, will a noSQL db like DynamoDB give lower latency than a SQL Db.

All of the above can only really be answered with "it depends" without a lot more information about the data your are needing to store, and at least a vague idea of expected retrieval patterns and other requirements.

Just knowing the size and expected growth of the data is not nearly enough for a more detailed discussion.