In general, for such a structured dataset I suspect you could write a custom data format that would be faster for most daily operations (i.e. small data pulls from an arbitrary time). The benefit of moving to a standard DB tool is likely in some of the extras: ad hoc queries, multiple access, replication, availability, etc. It's also easier to hire help to maintain a standards-based data store.
If I were asked to set up a database to store that data, I would do the following:
Proposed schema
(1) Core data is placed into numerous (1000's) of individual tables, each containing two columns:
- time: either a SQL DATETIME data type or a numeric type from some epoch (this is the primary key)
- value: typed as appropriate for your data. I would default to single precision float; however, a fixed-point data type may be more appropriate for financial transactions. This column is probably unindexed.
These tables will get quite large, and you may want to manually partition them by (for example) year. But you'll have to check system performance and tune as appropriate.
These tables need unique names, and there are a couple of options. They could be human readable (e.g. nyse_goog_dailyhighs_2010) or (my preference) random. Either way a set of metadata tables is required, and random table names prevent developers from reading meaning into a name that was never meant to carry any.
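As a hedged sketch of one such data table, in generic SQL - the name t_a9f3c2 is a placeholder for a randomly generated table name:

```sql
-- One data table per time series: a timestamp primary key plus a single value.
-- The name t_a9f3c2 stands in for a randomly generated table name.
CREATE TABLE t_a9f3c2 (
    ts    TIMESTAMP NOT NULL,   -- or a BIGINT counting from some epoch
    value REAL      NOT NULL,   -- single precision; NUMERIC for fixed-point
    PRIMARY KEY (ts)
);
```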
(2) Meta data is stored in separate tables, as required by the application:
An additional table or set of tables is required to keep track of the metadata. These tables will contain data about exchange, instrument, value, frequency, date ranges, provenance (where did the data come from), plus anything else you need. These are mapped to data table names.
If there is enough data, this lookup could actually provide a table name and database name, allowing a sort of self-implemented data sharding (if that is the correct use of the term). But I would hold that in reserve.
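A sketch of the kind of catalog table I have in mind (the name series_catalog and its columns are illustrative, not prescriptive):

```sql
-- Maps each logical time series to the physical table that holds it.
CREATE TABLE series_catalog (
    series_id   SERIAL PRIMARY KEY,
    exchange    TEXT NOT NULL,        -- e.g. 'NYSE'
    instrument  TEXT NOT NULL,        -- e.g. 'GOOG'
    measure     TEXT NOT NULL,        -- e.g. 'daily_high'
    frequency   TEXT NOT NULL,        -- e.g. '1d'
    first_ts    TIMESTAMP,            -- date range covered
    last_ts     TIMESTAMP,
    provenance  TEXT,                 -- where the data came from
    table_name  TEXT NOT NULL UNIQUE  -- physical table, e.g. 't_a9f3c2'
    -- a database/shard name column could be added here, held in reserve
);
```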
Then at the application layer I would query the metadata tables to determine where my data was located, and then perform relatively simple queries on the big data tables to get my data.
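That two-step pattern, sketched with the hypothetical names above (table names cannot be bound as query parameters, hence the string substitution I grumble about under Disadvantages):

```sql
-- Step 1: resolve the logical series to a physical table name.
SELECT table_name
FROM   series_catalog
WHERE  exchange = 'NYSE'
  AND  instrument = 'GOOG'
  AND  measure = 'daily_high';

-- Step 2: the application splices that name into a simple range query.
SELECT ts, value
FROM   t_a9f3c2                      -- substituted at runtime
WHERE  ts >= '2010-01-01' AND ts < '2011-01-01'
ORDER  BY ts;
```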
Advantages:
My (relatively limited) experience is that databases can generally handle a large number of small tables more easily than a smaller number of large tables. This approach also enables easier maintenance (e.g. purging old data, rebuilding a corrupt table, creating/reloading from backups, adding a new entity). It completely decouples the different kinds of data, which helps if (for example) you have data at different rates, or requiring different data types.
This skinny table concept should also allow fast disk access for what I suspect is the most common query, a contiguous range of data from a single entity. Most data applications are disk I/O limited, so this is worth considering. As a commenter has already implied, this may be an ideal application for a column-oriented database, but I have yet to find a column-oriented product that is mainstream enough for me to bet my career on. This schema gets pretty close.
Disadvantages:
About half of your disk space is dedicated to storing time stamps, when quite frankly 100's or 1000's of the tables will have the exact same data in the timestamp column. (In fact this is a requirement if you want to perform easy table joins).
Storing table names and performing the dynamic lookup requires a lot of application complexity and string operations, which kind of makes me cringe. But it still seems better than the alternatives (discussed below).
Considerations:
Be careful of rounding in your time field. You want your values round enough to enable joins (if appropriate), but precise enough to be unambiguous.
Be careful of time zones and daylight saving time. These are hard to test. I would enforce a UTC requirement on the data store (which may make me unpopular) and handle conversions in the application.
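To make both considerations concrete, a hedged sketch (date_trunc is PostgreSQL; other engines have equivalents, and the table names are the hypothetical ones from above):

```sql
-- Join two series on a shared one-minute grid; sub-minute jitter in either
-- feed would otherwise silently drop rows from the join. All ts values are
-- stored as UTC; local-time conversion happens in the application layer.
SELECT date_trunc('minute', a.ts) AS ts,
       a.value AS series_a,
       b.value AS series_b
FROM   t_a9f3c2 a
JOIN   t_b7e001 b
  ON   date_trunc('minute', a.ts) = date_trunc('minute', b.ts);
```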
Variations:
Some variations that I have considered are:
Data folding: If the timeseries is equally spaced, then use one timestamp column and (for example) 10 data columns. The timestamp now refers to the time of the first data column, and the other data columns are assumed equally spaced between that timestamp and the next one. This saves a lot of storage that was previously used to store timestamps, at a cost of significant query and/or application complexity. Contiguous range, single entity queries now require less disk access.
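A sketch of a folded table (ten values per row is arbitrary):

```sql
-- Data folding: one timestamp anchors ten equally spaced values.
-- v0 sits at ts, v1 at ts + interval, ..., v9 at ts + 9 * interval.
CREATE TABLE t_folded (
    ts  TIMESTAMP NOT NULL PRIMARY KEY,
    v0 REAL, v1 REAL, v2 REAL, v3 REAL, v4 REAL,
    v5 REAL, v6 REAL, v7 REAL, v8 REAL, v9 REAL
);
```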
Multi-plexing: If multiple time series are known to share the same timestamps, then use one timestamp and (for example) 10 data columns as described above. But now each column represents a different time series. This requires an update to the metadata table, whose lookup now yields a table name and a column name. Storage space is reduced. Queries remain simple. However contiguous range, single entity queries now require significantly more disk access.
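The physical layout is the same as the folded table above; only the catalog changes. A sketch of that extra mapping:

```sql
-- Ten series share one timestamp column, so the catalog must now record
-- which column of which table holds each series.
CREATE TABLE mux_catalog (
    series_id   INTEGER PRIMARY KEY,
    table_name  TEXT NOT NULL,   -- e.g. 't_mux_01'
    column_name TEXT NOT NULL    -- e.g. 'v3'
);
```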
Mega-table: Take the "multi-plexing" concept to the extreme, and put all data into a single table, one time series per column. This requires large amounts of disk access for contiguous range, single entity queries, and is a maintenance nightmare. For example, adding a new entity now requires an ALTER TABLE command on a many-TB table.
For additional discussion on this format, see the various answers in:
Too many columns in MySQL
Fully normalized table:
Instead of using many 2-column tables, you could use one three-column table, where the columns are time, dataid, and value. Now your metadata tables only need to look up ID values, rather than table names or column names, which enables pushing more logic into the SQL queries, rather than the application layer.
Approximately 2/3 of storage is now consumed by the normalizing columns, so this will use a lot of disk space.
You can use a primary key order of (dataid, timestamp) for fast contiguous, single entity queries. Or, you can use a primary key order of (timestamp, dataid) for faster inserts.
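A sketch of that normalized table, showing both key orderings (pick one):

```sql
-- Fully normalized layout: every observation is (time, series, value).
CREATE TABLE observations (
    ts     TIMESTAMP NOT NULL,
    dataid INTEGER   NOT NULL REFERENCES series_catalog (series_id),
    value  REAL      NOT NULL,
    PRIMARY KEY (dataid, ts)     -- fast contiguous single-series reads
 -- PRIMARY KEY (ts, dataid)     -- alternative: faster append-order inserts
);
```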
However, even after considering these variations, my plan for my next development is lots of tables, two columns each. That, or the method soon to be posted by someone wiser than I :).
There are a couple of concepts which need to be distinguished. One is about structure and the other about schema.
Structured data is one where the application knows in advance the meaning of each byte it receives. A good example is measurements from a sensor. In contrast, a Twitter stream is unstructured. Schema is about how much of the structure is communicated to the DBMS and how it is asked to enforce it. It controls how much the DBMS parses the data it stores. A schema-required DBMS such as SQL Server can store unparsed data (varbinary), optionally parsed data (xml), or fully parsed data (columns).
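For example, in (roughly) SQL Server terms, the same sensor payload could be stored at any of the three parsing levels. A sketch, with made-up table names:

```sql
-- Unparsed: the DBMS stores bytes and knows nothing about them.
CREATE TABLE readings_raw (payload VARBINARY(MAX));

-- Optionally parsed: the DBMS can validate and query inside the XML.
CREATE TABLE readings_xml (payload XML);

-- Fully parsed: every field is a typed, enforced column.
CREATE TABLE readings_cols (
    sensor_id INT,
    read_at   DATETIME2,
    celsius   REAL
);
```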
NoSQL DBMSs lie on a spectrum from no parsing (key-value stores) upwards. Cassandra offers relatively rich functionality in this respect. Where they differ markedly from relational stores is in the uniformity of the data. Once a table is defined, only data which matches that definition may be held there. In Cassandra, however, even if columns and families are defined there is no requirement for any two rows in the same table to look anything like each other. It falls to the application designer to decide how much goes in a single row (also referred to as a document) and what is held separately, linked by pointers. In effect: how much denormalisation do you want?
The advantage is you can retrieve a full set of data with a single sequential read. This is fast. One downside is that you, the application programmer, are now solely responsible for all data integrity and backward compatibility concerns, forever, for every bit of code that ever touches this data store. That can be difficult to get right. Also, you are locked into one point of view on the data. If you key your rows by order number, how do you report on the sales of one particular product, or region, or customer?
Best Answer
TL;DR
The main distinction between database systems is not the language used to query the database, but rather the consistency model of the system. The Venn diagram should be two intersecting sets - SQL is not a proper subset of NoSQL, but rather its own data access language which may or may not be complemented by other techniques. SQL would be a proper subset of database query languages.
There is NoSQL, OldSQL and NewSQL.
NoSQL used to mean "Not SQL" but now is (more) likely to mean "Not only SQL" - as many of the providers of NoSQL systems are (trying to) bolt-on/graft SQL interfaces over their Key-Value (KV), Document or Graph stores.
OldSQL systems are essentially the main RDBMS providers' offerings (Oracle, SQL Server, Sybase, PostgreSQL, MySQL...). On page 13 of this presentation (.pdf), Michael Stonebraker purports to show that OldSQL only spends 4% of the time doing "useful" work - also to be found here:
His contention is that one should split OLTP work and OLAP work between different systems - OLTP should be done by shared-nothing sharded architectures such as, surprise, surprise, his own system, VoltDB, and OLAP should be done by columnar store (also, see here) type architectures (such as Vertica, in which he also had a role). It is worth noting that Stonebraker, as well as being very successful in the commercial arena, is also huge in academia and has won a Turing award - Computer Science's "Nobel prize"!
NewSQL is (IMHO) the most interesting of the systems to spring (figuratively speaking) like the Phoenix reborn from the ashes of OldSQL. Their USP is that they are HTAP systems (Hybrid Transactional and Analytical Processing).
These are distributed systems which can support both OLAP and OLTP queries simultaneously due to the data being spread over several nodes - which can be in the same rack, data centre or continent, and/or be globally distributed between and amongst cloud providers to increase resilience and redundancy - expect to add ~ a '0' to the cost for every fractional '9' you add to your uptime provision.
They use a consensus algorithm (usually Raft or Paxos) to coordinate nodes' data, and sharding is transparent - even to the systems' DBAs. The three exemplars of such systems would be CockroachDB, TiDB and YugaByte.
It is interesting to note that, while having a certain level of success, these systems are not (yet?) household names. The "big boys" are fighting back with columnar store and KV offerings bolted/grafted on to their own systems. Of particular interest in this debate is that these systems themselves sit "on top" of a KV store (usually RocksDB - although CockroachDB is developing their own one in Go called Pebble). PostgreSQL also has documents (JSONB) and a KV store.
To answer the question, the real distinguishing feature between systems is not the interface one uses to query the data (which can and does range from direct C language programming (imperative) to SQL (declarative) - and flavours/adaptations thereof), but rather the transaction consistency model.
These models are ACID (consistent) vs. BASE (available), and the key is where these systems fit with respect to the CAP theorem. Also, KV stores can support some or all ACID transaction characteristics.
OldSQL and NewSQL value consistency above all else ('CP' under the CAP system) - their argument is that, for example, a banking system with inconsistent results is a recipe for disaster (true!).
However, the NoSQL aficionados would suggest that not all systems require cast-iron consistency. For example, a book ordered from Amazon (say) through a BASE database ('AP' in CAP terms) might be a few days late because an incorrect stock level was shown, or even cancelled - but the upside is faster queries and easier operation (no consensus to maintain).
This is the crux of the distinction between database systems - 'CP' or 'AP'! CP will always either give you the correct answer (majority of nodes still available) or no answer, whereas AP systems will normally respond (even if only one node is up) but your answer may not have taken other nodes' data updates into account when responding (i.e. network links between them down...).
I hope this is a satisfactory "first-pass" at an answer - it's a large topic and, for db nerds, absolutely fascinating. I would urge you to read around it (Wiki is a good start, but no substitute for primary sources). In particular, I would advise you to look in detail at the underlying architectures of CockroachDB and TiDB and see how they shard and move data from node to node while maintaining consistency.
There are comprehensive lists of various NoSQL systems here and here (best IMHO). There's also a popularity ratings site here (but avoid the comparisons - they just repeat the blurb from the systems' websites - often out-of-date).
Final word: there are "mixed-model" (or multi-model) databases (ArangoDB and OrientDB - both have F/LOSS and Enterprise versions - OrientDB now part of SAP), but there's a reason why they're not household names. ArangoDB doesn't support ANSI SQL, but rather its own flavour, AQL, and OrientDB does not support ISO SQL either.
The best multi-model db that I've worked with is PostgreSQL which has KV and document stores as mentioned above. You can use SQL to query these and join with "ordinary" tables. Work is ongoing to add OpenCypher - an open graphical query language standard - to it (see here and here).
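A minimal sketch of what that looks like in practice, using JSONB (the tables and keys here are made up for illustration):

```sql
-- A JSONB "document" column queried with ordinary SQL and joined to a
-- plain relational table. ->> extracts a field as text; @> is the
-- containment test, which a GIN index on body can serve.
CREATE TABLE events (
    id      SERIAL PRIMARY KEY,
    user_id INT,
    body    JSONB
);

SELECT u.name,
       e.body ->> 'action' AS action
FROM   events e
JOIN   users  u ON u.id = e.user_id
WHERE  e.body @> '{"source": "mobile"}';
```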
Final, final word: PostgreSQL also has a Columnar Store extension and a Time Series extension - both of which are very important (sub)-classes of database system. Both extensions are just that, extensions and not forks - you can use "normal" SQL (i.e. the full range of PostgreSQL's very standards compatible SQL) with these extensions.
So, we can see that while NoSQL systems are, in some cases but not all (many system designers are happy to remain simple KV stores, for example), adding SQL capabilities, SQL is fighting back, adopting and adapting to new and/or more sophisticated data storage and retrieval requirements - some of which I haven't even touched on (GIS systems...). So to say that NoSQL fully encompasses SQL would be an incomplete picture...
As pointed out in comments to the question itself, the SQL standard now tackles non-relational paradigms and this will only increase into the future! Also addressed by other answers - worth reading (1 & 2)!