Mnesia: advantages and differences

feature-comparison

What are the advantages of Mnesia over the major SQL database implementations, and how does it differ from them?

Can I use the database to hold really huge amounts of data without noticeable performance degradation?

Best Answer

Sorry for being late to the party. :) Here's my answer, based on having used Mnesia since 1996 and various other database technologies since 1988.

Mnesia and MySQL are indeed different beasts, and which one is best depends very much on how you intend to use it.

If your application is written in Erlang, Mnesia allows you to store the data in the same memory space as your application, which means you can fetch a single data object in as little as a few microseconds. This is not possible in MySQL, since your application and the database are separated in memory. The reason Mnesia can do this and still be robust is that Erlang implements memory 'protection' at the language level.
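To make the "same memory space" point concrete, here is a minimal sketch of a read from an in-memory table. The module name, the subscriber record and its fields are made up for illustration; the Mnesia calls themselves (create_table/2, dirty_write/1, dirty_read/2) are the standard ones. It assumes Mnesia is not already running on the node and that no schema exists on disk, so everything lives in RAM.

```erlang
-module(mnesia_read_sketch).
-export([demo/0]).

%% Hypothetical record, used only for this illustration.
-record(subscriber, {id, name}).

demo() ->
    %% With no schema on disk, Mnesia starts with a RAM-only schema.
    ok = mnesia:start(),
    %% ram_copies keeps the table inside the Erlang VM's own memory.
    {atomic, ok} = mnesia:create_table(subscriber,
        [{ram_copies, [node()]},
         {attributes, record_info(fields, subscriber)}]),
    ok = mnesia:dirty_write(#subscriber{id = 1, name = <<"alice">>}),
    %% A dirty read is a direct lookup with no transaction overhead.
    %% timer:tc/1 reports the elapsed time in microseconds.
    {Micros, [#subscriber{name = Name}]} =
        timer:tc(fun() -> mnesia:dirty_read(subscriber, 1) end),
    stopped = mnesia:stop(),
    {Micros, Name}.
```

Calling mnesia_read_sketch:demo() returns the measured time together with the stored name. The actual figure depends on your hardware and on whether you use dirty operations or transactions, so treat it as a rough indication only.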

Overall, SQL databases tend to favor throughput over latency, and when it comes to latency, Mnesia+Erlang are generally outstanding. You need to decide which one is most important to you. As it says in the docs (above), Mnesia's target applications were telecom switching applications, where response time requirements for e.g. a call setup were around 20 ms. This essentially meant that you could read from the database only if the data was in shared memory, and that you had to avoid writing to persistent storage on a per-call-setup basis. On the other hand, these applications have practically no need for ad-hoc query support, and do not use very large data sets. Some work has been done to extend the suitability of Mnesia for other domains, but it is not a priority for the Erlang/OTP development team. Mnesia is what it is, and is likely to stay that way.

In the link above where Mnesia and MySQL are compared for speed, one needs to remember that the comparison is made in ejabberd, which runs against a single server if it's MySQL and runs a fully replicated database if it's Mnesia - and large ejabberd clusters can have 10 or more Erlang nodes (and thus 10 or more Mnesia replicas). From a redundancy standpoint, this is fairly ridiculous and costly, and Mnesia by no means forces you to do so. It obviously gives wicked-fast reads on each node, but writes will be very expensive. Several comparisons I've read have ended up comparing distributed Mnesia with a single-node MySQL; if redundancy is not needed for MySQL, it shouldn't be required for Mnesia either. Mnesia is quite flexible in letting you choose replication patterns, and data location is transparent to the application.
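As a rough sketch of what choosing a replication pattern looks like in practice: the node list is simply an argument when the table is created, and replicas can be added later. The module name and the session record are hypothetical; create_table/2 and add_table_copy/3 are standard Mnesia calls. The sketch assumes Mnesia is already running, and that any node passed to add_replica/1 is already part of the same Mnesia cluster.

```erlang
-module(mnesia_replica_sketch).
-export([create/1, add_replica/1]).

%% Hypothetical record, used only for this illustration.
-record(session, {id, user}).

%% Create the table with one in-memory replica on each node in Nodes.
%% Passing [node()] gives a single, unreplicated copy.
create(Nodes) ->
    mnesia:create_table(session,
        [{ram_copies, Nodes},
         {attributes, record_info(fields, session)}]).

%% Add a replica on another node later on. Code that reads and writes
%% the table does not need to change.
add_replica(Node) ->
    mnesia:add_table_copy(session, Node, ram_copies).
```

Calling create([node()]) gives you the single-copy setup that is the fair comparison against a single MySQL server. A fully replicated setup like the ejabberd one corresponds to passing every node in the cluster.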

Mnesia is also not limited to 2 GB per table (although a particular storage option, disc_only_copies, is). The largest Mnesia database I know of has about 600 GB of data in (64-bit) RAM+disk - although I do not recommend this. Anything up to 10-20 GB should be perfectly fine with modern hardware, but skip disc_only_copies entirely and use disc_copies - buy more RAM if you have to. I'd think twice before using the sharding support (mnesia_frag) - it works, but is rarely worth the trouble.
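For reference, the storage type is just an option at table creation. Below is a minimal sketch of setting up a disc_copies table, which is held in RAM and also logged to disk. The module name and the account record are invented for the example. It assumes Mnesia is not yet running on the node, and note that create_schema/1 writes files to the Mnesia directory (by default under the current working directory).

```erlang
-module(mnesia_disc_sketch).
-export([setup/0]).

%% Hypothetical record, used only for this illustration.
-record(account, {id, balance}).

setup() ->
    %% disc_copies needs an on-disk schema, created once before
    %% Mnesia is started. Tolerate a schema left by an earlier run.
    case mnesia:create_schema([node()]) of
        ok -> ok;
        {error, {_, {already_exists, _}}} -> ok
    end,
    ok = mnesia:start(),
    %% disc_copies: full copy in RAM, with changes persisted to disk.
    %% Returns {aborted, {already_exists, account}} on repeated runs.
    mnesia:create_table(account,
        [{disc_copies, [node()]},
         {attributes, record_info(fields, account)}]).
```

Since the whole table lives in RAM with disc_copies, the practical size limit is the memory of the machine, which is why buying more RAM is the usual answer.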

Perhaps the biggest difference between Mnesia and MySQL is SQL itself: Mnesia doesn't really have comparable functionality. QLC offers some support for ad-hoc queries, but it is not in the same league as SQL, and neither is its level of query optimization. In tooling and provisioning, MySQL is also superior, and if you need analytics, there's no question which one you should choose (i.e. NOT Mnesia).
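To give a feel for what QLC queries look like, here is a small sketch that is roughly the equivalent of SELECT name FROM employee WHERE salary > 4000. The module name, the employee record and the sample data are all made up; qlc:q/1, qlc:e/1 and mnesia:table/1 are the standard calls. It assumes Mnesia is not already running and uses a RAM-only table.

```erlang
-module(mnesia_qlc_sketch).
-export([demo/0]).

%% Required for the qlc:q/1 parse transform.
-include_lib("stdlib/include/qlc.hrl").

%% Hypothetical record, used only for this illustration.
-record(employee, {id, name, salary}).

demo() ->
    ok = mnesia:start(),
    {atomic, ok} = mnesia:create_table(employee,
        [{ram_copies, [node()]},
         {attributes, record_info(fields, employee)}]),
    %% Insert two sample rows inside a transaction.
    {atomic, ok} = mnesia:transaction(fun() ->
        ok = mnesia:write(#employee{id = 1, name = "ann", salary = 5000}),
        ok = mnesia:write(#employee{id = 2, name = "bob", salary = 3000})
    end),
    %% The query is a list comprehension over the table; it must be
    %% evaluated inside a Mnesia transaction (or other activity).
    {atomic, Names} = mnesia:transaction(fun() ->
        Query = qlc:q([E#employee.name || E <- mnesia:table(employee),
                                          E#employee.salary > 4000]),
        qlc:e(Query)
    end),
    stopped = mnesia:stop(),
    Names.
```

This is fine for simple filters like the one above, but there is nothing here resembling a SQL query planner, so anything involving large joins or aggregation is up to you to structure by hand.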

The best way to view Mnesia is as an extension to the Erlang language. It puts data right at your fingertips, and is excellent for small data sets where the data structure and access patterns are well known. For this purpose, using MySQL is about as uncomfortable as using Mnesia is for the things where MySQL works best.

Most applications fall somewhere in between, and this is where it becomes a judgement call. You may well end up using both...