Mysql – Database Design Help: MySql/PostgreSQL Single Database/Multiple Databases

database-designmulti-tenantMySQLmysql-5.5postgresql

We are planning to build a web based custom business analytics tool using PHP and MySQL/PostgreSQL.
In our application, we are planning to upload customer’s data using a Excel sheet and these data will be displayed on custom build interactive dashboard.

I have few questions before I start working on this application

1) For the mass data upload from excel, which db is suitable MySQL or PostgreSQL?

2) At some point is it possible that multiple customers can start data upload at same time ?. To deal with such scenario is it advisable to create a separate db for each customer or to use a single db for application(single tenancy vs multi-tenancy). Please tell me pros and cons if any for each scenario?

3) How it would affect my dashboard performance, if I go with multiple databases design and I join tables across multiple databases?

Thanks

Best Answer

For a data warehousing/analytics type application (DW/OLAP), I would go with PostgreSQL. It has set operators, windowing functions (also known as analytic functions) and common table expressions. You will most likely have to implement some or all of these in your own code with MySQL with the attendant possibility of bugs. MySQL is more suited to read-heavy OLTP type applications, whereas PostgreSQL is better (IMHO) for OLAP work.

With respect to "multi-tenancy" - why not use the same database for all and simply identify different customers by a customer_id - or (depending on the number of customers) have separate tables for each customer and non-customer specific data in other tables? How many customers do you have and how much data do you expect to be storing/analysing in the next 10 years? Modern RDBMSs can store large amounts of data and retrieve quickly (with suitable disk configuration), but I would favour PostgreSQL for large databases.

Sharding might be interesting if you only occasionally query across customers and most of your analysis is done per customer - you could set up a table (or even schema) per customer (on different disks). Again, knowledge of data volumes would help here.

Furthermore, with respect to multi-tenancy, see my question to jynus about querying across servers - or even different schemas within the same server.

I take jynus' point about "holy wars" - this post is just my own opinion about your particular question - were I to be asked about a different scenario, I might well recommend MySQL.

[EDIT]

What, exactly, is "my dashboard performance"?