SQL Server – Database/storage engine suggestion for a project involving heavy inserts

database-design, database-recommendation, MySQL, SQL Server

I am looking into a project that involves a large number of inserts daily. I will have a list of users (for example, a set of 500k users) for whom I need to monitor certain daily activities.

For example, let there be a set of 100 users, say U1, U2, …, U100.

I need to insert their daily scores into my database.

Consider that the cumulative score for user U1 over the period June 30 – July 6 is as follows:

June 30 - 99
July 1 - 100
July 2 - 102
July 3 - 102
July 4 - 105
July 5 - 105
July 6 - 107

The database should keep the daily points of each user, like:

For user U1,

July 1- 1pt (100-99)
July 2- 2pt (102-100) 
July 3- 0pt (102-102) 
July 4- 3pt (105-102) 
July 5- 0pt (105-105) 
July 6- 2pt (107-105) 

Similarly, the database should hold the daily details for the full set of users.
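
For concreteness, here is one way those daily points could be derived from the cumulative scores, assuming a total_score table with one row per user per day and MySQL 8.0+ window functions (all names here are illustrative, not a fixed design):

    -- Daily points = difference between consecutive cumulative
    -- scores per user (LAG() requires MySQL 8.0+).
    SELECT user_id,
           score_date,
           total_score - LAG(total_score) OVER (
               PARTITION BY user_id ORDER BY score_date
           ) AS points
    FROM total_score;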

At a later phase, I plan to generate aggregate reports from this data, such as total points scored per day, week, and month, and to compare them with older data.
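
As a sketch, such an aggregate could be a simple GROUP BY over the daily table (table and column names are illustrative):

    -- Total points scored per week across all users
    -- (YEARWEEK() groups dates into year+week buckets).
    SELECT YEARWEEK(score_date) AS year_week,
           SUM(points) AS total_points
    FROM daily_score
    GROUP BY YEARWEEK(score_date)
    ORDER BY year_week;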

I need to start from scratch. I am experienced with PHP as a server-side scripting language and with MySQL, but I am unsure about the database side.
Since I need to process about a million insertions daily, what should be taken care of?

Does MySQL fit my requirements, and if so, which storage engine should be used? Initially I envisioned creating a user table with a foreign-key user id and monthly score tables with dates as fields. Later I was given the suggestion to write the data to a CSV/Excel file first and then load it into the table after a set period.
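
For illustration, a minimal sketch of one possible MySQL schema, using one narrow row per user per day rather than a column per day (all names are assumptions, not a fixed design):

    -- Users, plus one row per user per day (InnoDB for
    -- transactions and row-level locking under concurrent inserts).
    CREATE TABLE users (
        user_id INT UNSIGNED NOT NULL AUTO_INCREMENT PRIMARY KEY,
        name    VARCHAR(100) NOT NULL
    ) ENGINE=InnoDB;

    CREATE TABLE daily_score (
        user_id    INT UNSIGNED NOT NULL,
        score_date DATE NOT NULL,
        points     INT NOT NULL,
        PRIMARY KEY (user_id, score_date),
        FOREIGN KEY (user_id) REFERENCES users (user_id)
    ) ENGINE=InnoDB;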

Does file-based insertion make things more favorable in this regard?
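
If the bulk-load route is taken, MySQL's LOAD DATA INFILE can import such a CSV in one statement (the file path and column list below are illustrative; the server's secure_file_priv setting restricts where files may be read from):

    -- Bulk-load one day's scores from a CSV file into daily_score.
    LOAD DATA INFILE '/var/lib/mysql-files/scores.csv'
    INTO TABLE daily_score
    FIELDS TERMINATED BY ','
    LINES TERMINATED BY '\n'
    (user_id, score_date, points);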

Or should I try another database, or a NoSQL approach?

EDIT

To summarize my requirements: I need a database of a million users whose points are to be recorded every day as separate entries.
This will be done regularly, so that there is an entry per day for each user showing that day's points, which can be aggregated on a weekly/monthly/yearly basis.
I am unsure about the database design as well as the issues that could come up after deployment.
A million or more DB operations will be performed every day; what server and infrastructure considerations apply in this case?

Any help will be highly appreciated. Thanks in advance.

Best Answer

Let's break this question up into a few parts.

Q: I need to insert 1 million rows a day. Is that a lot?

Not really. 1,000,000 inserts divided by 24 hours, divided by 60 minutes, divided by 60 seconds gives you about 12 inserts per second. For a rough frame of reference, it's not unusual to see 1,000 inserts per second on typical commodity servers with no tuning.

Granted, your load won't be perfectly averaged out like that - you'll have bursts of load - but I wouldn't make database platform decisions based on less than 10k-20k inserts per second. Any platform out there will work fairly well.

Q: How should I structure the data?

Zoom out - don't think tables, think databases. If you're going to keep this data permanently, and it's truly insert-only with no updates, then you probably want to split the data into a separate database per time period. Your inserts may only ever go into one table in one database, but every year, create a new database (MyApp_2015) and seal the old 2014 data as read-only. You can stop backing it up (as long as you've still got one good backup of it), stop doing index maintenance, statistics updates, etc.
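
A minimal sketch of that yearly scheme in MySQL terms, where "sealing" a year means revoking write privileges, since MySQL has no per-database read-only flag (database and account names are assumptions):

    -- Open the new year's database; inserts go here from now on.
    CREATE DATABASE myapp_2015;

    -- Seal last year: the application account loses write access
    -- but keeps SELECT for reporting (assumes those privileges
    -- were originally granted at the database level).
    REVOKE INSERT, UPDATE, DELETE ON myapp_2014.* FROM 'app_user'@'%';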

Your PHP code will only ever have to know about the current database for inserts, which makes your design a lot simpler. The archival process becomes a DBA task much later down the road, as long as you go in knowing that there will be more than one database involved.

If you were doing more than 1,000 inserts per second sustained, and you wanted easier performance management, then I'd also suggest building sharding into the initial design regardless of the database platform. Don't get me wrong, any modern database can handle over 1,000 inserts per second, but designing sharding in now just gives you more flexibility later on. At 12 inserts per second, it's just not worth the design/testing hassle.
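
True sharding spreads rows across servers, but as a single-instance cousin of the same idea, MySQL can hash-partition a table by the would-be shard key. This is only a sketch of the routing concept; the table layout and partition count are assumptions:

    -- Rows are distributed into 8 partitions by hashing user_id,
    -- the same key a multi-server sharding scheme would route on.
    CREATE TABLE daily_score_sharded (
        user_id    INT UNSIGNED NOT NULL,
        score_date DATE NOT NULL,
        points     INT NOT NULL,
        PRIMARY KEY (user_id, score_date)
    ) ENGINE=InnoDB
    PARTITION BY HASH (user_id)
    PARTITIONS 8;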

Q: How should I do reporting?

In an ideal world, reports would not be done against the live server. Run the reports against a restored or replicated copy of the database. This does two things: it reduces load on the live server, and it validates your backups, guaranteeing that you've got your valuable data elsewhere.
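
For example, pointing a MySQL reporting replica at the primary looks roughly like this, assuming GTID-based replication is already enabled on both servers (hostnames and credentials are placeholders; this is the pre-8.0.23 statement syntax):

    -- On the replica: connect to the primary and start replicating.
    CHANGE MASTER TO
        MASTER_HOST = 'primary.example.com',
        MASTER_USER = 'repl',
        MASTER_PASSWORD = 'replica_password',
        MASTER_AUTO_POSITION = 1;
    START SLAVE;

Reports then query the replica, leaving the primary free to absorb the insert load.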