MySQL – how to deal with 500 million rows of data per month

mysql mysql-5.6

This is an interview question:

A telecom billing system has an internet traffic table (phone_number, start_time, stop_time, time_lasting, fare) that receives 500 million records per month (2G), with every record 300K in size. A row is inserted and updated whenever someone uses 4G to surf the internet, and their account balance is updated each time (166 writes/s). People will query their internet usage details out of 5 * 6 = 3 billion rows (half a year of data is kept).

Question 1: how can MySQL handle this volume of data?
Question 2: how to optimize inserts and updates?
Question 3: how to optimize queries?

My opinion (maybe not correct):
1. The insert and the update form one transaction, so use the InnoDB engine.
2. Partition the table by day.
3. Do I need Redis or anything else to help speed up inserts and updates?
4. I don't know how to optimize the queries.
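For reference, point 1 could be sketched like this; the table and column names beyond those given in the question (`internet_flow`, `account`, `balance`) are hypothetical:

```sql
-- Hypothetical sketch: record the usage row and debit the balance
-- atomically, so both succeed or both roll back (requires InnoDB).
START TRANSACTION;

INSERT INTO internet_flow (phone_number, start_time, stop_time, time_lasting, fare)
VALUES ('13800138000', '2016-05-01 10:00:00', '2016-05-01 10:05:00', 300, 0.50);

UPDATE account
SET    balance = balance - 0.50
WHERE  phone_number = '13800138000';

COMMIT;
```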

Thank you very much.

Best Answer

Plan A: ENGINE=InnoDB, with these specific suggestions:

Since you are purging after 6 months, I recommend PARTITION BY RANGE (TO_DAYS(...)) with weekly partitions. Every week, drop the oldest and REORGANIZE "future" into "next week" and "future". See Details.
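A sketch of that layout, with a hypothetical table name and example dates (the real schema should come from your SHOW CREATE TABLE):

```sql
-- Weekly RANGE partitions on start_time; about 26 cover six months.
CREATE TABLE internet_flow (
  phone_number VARCHAR(20)   NOT NULL,
  start_time   DATETIME      NOT NULL,
  stop_time    DATETIME      NOT NULL,
  time_lasting INT           NOT NULL,
  fare         DECIMAL(10,2) NOT NULL
) ENGINE=InnoDB
PARTITION BY RANGE (TO_DAYS(start_time)) (
  PARTITION p20160101 VALUES LESS THAN (TO_DAYS('2016-01-08')),
  PARTITION p20160108 VALUES LESS THAN (TO_DAYS('2016-01-15')),
  -- ... one partition per week ...
  PARTITION future    VALUES LESS THAN MAXVALUE
);

-- Weekly maintenance: purge the oldest week (nearly instant, unlike a
-- huge DELETE), then carve next week's partition out of "future".
ALTER TABLE internet_flow DROP PARTITION p20160101;
ALTER TABLE internet_flow REORGANIZE PARTITION future INTO (
  PARTITION p20160115 VALUES LESS THAN (TO_DAYS('2016-01-22')),
  PARTITION future    VALUES LESS THAN MAXVALUE
);
```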

Partitioning by day leads to so many partitions that it is inefficient. And, dropping a week's worth of data is just as easy as a day's worth.

a_vlad is right that partitioning does not necessarily help with queries. But it is excellent for that time-based purge.

200M * 300K is more like 2T. Where is the math error? Perhaps 300B? Are you implying a fixed-length row? Don't.

300KB for one record seems terribly big -- what are you keeping in it? Seriously consider normalizing out any repetitive columns.

Provide a tentative SHOW CREATE TABLE and the main queries (insert/update/delete). It is unclear that there need to be any updates or deletes for your application.

166 writes/sec is pushing the limits of a spinning drive. Either use hardware RAID-5 with a Battery Backed Write Cache or use SSDs.

I suggest you adapt the suggestions in my High speed ingestion blog. That discusses a technique for getting well past 166 writes/sec. (Keep in mind that you will have peak loads that are much higher than 166.)
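The core idea can be sketched as batching: gather incoming rows in a small staging table and flush them to the big table in bulk, since one multi-row INSERT ... SELECT is far cheaper than hundreds of single-row transactions. This is a simplified sketch, not the blog's full technique, and the table names are hypothetical:

```sql
-- Staging table with the same structure as the big table.
CREATE TABLE flow_staging LIKE internet_flow;

-- Writers append to flow_staging; a flusher job runs every few seconds:
START TRANSACTION;
INSERT INTO internet_flow SELECT * FROM flow_staging;
DELETE FROM flow_staging;
COMMIT;
```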

Plan B: NDB Cluster. (Sorry, I don't have specific suggestions. But NDB was designed for Telecom.)

Redis? That adds another moving part -- more things to go wrong. And, assuming that your traffic is reasonably consistent, there is little to be gained by any buffering scheme.

The MySQL box should be separate from the Client box(es). You should design the client boxes (web servers?) to be scalable -- any number of them, add more as needed.