Mariadb – Big Data Storage and Analysis

mariadb

I have a client who wants to analyse calling records of customers to make a decision as to whether they qualify for loans or not. 
They are in partnership with a mobile company that supplies the call data daily. The customer base of the mobile company is over 13million. The call data, voice and data credit is used to determine eligibility in conjunction with set rules.
Daily call records is about 70 million records and retention period of 90 days.
So am looking for a tool and database to help design the analytics engine to determine eligibility of potential loan requestors.

Best Answer

A couple of points.

When looking at benchmarks, I would double-check to see that it is running the type of load that you are focused on. You mention supplying "the call data daily". Is this via a large batch? How much data per call? I imagine it can't be very much - with say 500 bytes of data per call, that makes 50GB per day = 18TB per year (~5 TB/90 days) - but with aggregation, you should be able to reduce this storage requirement.

Will you need old(er) data for this analysis? How far back do you want to go?

You should also factor in a reasonable path for

  • a) increasing load,

and

  • b) upgrading your h/ware

My guess is that if a telco is looking for an upgrade, money will be no object (:-)).

If I were you, I would be looking at a system which can handle aggregate queries well - I would be looking at performing these aggregates on a daily, weekly, monthly and accounting-period basis.

Since you are (apparently) adding data on a daily batched basis, you should be able to perform these aggregations on the fly with your data load process.

You appear to wish to use Open Source databases, and therefore I would recommend PostgreSQL - it is (IMHO) a much more capable database technically than MySQL.

It has CTEs, Window (aka Analytic) functions, CHECK constraints (astoundingly, MySQL doesn't have these) and Materialized Views (all natively). MySQL is working on some of these, but they are in beta - check out dev.mysql.com.

I would be looking at storing the raw data in as compressed a format as possible and working day-to-day on aggregates at least on a daily basis per phone number. This will allow you to respond quickly to most queries while still keeping all your data accessible if required.

Hardware, I'd be using RAID 1+0 - I'm guessing that like most database apps, your system will be disk bound - make sure you get the best that you can afford on that score! I believe that a fairly average (your definition may vary!) server in terms of CPU and RAM should be able for this, but again, YMMV! I'd be looking to start with at least 16GB of RAM (upgradable to at least 64) with reasonably modern CPUs.

Finally, remember (and this is critical) try to benchmark on your load with your hardware and your software. Benchmarks are tricky - basically there are lies, damned lies and statistics. After that, there are (other people's) database benchmarks! :-)

Best of luck with your project and welcome to the forum. Your question is a (bit) over-general here. You might like to take a look at the help centre - take the the tour.