I have a client who wants to analyse customers' calling records to decide whether they qualify for loans or not.
They are in partnership with a mobile company that supplies the call data daily. The customer base of the mobile company is over 13 million. The call records, together with voice and data credit, are used to determine eligibility against a set of rules.
Daily call volume is about 70 million records, with a retention period of 90 days.
So I am looking for a tool and database to help design the analytics engine that determines the eligibility of potential loan applicants.
Mariadb – Big Data Storage and Analysis
Best Answer
A couple of points.
When looking at benchmarks, I would double-check that they are running the type of load that you are focused on. You mention supplying "the call data daily". Is this via a large batch? How much data per call? I imagine it can't be very much - with, say, 500 bytes per record, 70 million records makes about 35 GB per day, or roughly 12.8 TB per year (~3.2 TB for the 90-day retention window) - but with aggregation, you should be able to reduce this storage requirement.
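A quick back-of-envelope check of those figures, using the 70 million records/day and 90-day retention stated in the question (the 500-byte average record size is an assumption, not a measured value):

```python
# Storage sizing sketch for the call-record volumes in the question.
RECORDS_PER_DAY = 70_000_000
BYTES_PER_RECORD = 500          # assumed average raw record size
RETENTION_DAYS = 90

daily_gb = RECORDS_PER_DAY * BYTES_PER_RECORD / 1e9   # GB landed per day
retained_tb = daily_gb * RETENTION_DAYS / 1000        # live data under retention
yearly_tb = daily_gb * 365 / 1000                     # if you kept a full year

print(f"per day:  {daily_gb:.0f} GB")     # 35 GB
print(f"90 days:  {retained_tb:.2f} TB")  # ~3.15 TB
print(f"per year: {yearly_tb:.1f} TB")    # ~12.8 TB
```

Halve or double the assumed record size and the totals scale linearly, so the conclusion (single-digit TB of live data, very manageable with aggregation) is not sensitive to the exact figure.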
Will you need old(er) data for this analysis? How far back do you want to go?
You should also factor in a reasonable upgrade path, for both hardware and software. My guess is that if a telco is looking for an upgrade, money will be no object (:-)).
If I were you, I would be looking at a system which can handle aggregate queries well, and at maintaining these aggregates on a daily, weekly, monthly and accounting-period basis.
Since you are (apparently) adding data in daily batches, you should be able to perform these aggregations on the fly as part of your data-load process.
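A minimal sketch of that "aggregate during the daily load" idea, shown with SQLite for portability (the same upsert pattern works in PostgreSQL). All table and column names here are illustrative, not from the question:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE calls_staging (
        msisdn TEXT, call_ts TEXT, duration_s INTEGER, cost REAL);
    CREATE TABLE daily_usage (
        msisdn TEXT, day TEXT, n_calls INTEGER, total_s INTEGER,
        total_cost REAL, PRIMARY KEY (msisdn, day));
""")

# Pretend this is one day's batch from the telco.
batch = [
    ("233550000001", "2024-06-01 08:12:00", 120, 0.40),
    ("233550000001", "2024-06-01 19:03:00",  60, 0.20),
    ("233550000002", "2024-06-01 11:45:00", 300, 1.00),
]
with conn:  # load + roll-up in one transaction
    conn.executemany("INSERT INTO calls_staging VALUES (?,?,?,?)", batch)
    # WHERE true disambiguates INSERT..SELECT..ON CONFLICT for SQLite's parser.
    conn.execute("""
        INSERT INTO daily_usage
        SELECT msisdn, date(call_ts), COUNT(*), SUM(duration_s), SUM(cost)
        FROM calls_staging WHERE true
        GROUP BY msisdn, date(call_ts)
        ON CONFLICT (msisdn, day) DO UPDATE SET
            n_calls    = n_calls + excluded.n_calls,
            total_s    = total_s + excluded.total_s,
            total_cost = total_cost + excluded.total_cost
    """)

rows = conn.execute(
    "SELECT msisdn, n_calls, total_s FROM daily_usage ORDER BY msisdn"
).fetchall()
print(rows)  # [('233550000001', 2, 180), ('233550000002', 1, 300)]
```

The eligibility rules then query the small `daily_usage` table (at most one row per subscriber per day) instead of scanning 70 million raw records.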
You appear to wish to use Open Source databases, and therefore I would recommend PostgreSQL - it is (IMHO) a much more capable database technically than MySQL. It has CTEs, Window (aka Analytic) functions, CHECK constraints (astoundingly, MySQL doesn't have these) and Materialized Views (all natively). MySQL is working on some of these, but they are in beta - check out dev.mysql.com.
I would be looking at storing the raw data in as compressed a format as possible and working day-to-day on aggregates, at least on a daily basis per phone number. This will allow you to respond quickly to most queries while still keeping all your data accessible if required.
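To illustrate the CTE + window-function style of query being recommended - here a 3-day moving average of spend per subscriber, the kind of feature an eligibility rule might use. Run via SQLite (3.25+) for portability; the SQL itself is standard and runs unchanged on PostgreSQL. The schema is made up:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE daily_usage (msisdn TEXT, day TEXT, total_cost REAL)")
conn.executemany("INSERT INTO daily_usage VALUES (?,?,?)", [
    ("A", "2024-06-01", 1.0), ("A", "2024-06-02", 2.0),
    ("A", "2024-06-03", 3.0), ("B", "2024-06-01", 0.5),
])

# CTE computes a per-subscriber 3-day moving average via a window function.
rows = conn.execute("""
    WITH avg_spend AS (
        SELECT msisdn, day,
               AVG(total_cost) OVER (
                   PARTITION BY msisdn ORDER BY day
                   ROWS BETWEEN 2 PRECEDING AND CURRENT ROW) AS avg3
        FROM daily_usage)
    SELECT msisdn, day, ROUND(avg3, 2) FROM avg_spend ORDER BY msisdn, day
""").fetchall()
print(rows)
# [('A', '2024-06-01', 1.0), ('A', '2024-06-02', 1.5),
#  ('A', '2024-06-03', 2.0), ('B', '2024-06-01', 0.5)]
```

On PostgreSQL you could wrap such a query in a materialized view and refresh it after each daily load.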
Hardware-wise, I'd be using RAID 1+0 - I'm guessing that, like most database apps, your system will be disk-bound, so make sure you get the best that you can afford on that score! I believe that a fairly average (your definition may vary!) server in terms of CPU and RAM should be able to handle this, but again, YMMV! I'd be looking to start with at least 16GB of RAM (upgradable to at least 64) with reasonably modern CPUs.
Finally, remember (and this is critical): try to benchmark on your load with your hardware and your software. Benchmarks are tricky - basically, there are lies, damned lies and statistics. After that, there are (other people's) database benchmarks! :-)
Best of luck with your project, and welcome to the forum. Your question is a bit over-general here - you might like to take a look at the help centre and take the tour.