MySQL – Storing 5TB of web server access logs vs OLAP DB

columnstore, hadoop, mysql, olap, postgresql

We have over 5TB of compressed web server logs in HDFS, and we often analyse them using Hadoop.

Running MapReduce on 5TB of data is painful and, more importantly, not many of our developers are familiar with it.

I am wondering whether we should store the data in a columnar database, such as Greenplum or one of the MySQL column stores, which are designed to store analytical data efficiently while still supporting rapid queries. Query speed has become quite important for us lately.

Which database would you recommend? Is there anything I should consider before the move? (I will run my own tests anyway.)

Best Answer

I recommend Vertica.

You can get the free Community Edition, which allows up to 1TB of data. If you normalize your web logs as you load them, there is a good chance they will compress down and fit under 1TB, as Vertica has a pretty powerful data compression engine of its own.

If not, I'd still recommend trying out the platform, but be aware that the license fee is not cheap.
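To make the "normalize your web logs when you load them" step more concrete, here is a minimal Python sketch. It is an illustration under stated assumptions, not a Vertica loader: it assumes the logs use the Apache/Nginx "combined" format (the question does not say), and the names `LINE_RE`, `Dimension` and `normalize` are hypothetical. The idea is to parse each line into typed columns and replace long, repetitive strings (user agent, referer) with small integer ids held in lookup tables.

```python
import re
from datetime import datetime

# Assumption: Apache/Nginx "combined" access log format. The original post
# does not state which format the logs use, so adjust the pattern as needed.
LINE_RE = re.compile(
    r'(?P<ip>\S+) \S+ \S+ \[(?P<ts>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<path>\S+) (?P<proto>[^"]*)" '
    r'(?P<status>\d{3}) (?P<size>\d+|-) '
    r'"(?P<referer>[^"]*)" "(?P<agent>[^"]*)"'
)


class Dimension:
    """Hypothetical lookup table: maps each distinct string to an integer id."""

    def __init__(self):
        self.ids = {}

    def id_for(self, value):
        # Reuse the existing id, or assign the next one (ids start at 1).
        return self.ids.setdefault(value, len(self.ids) + 1)


def normalize(line, agents, referers):
    """Parse one log line into a row of typed columns, or None if it does not match."""
    m = LINE_RE.match(line)
    if m is None:
        return None
    return {
        "ip": m["ip"],
        "ts": datetime.strptime(m["ts"], "%d/%b/%Y:%H:%M:%S %z"),
        "method": m["method"],
        "path": m["path"],
        "status": int(m["status"]),
        # A "-" size means no body was sent; store it as 0 bytes.
        "bytes": 0 if m["size"] == "-" else int(m["size"]),
        # Long repeated strings become small integers plus a lookup table.
        "referer_id": referers.id_for(m["referer"]),
        "agent_id": agents.id_for(m["agent"]),
    }


if __name__ == "__main__":
    agents, referers = Dimension(), Dimension()
    sample = (
        '127.0.0.1 - frank [10/Oct/2000:13:55:36 -0700] '
        '"GET /apache_pb.gif HTTP/1.0" 200 2326 '
        '"http://www.example.com/start.html" "Mozilla/4.08 [en] (Win98; I ;Nav)"'
    )
    print(normalize(sample, agents, referers))
```

The resulting rows and the two lookup tables can then be written out (for example as CSV) and bulk-loaded into whichever column store you are testing. How much this shrinks the data depends on how repetitive your user agents and referers are, so it is worth measuring on a sample before relying on the 1TB limit.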