Binary storage in Cassandra, HBase

cassandra hadoop hbase

I am looking at some implementations of Cassandra and HBase for medium-sized data sets (~1M resources) to be exposed to clients as graphs (via e.g. Tinkerpop). I would also like to store binaries in the same data stores. While it seems that both systems support storing large binaries one way or another (HBase via HDFS), I wonder what the performance implications would be of using these versus flat file storage. Are these systems designed to store binaries at scale, or are they more targeted at metadata storage? I am talking about 100s of TB of binary data.

Thanks

.s

Best Answer

Cassandra is definitely better suited to storing metadata only; with big payloads, its performance isn't very good. The same was true of HBase when I used it several years ago.

For the binary data itself, I would probably go with something S3-compatible.
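To illustrate that split, here is a minimal sketch in which only a small metadata row goes into the database and the payload goes into an object store, referenced by key. Everything here is a stand-in for illustration, not a real client API: `BlobStore` uses a local directory in place of an S3-compatible bucket, `MetadataStore` uses a dict in place of a Cassandra or HBase table, and the names `save_resource` and `load_resource` are hypothetical.

```python
import hashlib
from pathlib import Path


class BlobStore:
    """Stand-in for an S3-compatible object store (assumption: a local directory)."""

    def __init__(self, root: Path) -> None:
        self.root = root
        self.root.mkdir(parents=True, exist_ok=True)

    def put(self, data: bytes) -> str:
        # Content-addressed key: the SHA-256 hex digest of the payload.
        key = hashlib.sha256(data).hexdigest()
        (self.root / key).write_bytes(data)
        return key

    def get(self, key: str) -> bytes:
        return (self.root / key).read_bytes()


class MetadataStore:
    """Stand-in for a Cassandra/HBase table (assumption: an in-memory dict)."""

    def __init__(self) -> None:
        self.rows = {}

    def upsert(self, resource_id: str, row: dict) -> None:
        self.rows[resource_id] = row

    def read(self, resource_id: str) -> dict:
        return self.rows[resource_id]


def save_resource(meta: MetadataStore, blobs: BlobStore,
                  resource_id: str, payload: bytes, content_type: str) -> None:
    # Write the large payload to the object store first, then record
    # only a small pointer row in the metadata store.
    key = blobs.put(payload)
    meta.upsert(resource_id, {
        "blob_key": key,
        "size": len(payload),
        "content_type": content_type,
    })


def load_resource(meta: MetadataStore, blobs: BlobStore, resource_id: str) -> bytes:
    # Look up the pointer row, then fetch the payload by its key.
    row = meta.read(resource_id)
    return blobs.get(row["blob_key"])
```

The database rows stay small whatever the payload size, so graph queries over the metadata never have to touch the binaries.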