Cassandra Database Design – Penalties of Using Many Column Families or Keyspaces

cassandra database-design

I am in the process of evaluating the best design for our Cassandra installation.

There is not much information on the Internet about using the first two levels of organization that Cassandra provides: keyspaces and column families.

I am wondering what the penalties are, if any, of creating a very large number of keyspaces or column families (more than 10,000).

An old blog post suggested that Cassandra reserves memory for each column family. That article was about version 0.6, and the current version is 1.0. Is this still the case, and is it a real problem?

What are the penalties of using many thousands of column families or keyspaces in Cassandra?

Best Answer

Cassandra 1.0 uses a minimum of 1 MB of heap per column family (CF). So 1,000 or 2,000 CFs will be okay for typical heap sizes, but 10,000 probably will not be: that is roughly 10 GB of heap for the per-CF minimum alone. JVM GC does poorly with very large heaps; I recommend staying under 8 GB.
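
To make the arithmetic concrete, here is a rough back-of-the-envelope sketch in Python. It assumes only the two figures given above (a 1 MB minimum per CF and an 8 GB heap ceiling); the function and constant names are made up for illustration and are not part of Cassandra. It also ignores everything else that lives on the heap, so real headroom will be smaller.

```python
# Back-of-the-envelope estimate of the minimum heap consumed by column families.
# Both constants come from the answer above; the names are illustrative only.
MIN_HEAP_PER_CF_MB = 1               # minimum heap per CF quoted for Cassandra 1.0
RECOMMENDED_MAX_HEAP_MB = 8 * 1024   # suggested 8 GB ceiling for the JVM heap


def min_cf_heap_mb(num_cfs, per_cf_mb=MIN_HEAP_PER_CF_MB):
    """Lower bound on heap (in MB) taken by per-CF overhead alone."""
    return num_cfs * per_cf_mb


def heap_fraction(num_cfs, heap_mb=RECOMMENDED_MAX_HEAP_MB):
    """Fraction of the given heap used by the per-CF minimum."""
    return min_cf_heap_mb(num_cfs) / heap_mb


if __name__ == "__main__":
    for n in (1000, 2000, 10000):
        print(f"{n} CFs: at least {min_cf_heap_mb(n)} MB "
              f"({heap_fraction(n):.0%} of an 8 GB heap)")
```

With these assumptions, 1,000 or 2,000 CFs take up a modest share of an 8 GB heap, while 10,000 CFs exceed it before any data is held in memory.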