How to store tabular data in an RDBMS

database-agnosticdatabase-design

First: I really haven't got any idea how to google for this. If you got one, leave a comment.

If I want to store arbitrarily large tables in a database, how should I set up my database tables?

Requirements:

  • A table has an arbitrary number of columns and rows (columns about 1-10k, rows about 0-10M)
  • There are thousands of tables
  • A table has a name
  • The columns of a table have a name
  • Column-Cell values can be either numbers or strings
  • A column always has the same datatype

Data I want to store:

"Bob the Table"    

| "Name"   | "Size" | "Comment" |
---------------------------------
|  "Homer" |  170   | "A guy"   |
|  "Lisa"  |  120   | "A girl"  |
|  "Bart"  |  130   | "A boy"   |

You get the idea. Table could be much larger.

Data I want to query:

Most of the time I want to query a single table, that is, given a table's name, just display "the table" as depicted above.

How would you implement the RDBMS tables and how would you write queries to select the data for a single table?

Disclaimer: This isn't homework. This here is purely out of curiosity while working on another problem.

Best Answer

Arbitrary data can be put into a RDBMS even if, as the word might suggest, some relation should be applied.

Some points are obvious: every table must have a unique name, any column must have a unique name. For datatype you can use everywhere a VARCHAR. The best way is to analyse the table requirements and use the appropriate datatype for each column, this is useful for space management and query.

For query: in order to speed up the query on your data you have to apply a partitioning (and, if the database technology can support it, subpartitioning). The efficient use of partition and subpartitioning can speed up all DML operations if you can apply to every SELECT statement a WHERE clause based on the partition key. If you want also to use parallel access try to split different partitions across different disks accessed by different disk controllers.

Moreover many RDBMS allow a limited number of columns per table.

Finally, this is a very theoretical answer, too many details are missing. Although you can consider to leave the RDBMS techonology and adopt a NoSQL database. Probably, with the info supplied, a columnar nosql database is the best choice for you: HyperTable or HBase for example. This document is a must read for NoSQL Databases