When to Use Multiple Tables in DynamoDB

The DyanmoDB best practices make it clear that:

You should maintain as few tables as possible in a DynamoDB application. Most well designed applications require only one table.

I find it amusing then that just about every single tutorial I've seen dealing with DyanmoDB has a multi-table design.

But what does this mean in practice?

Let's consider a simple application with three main entities: Users, Projects, and Documents. A User owns multiple projects, and a Project can have multiple Documents. We typically have to query on the Projects for a User, and on the Documents for a Project. Reads outnumber writes by a significant margin.

A naive tutorial's table design would use three tables:

Users
Hash key
user-id

Projects
Hash key       Global Index
project-id     user-id

Documents
Hash key       Global Index
document-id    project-id

We could pretty easily collapse Project and Document into one Documents table:

Documents
Hash key    Sort key        Global Index
project-id  document-id     user-id

But why stop there? Why not one table to rule them all? Since the User is the root of everything…

Users
Hash key    Sort key
user-id     aspect
---------   ---------
foo         user                   email: foo@bar.com ...
foo         project:1              title: "The Foo Project"
foo         project:1:document:2   document-id: 2     ...

Then we would have a Global Index on, say, the email field for user record lookups, and another on the document-id field for direct document lookups.

Is that how it's supposed to work? Is it legit to throw such wildly-divergent kinds of data into the same table? Or is the second, two-table design a better approach?

At what point would it be correct to add a second table?

Best Answer

Yes, it is legit to do what you are saying. Both are actually. There are some variables that you do not have in here and can help guide how the data model should be done.

What kind of scale are you looking to get to with this application and data model?
Of the application's access patterns, what is the ratio of reads between those patterns. Meaning which one is hit most over the others.
Of the access patterns you list, how many times per second are they performed?

For example, if 80% of all reads are to find the users on a project and that needs to happen 30,000/sec, but in your application not as many people will go that step further and find out the documents for the projects, then it is 20% of the overall reads and may only be 2000 reads/sec. That first one is the "hot path" of your application and should be optimized for.

Also think of it this way, with a non-relational database like DynamoDB, you can optimize for how your application uses and accesses the data and not like relational database where you have to worry a lot about how it is stored in the database.

Best Answer

Related Solutions

Database modelling best practice question

MySQL – Database Design for Changing Number of Fields

Related Question