The DyanmoDB best practices make it clear that:
You should maintain as few tables as possible in a DynamoDB application. Most well designed applications require only one table.
I find it amusing then that just about every single tutorial I've seen dealing with DyanmoDB has a multi-table design.
But what does this mean in practice?
Let's consider a simple application with three main entities: Users, Projects, and Documents. A User owns multiple projects, and a Project can have multiple Documents. We typically have to query on the Projects for a User, and on the Documents for a Project. Reads outnumber writes by a significant margin.
A naive tutorial's table design would use three tables:
Users
Hash key
user-id
Projects
Hash key Global Index
project-id user-id
Documents
Hash key Global Index
document-id project-id
We could pretty easily collapse Project
and Document
into one Documents
table:
Documents
Hash key Sort key Global Index
project-id document-id user-id
But why stop there? Why not one table to rule them all? Since the User
is the root of everything…
Users
Hash key Sort key
user-id aspect
--------- ---------
foo user email: foo@bar.com ...
foo project:1 title: "The Foo Project"
foo project:1:document:2 document-id: 2 ...
Then we would have a Global Index on, say, the email
field for user record lookups, and another on the document-id
field for direct document lookups.
Is that how it's supposed to work? Is it legit to throw such wildly-divergent kinds of data into the same table? Or is the second, two-table design a better approach?
At what point would it be correct to add a second table?
Best Answer
Yes, it is legit to do what you are saying. Both are actually. There are some variables that you do not have in here and can help guide how the data model should be done.
For example, if 80% of all reads are to find the users on a project and that needs to happen 30,000/sec, but in your application not as many people will go that step further and find out the documents for the projects, then it is 20% of the overall reads and may only be 2000 reads/sec. That first one is the "hot path" of your application and should be optimized for.
Also think of it this way, with a non-relational database like DynamoDB, you can optimize for how your application uses and accesses the data and not like relational database where you have to worry a lot about how it is stored in the database.