My broader question here has to do with when it is appropriate to use HStore vs. multiple tables vs. one table for storing document-like objects.
Now, for my localized example: I’m designing a database structure for holding experimental results. My first idea was to have separate tables for each experiment. This would look like:
However, this would require a lot of JOIN
s across tables since it would be fairly common for end-users to want to look for data across multiple experiments. (Side note: maybe a good use case for inheritance?). I thought I could avoid this by having a monolithic experiments table like this:
That would certainly eliminate the need to have JOIN
s but I feel like it would be logical to have individual experiments be separate in some way. Also, it will be very common that one experiment is requested in its entirety. This is why I’m considering using HStore
kind of like this:
This appeals to me because each individual result set is very document-like, but I’m afraid that I’ll be running into the same issues that the multiple tables approach would bring.
Other considerations and thoughts that may be relevant:
- I’ve thought about having additional tables for storing meta-data/annotations relating to individual data points in individual experiments. Some of this would fit very nicely into the HStore experiments table.
- Different users will be inputting experimental data, I would use schema for this
- There will be different experiment types – I think the best way to handle this would be storing these in entirely different databases
I should also mention that I have very limited experience with PostgreSQL, but I’m already delighted at what it has to offer over MySQL!
Best Answer
Both of those are good arguments for using table partitioning. PostgreSQL implements this with inheritance and constraint exclusion.
Nothing stops you having an
hstore
field for annotations, or a separate join table ofhstore
fields for them.Why? Surely the same argument applies, that you'll want to do cross-user aggregation too?
I'd just partition by experiment and have the user ID as part of the table's key.
Why?
Do you ever think you might need to query across them or aggregate them? If so, do not store them in different DBs.
If they store similar kinds of data, just have them in the main table, and have some optional columns.
If they store mostly different kinds of data (mostly or entirely different columns) then use different tables in the same DB.