We're building a system where the data in 10 tables is associated with Accounts. A typical table looks like this:
create table Things(
accountId varchar(64) not null,
internalId varchar(64) not null,
externalId varchar(256) as (concat(accountId, '-', internalId)) stored,
...
primary key (accountId, sourcedId),
unique (externalId),
foreign key (accountId) references Accounts (id)
);
All queries either have an accountId
in the where
clause, or they use externalId
. There are no cross-account queries.
We're expecting to have a total of 200 accounts. The sizes of the other tables (like Things
) vary from 5 rows per account for some tables (1000 rows total) to 225K rows per account for some other tables (45M rows total). (These are the numbers we use for performance testing – they're the max numbers)
The DB size is ~150 GB. 95% of the scenarios are reads.
The RDBMS is Mysql 8.0.16 (AWS RDS).
We don't have any performance issues at the moment and we're not trying to make anything work faster. But I'm wondering if NOT partitioning the tables like Things
by accountId
is a "premature pessimization"?
Best Answer
If you are using InnoDB, the clustered index will (generally) already organize the data by
AccountId
. If the number of writes is generally low/fairly balanced, there should be no need to partition by everyAccountId
.Partitioning might help in the following scenarious:
If you do end up partitioning, doing so by every
AccountId
will probably be overkill - it's better to determine some criteria for how/when you'd partition.Edit: As RickJames helpfully points out, MySQL can't/won't enforce FK constraints with partitioned tables. So you'd need to create an alternate method of enforcing those, which is extra code overhead and the risk of invalid data. So if you do end up partitioning things, might make sense to upgrade to a database without that limitation (SQL Server would be the next step up in the "Supports clustered indexes, can implement partitions without breaking" category.