PostgreSQL autoincrement up to certain number then reset for sharding

postgresqlsharding

I've was recently reading an article around how Instgram used to shard their IDs and they use a unique function to generate ids on their servers https://instagram-engineering.com/sharding-ids-at-instagram-1cf5a71e5a5c

But it got me curious around whether if you could just assign a shard to a particular user rather than every item and how performant it would be. Especially in a large scale website.

So would a function similar to the following be a valid approach.

Assuming a users table on a dedicated database purely for users.

User
– UID
– name
– shardId

Where shardId is calculated via a function that starts at 1, and for every new insert it would increment – upto a max of n (say 2048) for this example. Then reset to 1. Almost acting as a round robin to distribute users data among a shards.

Would there be serious performance implications to such a function, especially during high load? As serial columns simply select a nextval from a sequence do they not? It's just an extra step of reseting a sequence once it reaches n. Or am I barking up the wrong tree?

Best Answer

As serial columns simply select a nextval from a sequence do they not? It's just an extra step of reseting a sequence once it reaches n

Not even an extra step because cycling is a built-in functionality of sequences, See CREATE SEQUENCE.

Example:

=> create sequence s maxvalue 5 cycle;

=> select nextval('s') from generate_series(1,7);
 nextval 
---------
       1
       2
       3
       4
       5
       1
       2