Efficient primary key that does not expose business internals

primary-keyrdbmsunique-constraint

I am using auto_increment primary key to store each task user performs. Users will be given the key to perform queries about their tasks.
Keys are short, efficient, but are auto incremented in a predictable way.

If a user creates 2 tasks and is given a key that auto increments for both of them, than he can calculate how many tasks happened in that time interval, giving him unwanted insight.

To combat that, I thought about using UUIDs in place of auto_incremented key. But this has it's drawbacks too. Its 32 chars long when dashes are stripped, and as far as I understand has a significant performance hit. The table in question won't have more than a million rows so I am not sure how this applies.

The last part of UUID is MAC address of a node. I will be using only 1 machine, so that part can be stripped of, making it 20 chars.

My currently favourite option is to use something like SELECT FLOOR(rand() * 10000000000); and than check if that number already exists.

Should I consider any other method of storing non predicatable primary keys? Which option do you recommend?

Best Answer

First make sure that it is important that the user doesn't know this information. Is it really a problem if they know that 12 other tasks were recorded between they last two? What could they do with this information that would cause problems?

If you are concerned that they could potentially access data they should by tweaking client-side requests (asking for getdata.php?recordid=1233, someone else's record, instead of getdata.php?recordid=1234, one of their records) then the problem doesn't lie in the knowledge that other records exist but in that the application isn't properly validating requests and/or performing checking security. If that is the case then you can solve the issue in the application layer by request forgery checking and/or privilege checks on every request (I'll not cover these in detail here as it is stepping away from the core topic of the question).

If there is a legitimate concern with using incrementing numbers, is the extra storage required for UUIDs really that expensive compared to the problem you are trying to combat?

You don't state which database engine you are using, but parts of the question strongly suggest mySQL. Other databases can handle UUIDs better by dealing with them as a native binary type, taking 16 bytes instead of 32 characters. It is a while since I've used mySQL, do modern versions support this or similar? That is still four times the size of an integer though.

You could translate the keys in the application layer by bit swapping: every time you send an ID client-side switch the bits around (swap bit 1 with bit 3, bit 4 with bit 0, ...) and translate back when receiving IDs before using them to query the database. This means you are obfuscating the IDs without altering the storage efficiency of the data but you are adding extra complication and potential bugs and inefficiency into your application(s). A seasoned hacker might see through this obfuscation, especially given a sizeable amount of data to analyse, but someone less determined will just see a random/arbitrary set of numbers. There are other such translations you could use: shift the number of 8 bits and fill in the bottom 8 with a simple hash of the original number perhaps.

Pushing the "scale and add some random bits to the bottom" idea into the database you could have a combined key that is the auto-increment value plus another integer or smallint whose value defaults to something arbitrary/random. In SQL Server you could do this in one column by using a user defined sequence to populate a BIGINT column, other DBs might offer something similar, though for many inserts there would be a performance consideration here. In fact even without the bigger type and randomness a sequence shared by all the increments in your data model may have the obfuscating effect you are looking for.

There are many imaginative ways you could make your keys more arbitrary, but they all add complexity somewhere in either the DB or the application so I'd go back to my initial response: do you really need to? If the need is real then I'd be inclined to bite the bullet and use UUIDs, type 4 UUIDs that are fully arbitrary rather the type 1s that include the MAC and so have some order, and if not I'd stick with increment integers for efficiency.

Best Answer

Related Solutions

Performance of multi field primary key or ‘contrived’ ‘semi artificial’ key

Mysql – Differences between “Unique Key” and “Primary Key”

Related Question