I follow the following rules for primary keys:
a) Should not have any business meaning - they should be totally independent of the application you are developing, therefore I go for numeric auto generated integers. However if you need additional columns to be unique then create unique indexes to support that
b) Should perform in joins - joining to varchars vs integers is about 2x to 3x slower as the length of the primary key grows, so you want to have your keys as integers. Since all computer systems are binary, I suspect its coz the string is changed to binary then compared with the others which is very slow
c) Use the smallest data type possible - if you expect your table to have very few columns say 52 US states, then use the smallest type possible maybe a CHAR(2) for the 2 digit code, but I would still go for a tinyint (128) for the column vs a big int which can go up to 2billion
Also you will have a challenge with cascading your changes from the primary keys to the other tables if for example the project name changes (which is not uncommon)
Go for sequential auto incrementing integers for your primary keys and gain the inbuilt efficiencies that database systems provide with support for changes in the future
You should have an autoincrement PRIMARY KEY. Since I do not know the storage engine of the table, I will try answer based on both.
InnoDB
In general, a smaller primary key is always better than a bigger one. The PRIMARY KEY for an InnoDB table is stored in the Clustered Index (known within InnoDB as the gen_clust_index
). Since an InnoDB Page is 16K, smaller keys will make more keys fit inside an index page.
What should be noted is the fact that for each entry in a Secondary Index, there is Primary Key. Thus, not only will a smaller PRIMARY KEY benefit the table, but all non-unique Indexes will corresponding shrink as well.
MyISAM
Similar principles apply to MyISAM in terms of key sizes and indexes. Additionally, there is an added bonus in your particular case that is not often discussed when it comes to MyISAM.
MyISAM allows you to have an auto_increment key per column value. What do I mean?
Look at the table in your question with additional rows:
Id | Prod | Acc | Val
---------------------------------------------
ABC-AB12_1 | ABC-AB12 | ABC1 | 1
ABC-AB12_2 | ABC-AB12 | DEF1 | 2
ABC-AB12_3 | ABC-AB12 | GHI1 | A
DEF-AB12_1 | DEF-AB12 | ABC1 | 1
DEF-AB12_2 | DEF-AB12 | DEF1 | 2
DEF-AB12_3 | DEF-AB12 | GHI1 | A
GHI-AB12_1 | GHI-AB12 | ABC1 | 1
GHI-AB12_2 | GHI-AB12 | DEF1 | 2
GHI-AB12_3 | GHI-AB12 | GHI1 | A
You could replace the Id with an auoincrement value and end up with this:
Id | Prod | Acc | Val
----------------------------------------------
1 | ABC-AB12 | ABC1 | 1
2 | ABC-AB12 | DEF1 | 2
3 | ABC-AB12 | GHI1 | A
4 | DEF-AB12 | ABC1 | 1
5 | DEF-AB12 | DEF1 | 2
6 | DEF-AB12 | GHI1 | A
7 | GHI-AB12 | ABC1 | 1
8 | GHI-AB12 | DEF1 | 2
9 | GHI-AB12 | GHI1 | A
This you would do if the Id
looks likr this:
PRIMARY KEY (Id)
OK, great. Now here is the added bonus: If you make the PRIMARY KEY look like this:
PRIMARY KEY (Prod,Id)
the data can be stored like this:
Id | Prod | Acc | Val
----------------------------------------------
1 | ABC-AB12 | ABC1 | 1
2 | ABC-AB12 | DEF1 | 2
3 | ABC-AB12 | GHI1 | A
1 | DEF-AB12 | ABC1 | 1
2 | DEF-AB12 | DEF1 | 2
3 | DEF-AB12 | GHI1 | A
1 | GHI-AB12 | ABC1 | 1
2 | GHI-AB12 | DEF1 | 2
3 | GHI-AB12 | GHI1 | A
How is that possible? Only the MyISAM Storage Engine Has This Mechanism Built In !!!
I have discussed this before:
One more thing: Why have PRIMARY KEY (Prod,Id)
as a PRIMARY KEY? This would allow you to sequence each Product ID. Thus, you can look for sequence 3 of one product and sequence 3 of another product.
EPILOGUE
Whichever way you decide to go, using a smaller autoincrement PRIMARY KEY (4 bytes) make more sense for performance and diskspace than a larger PRIMARY KEY (more than 4 bytes).
Give it a Try !!!
Best Answer
As Kumar said, using a composit primary key will prevent you from having two entries with the same userID and activityID - you have to decide if this is the desirable configuration. Logically, I would say that a user can complete the same activity twice, so the "completed" list would have two entries, both with the same userID and activityID. In this case you would need a seperate primary key.
Generally I would recommend using the activityID as a foreign key. If you configure
ON DELETE CASCADE
the deletion of an activity will also delete all entries in both lists that refer to this activity. However I would rather recommend usingON DELETE RESTRICT
which would prohibit the deletion of an activity as long as there is a corresponding entry in either list (this will prevent you from having invalid lists where an activityID cannot be resolved).