The PRIMARY KEY, or any index for that matter, would be accessed much faster if the length of the PRIMARY KEY was smaller. It is easier to put a 4-byte integer as a unique identified for a fullpath image name than the fullpath filename (of various and ridiculous lengths).
Think of the Clustered Index, where the PRIMARY KEY would reside. Row data will occupy a Clustered Index. In MySQL, the Clustered Key would be coupled with other columns in a nonunique index. Wouldn't a smaller datatype (4 bytes) just make more sense? Otherwise, indexes can blow up at a rate of O(n log n).
To create a unique number for each image, you need a table that resembles something like this:
CREATE TABLE images
(
image_id INT NOT NULL AUTO_INCREMENT,
image_name VARCHAR(255),
image_folder VARCHAR(255),
PRIMARY KEY (image_id),
UNIQUE KEY name_folder (image_name,image_folder),
KEY folder (image_folder)
);
This design
- allows for multiple files with the same image name, each located in a different folder
- denies having two files with the same image name in the same folder
From here, just INSERT and retrieve the created image_id as follows:
INSERT IGNORE INTO images (image_name,image_folder)
VALUES ('someimage.jpg','/some/linux/folder');
SELECT image_id INTO @imageid FROM images
WHERE image_name='someimage.jpg'
AND image_folder='/some/linux/folder';
Doing it this way may lose some image_id numbers along the way. You may want to do this:
SELECT COUNT(1) INTO @image_is_there FROM images
WHERE image_name='someimage.jpg'
AND image_folder='/some/linux/folder';
IF @image_is_there IS 0, then
INSERT INTO images (image_name,image_folder)
VALUES ('someimage.jpg','/some/linux/folder');
The order of columns in a primary key does not affect the insert performance, since the combination of the values in the two columns of the primary key are meant to be unique.
However the order of index impacts the performance of SELECT queries. In the new order you will have a greater and increasing range of values (dates) across which the index is clustered (dates) as compared to clustering around 100 stores.
If most of your queries are focused on a single store with the new ordering your queries will be slower.
Best Answer
It depends on the specific scenario.
For example, if you have a many-to-many relationship, then it is common to have a join table between the two tables. The join table will contain a foreign key to each of the primary keys and most of the time, the combination of the two foreign keys will become the primary key:
For the most part, I prefer natural keys rather than surrogate keys - especially if I will never use the surrogate key to ever query the table.
With that being said, to answer your question directly:
Firstly, can a string be a primary key? Though the name may be its natural key, you have to take into consideration the size of the key if you made it the primary key. Not just in your main table, but also in the tables that refer to it. Remember, that in order to create a foreign key you must copy the key value into any dependant tables - this can quickly bloat your database and slow down queries depending on the size and type of data. For instance, short fixed-length text columns can perform better than variable-length text columns and can be indexed easier than longer text values. A good example of this that I use each day is a currency table. My primary key is a 3 character currency code - it is short, of a fixed length, and I am always going to query the table using it even if the table had a surrogate key defined on it.
Secondly, does the product id need to be part of the productions primary key? Well, this depends on a few factors. If you made the name part of the primary key, and the name was short then potentially, yes it could. But, does any other table have a reference to the productions table? If so, then by adding the product id into the tables primary key you would also need to add it into all of the foreign keys that refer to it. Personally, I don't like this. I am happy to do it if I can guarantee that no table will ever need a foreign key to it (such as a fact table in a dimensional model) but otherwise I think it is wasteful and makes querying your tables over complicated.
You should also consider that if you define further non-clustered indexes on the production table when you have a composite key, the row locator for the non-clustered index is a copy of the primary key itself. This can cause all of your non-clustered indexes to bloat out of control very quickly without your realising it. For reference: Clustered and Nonclustered Indexes Described
I think given your particular scenario, it is perfectly reasonable to add a foreign key in your production table which refers to the id of the product that is produced. As Gulrez Khan stated in his answer, you may want to add a non clustered index on the foreign key - it is not required, but can enhance performance in some queries.
Remember that all indexes need to be built, and rebuilt when they are modified. The more fields you cram into them, the longer it takes to perform actions against them. Especially clustered indexes, which are generally defined upon the primary key. These indexes physically sort your data at the storage level. If you insert a value that happens to not be at the end of the existing chain of values, you could cause performance problems due to page splits. For me, this is the biggest advantage of a surrogate key over a natural key - the ability to ensure that new data is always appended and not injected into the middle of an existing index.
Finally, if in doubt - try it out! There is no substitution for a good testing strategy. Try querying the data against both structures, look at the query plans, do some capacity planning and choose the one that is right for you.