I am working with an application for students management. For the student's table I used a surrogate key (auto increment id). The problem I face with the surrogate key is when the user wants to show information about a student he must enter the id but the user knows only the name of the student (and I can't use the name as a primary key because it's not unique). How can I solve this problem?
Sql-server – Problem with a surrogate key
sql serversurrogate-key
Related Solutions
This is an ongoing debate that many face, and experts will give good arguments for either of these. What your question boils down to is a surrogate vs. natural key (and option two seems to be a half-surrogate half-natural key).
As for your third option, oftentimes I see people do whatever they can to find a unique candidate for a primary key constraint. Take you for instance: name is your example key. That is a notoriously bad candidate for a key, as there is no uniqueness guaranteed with that. Think "John Smith". Not only that, but having that as a key will bloat your foreign key relationship, as when you go to reference that primary key in a relationship table then you will have that same data type and column width. Think nvarchar() as opposed to a narrow data type (i.e. int).
As for your second option, it sounds like you have the ability to create a unique value (by your description, able to fit in an int column). Where would you plan on doing this? This would either be logic in the database or in the application. In my opinion, I like to keep entities like keys isolated to the data tier. This "id" would most likely be used for normalized data. That should be an abstraction to the application (in my opinion). So now my reasoning brings me to either a database-generated unique value (your option two) or an identity value (your option one). Those both have the same result (as far as I can see it), therefore why go through the trouble and the development to generate your own when you can just have an IDENTITY
column?
Again, others may and probably will have different and equally objective answers for going with another alternative, but my vote would be for the IDENTITY
column for a unique, surrogate key.
On tables with a large number of rows being inserted having a ever-increasing clustered index generally improves write performance as it ensures that new records are added together "at the end" of the table. (Note that clustered indexes and primary keys are not the same thing, you can actually have a different primary key from your clustered index. Its the clustered index which matters here, not the primary key).
If you have another auto-incrementing column in your table then you could use this as your clustered index and keep your primary key the same, however your clustered index needs to be unique (so you can't use a DATETIME), which generally means an auto-incrementing column, which may as well be your primary key.
That said, you should run some performance / stress tests and try it for yourself. In the past I worked on a system which intentionally clustered on a GUID on a frequently updated table, the theory being that spreading out the writes actually reduced lock contention and improved performance (I wasn't part of that change so to be honest I was always kind of skeptical). Try it out and see which is faster with your data.
To answer your second question, looking up a user by its username should be very fast with proper indexing - its possible that clustering the the username could improve performance, however its going to be very marginal.
Best Answer
There are a few things of interest here (note: I currently work in this field, so I'm not going to go into too much depth with regards to system functionality and requirements).
Avoid exposing database internal identifiers to the outside world. The field values in an integer
IDENTITY
column have no meaning outside the database. The absolute farthest they should go is the client-side data access layer.Normally, students (people) are identified within a jurisdiction by a unique number (i.e., student number) assigned by the jurisdictional authority (i.e., department of education). In some jurisdictions, students (people) need to exist in the system without a unique number because the numbers aren't assigned synchronously as students (people) are registered. I don't know if you need to deal with that or not. If that is the case, students (people) without the unique number are determined to be unique by a business key, such as the combination of first name, last name, gender and birth date. Everyone, though, will have a unique surrogate key value in the database, which you (not the user) will use to uniquely identify a record.
You will have to implement functionality to search for students. Most of the time users of the system will be searching based on a name (or partial name), not a student number. The resulting search should display students to which the user "has access." For example, if the user is a teacher, they should only have access to the students they teach. The search interface will only allow the user to select a single student -- which will give you the internal identifier you need -- before proceeding.
So in summary, I think you're on the right path with the database design, it's just a matter of implementing some extra functionality on the front end for the user to be able to uniquely identify a student.