Trying to understand how an auto increment primary key is better than no primary key and some other primary key questions

primary-key

I am trying to understand primary keys better, how to use them effectively in table design and in queries.

First, are the primary keys themselves used in WHERE clauses? For example, if I have a table of names and the primary key is setup as 'A' for all entries with last name starting with 'A', 'B' for all last names starting with 'B', etc. would it be best practice to have something like:

WHERE pk_field = 'B' AND last_name = 'Bluthe'

Secondly, I want to understand how an auto increment primary key would be better for performance than just something like:

SELECT last_name FROM names WHERE last_name = 'Bluthe'

If the primary key for this record is 1247 would a

WHERE pk_field = 1247

be much better. Wouldn't the quest still go through each record in that column until it finds a match.

Best Answer

Firstly, I agree with John M's comment to your question. You should do some reading about the concepts of database.

You probably misinterpret between accessing your data and how your data organized in the database. When you need to find names whose last_name equals to 'Bluthe', just do it:

SELECT last_name FROM names WHERE last_name = 'Bluthe'

That's it. You do not need to worry about the primary key. The thing you should worry of is whether the last_name is indexed.

When we talk about index, then the primary key comes in. Primary key is the way to address a row in a table. Hence, PK should be unique. The way you define primary key simply won't work. Index will be build on top of PK. Let's say you have names table as follows, assuming using integer as pk

names
--------------------------------------------------
pk_field  first_name        last_name
--------------------------------------------------
1         John              Doe
2         Will              Smith
3         James             Bluthe
4         Nick              Smith

Then we create index on last_name. The index created will be like this:

--------------------------------------------------
last_name     pk_field
--------------------------------------------------
Bluthe        3
Doe           1
Smith         2
Smith         4

Upon searching last_name, database will utilize this index to locate the actual record. Searching 'Bluthe' will come up with pk_field = 3, which is then used to fetch the actual record from names table.

For me, the best practice for primary key is cheap to compare (using integer based data types), clustered with auto-increment values. With auto-increment value we do not have to worry when more than one transaction inserting to the same table at the same time. Clustered primary key will arrange the physical row in the table according to its primary key value. Thus searching based on primary key will be very fast.

HTH