Trying to understand how an auto increment primary key is better than no primary key and some other primary key questions

primary-key

I am trying to understand primary keys better, how to use them effectively in table design and in queries.

First, are the primary keys themselves used in WHERE clauses? For example, if I have a table of names and the primary key is set up as 'A' for all entries with a last name starting with 'A', 'B' for all last names starting with 'B', etc., would it be best practice to have something like:

WHERE pk_field = 'B' AND last_name = 'Bluthe'

Secondly, I want to understand how an auto increment primary key would be better for performance than just something like:

SELECT last_name FROM names WHERE last_name = 'Bluthe'

If the primary key for this record is 1247, would a

WHERE pk_field = 1247

be much better? Wouldn't the query still go through each record in that column until it finds a match?

Best Answer

Firstly, I agree with John M's comment on your question. You should do some reading on basic database concepts.

You are probably conflating how you access your data with how the data is organized in the database. When you need to find names whose last_name equals 'Bluthe', just query for it:

SELECT last_name FROM names WHERE last_name = 'Bluthe'

That's it. You do not need to worry about the primary key. What you should worry about is whether last_name is indexed.

The primary key comes in when we talk about indexes. A primary key is the way to address a row in a table; hence, a PK must be unique. The scheme you describe (one 'A' shared by every last name starting with 'A') simply won't work, because many rows would carry the same key value. Indexes on other columns are built on top of the PK. Let's say you have a names table as follows, assuming an integer PK:

names
--------------------------------------------------
pk_field  first_name        last_name
--------------------------------------------------
1         John              Doe
2         Will              Smith
3         James             Bluthe
4         Nick              Smith

Then we create an index on last_name. The index will look like this:

--------------------------------------------------
last_name     pk_field
--------------------------------------------------
Bluthe        3
Doe           1
Smith         2
Smith         4

When searching by last_name, the database uses this index to locate the actual record. Searching for 'Bluthe' yields pk_field = 3, which is then used to fetch the actual record from the names table.
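To see the effect of the index for yourself, here is a minimal sketch using Python's built-in sqlite3 module. SQLite is my choice for illustration only (the question does not name a DBMS), the index name idx_names_last_name is made up, and the exact wording of the query plan varies between SQLite versions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE names ("
    "pk_field INTEGER PRIMARY KEY, first_name TEXT, last_name TEXT)"
)
# pk_field is left out of the INSERT, so SQLite assigns 1, 2, 3, 4 in order.
conn.executemany(
    "INSERT INTO names (first_name, last_name) VALUES (?, ?)",
    [("John", "Doe"), ("Will", "Smith"), ("James", "Bluthe"), ("Nick", "Smith")],
)

query = "SELECT pk_field, first_name FROM names WHERE last_name = 'Bluthe'"


def plan(sql):
    """Return the human-readable plan text SQLite reports for a statement."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql).fetchall()
    # The last column of each plan row holds the description.
    return " | ".join(row[-1] for row in rows)


plan_before = plan(query)  # no index on last_name yet: a full table scan
conn.execute("CREATE INDEX idx_names_last_name ON names (last_name)")
plan_after = plan(query)   # now a search that goes through the new index

result = conn.execute(query).fetchall()
print(plan_before)
print(plan_after)
print(result)
```

The query text is identical before and after; only the plan changes. That is the point: you search by last_name, and the database decides to go through the index and then to the row.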

For me, the best practice is a primary key that is cheap to compare (an integer-based data type), clustered, and populated with auto-increment values. With auto-increment values, we do not have to worry about more than one transaction inserting into the same table at the same time. A clustered primary key arranges the physical rows in the table according to their primary key values, so a search by primary key can go straight to the row instead of checking every record, which makes it very fast.
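As a rough illustration of the last point, the sketch below again uses sqlite3 from Python. SQLite has no CLUSTERED keyword; I am relying on the fact that an INTEGER PRIMARY KEY column there is the key of the table's own B-tree, which is the closest analogue. Treat it as an assumption-laden demo, not as a statement about your particular DBMS.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE names ("
    "pk_field INTEGER PRIMARY KEY AUTOINCREMENT, last_name TEXT)"
)

# Let the database hand out the key values; we never supply pk_field.
assigned = []
for last_name in ("Doe", "Smith", "Bluthe"):
    cur = conn.execute("INSERT INTO names (last_name) VALUES (?)", (last_name,))
    assigned.append(cur.lastrowid)

# Ask how SQLite would resolve a lookup by primary key.
pk_plan = conn.execute(
    "EXPLAIN QUERY PLAN SELECT last_name FROM names WHERE pk_field = 3"
).fetchall()[0][-1]

print(assigned)
print(pk_plan)
```

The plan reports a search on the integer primary key rather than a scan, so the lookup does not walk through every record to find pk_field = 3.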

HTH

Related Solutions

Why do primary keys have names of their own

Primary keys (and other unique constraints) are implemented as indexes, and are dealt with in exactly the same way - it doesn't make sense from the programmer's point of view to have separate code paths for PKs and indexes (it would double up the potential for bugs).

Other than being referred to by foreign keys, a PK is just a unique constraint, which is in turn implemented as an index, so changing the properties of a PK is just the same as changing the properties of any other index. Also, having an explicit name means they can be referred to in query hints like any other index.
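A quick way to watch a primary key turn into an index is sketched below with Python's sqlite3 module. The table and the constraint name pk_names are hypothetical, and SQLite is only one example: it generates its own name for the backing index rather than reusing the constraint name, so the naming side of the story will look different in other engines.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE names ("
    "pk_field TEXT, last_name TEXT, "
    "CONSTRAINT pk_names PRIMARY KEY (pk_field))"
)

# No CREATE INDEX was issued, yet the table already has a unique index:
# it is the one that enforces the primary key.
indexes = conn.execute("PRAGMA index_list('names')").fetchall()
index_names = [row[1] for row in indexes]   # column 1 is the index name
unique_flags = [row[2] for row in indexes]  # column 2 is 1 for unique indexes

# Because the PK is a unique index, a repeated key value is rejected.
conn.execute("INSERT INTO names VALUES ('B', 'Bluthe')")
try:
    conn.execute("INSERT INTO names VALUES ('B', 'Brown')")
    duplicate_rejected = False
except sqlite3.IntegrityError:
    duplicate_rejected = True

print(index_names, unique_flags, duplicate_rejected)
```

The rejected second insert is also why a key such as 'B' for every last name starting with 'B' cannot serve as a primary key.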

Serial field issue

In Informix, a SERIAL column is a 4-byte signed integer with some auto-increment properties.

If you insert the value 0 into a SERIAL column, or fail to provide a value for the serial column, the next number in ascending order will automatically be assigned for this row. In a programming language such as ESQL/C, you can retrieve the value inserted from the SQLCA record. There are also, I think, functions to retrieve the last serial value inserted. If you insert a non-zero value into a SERIAL column, that is the value that will be used. Clearly, if there is a unique constraint on the column (there isn't one automatically, though the DB-Access Schema Editor will add one for you) and you insert a duplicate record, then the insertion will fail. If there is no unique constraint, it will succeed. If the newly inserted value, N, is larger than any previously inserted, then the internal counter will be incremented so that the next row inserted with 0 will be assigned N+1.

The second part of the question asks:

What will happen when the SERIAL value reaches the maximum?
- The answer is that it wraps around to 1 again. The new values will be inserted, and as long as the new value does not collide with a value already in the table, all will be well.
- Note that negative and zero values are all skipped. You can insert negative values explicitly. AFAIK, you cannot insert a value of zero into a SERIAL column¹.

This leads to an answer for the first part of the question:

Is there any method to initialize the serial again to zero?
- There is. You insert 2³¹-1 into the serial column. The next row you insert will be assigned the value 1 (not zero). It will keep going from there. Note, however, that if there are many values from the first cycle still in the table, this will cause problems; insertions will fail because the value is already in use.
- Once upon a very long time ago, in versions that are, I believe, safely out of service (we're talking early or mid 1990s here), you had to insert 2³¹-2 and then 2 rows with zeroes. If you inserted 2³¹-1 directly, the system got confused and jammed itself.

¹ You can get a row where there is a zero in the SERIAL column, but only through a back-door cheat. You have to have the column as a plain INTEGER column when you insert the row; then you alter the table so the column is a SERIAL column. So, as I said, you can't insert a row with zero in the SERIAL column, but you can find a row with zero in the SERIAL column if someone is devious enough.

Demonstration

SQL commands are prefixed with a + and a blank; output from the DBMS appears without that prefix. Note that this table was created without a unique constraint or primary key on the serial column; this is not what I'd normally do.

+ begin work;
+ create table serial_x(s serial(100) not null);
+ insert into serial_x(s) values(0);
+ insert into serial_x(s) values(0);
+ insert into serial_x(s) values(0);
+ select * from serial_x;
100
101
102
+ delete from serial_x;
+ alter table serial_x modify(s serial(1));
+ insert into serial_x(s) values(0);
+ select * from serial_x;
103
+ alter table serial_x modify(s serial(1000));
+ insert into serial_x(s) values(0);
+ select * from serial_x;
103
1000
+ insert into serial_x values(2147483647);
+ select * from serial_x;
103
1000
2147483647
+ insert into serial_x values(-1);
+ insert into serial_x values(-1);
+ insert into serial_x values(0);
+ select * from serial_x;
103
1000
2147483647
-1
-1
1
+ rollback;

I formally checked that ALTER ... SERIAL(0) produces the same result as ALTER ... SERIAL(1).
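For readers without an Informix instance to hand, the counter rules described above can be approximated with a small Python toy model. This is not Informix code and not how the server is implemented; the class SerialModel and its method names are invented for illustration, and the rules are only those inferred from the description and the transcript above.

```python
SERIAL_MAX = 2**31 - 1  # largest value of a 4-byte signed integer


class SerialModel:
    """Toy model of a SERIAL column's counter (hypothetical, for illustration)."""

    def __init__(self, start=1):
        self.next_value = max(start, 1)
        self.rows = []

    def _advance_past(self, value):
        # After the maximum, the counter wraps to 1 (zero and negatives are skipped).
        self.next_value = 1 if value >= SERIAL_MAX else value + 1

    def insert(self, value=0):
        if value == 0:
            # Zero means "assign the next serial value".
            value = self.next_value
            self._advance_past(value)
        elif value >= self.next_value:
            # An explicit value at or above the counter pushes the counter forward.
            self._advance_past(value)
        # Negative or smaller explicit values are stored without touching the counter.
        self.rows.append(value)
        return value

    def modify(self, start):
        # As in the transcript: a lower start does not pull the counter back.
        if start > self.next_value:
            self.next_value = start

    def delete_all(self):
        # Deleting rows does not reset the counter.
        self.rows.clear()


# Replay the transcript above.
s = SerialModel(100)
first_three = [s.insert(0), s.insert(0), s.insert(0)]
s.delete_all()
s.modify(1)
s.insert(0)
s.modify(1000)
s.insert(0)
s.insert(SERIAL_MAX)
s.insert(-1)
s.insert(-1)
s.insert(0)
print(first_three, s.rows)
```

Under these assumed rules the model ends with the same six values as the final SELECT in the transcript, including the wrap-around to 1 after 2147483647 has been inserted.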
