PostgreSQL – Best practices for generating unique multi-column keys for weak entities

Tags: auto-increment, database-design, postgresql, trigger

How should one generate non-unique, non-natural identifiers for weak entities?

For example, if order_id is the primary key for an order table, and (order_id, item_number) is the primary key for an order_item table with a foreign key on order_id, how best to generate item_number?

A few possibilities come to mind, but none seems ideal:

1. Autoincrement item_number: the order_item entity is no longer weak, and the composite key is redundant.
2. Use a trigger to find the current max item_number for a given order_id, then increment it: if a row is deleted, this could reassign a PK to a different record, which doesn't seem like a good idea. (Edit: this can also be done without triggers, as explained in joanolo's answer.)
3. Use a trigger to create a new sequence for every order_id, and somehow draw item_numbers from the appropriate sequence: this is functionally the desired behaviour, but seems like it would be a mess to implement. Is it even possible to reference a sequence by an order_id?
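
For concreteness, the second approach (max plus one) without a trigger might look roughly like the sketch below. This is not taken from the referenced answer; the table names orders and order_item_weak, the column types, and the order_id value 42 are assumptions made for illustration only.

```sql
-- Hypothetical parent table; "orders" is used because ORDER is a reserved word.
CREATE TABLE IF NOT EXISTS orders (
    order_id integer PRIMARY KEY
);

-- Weak entity with the composite primary key from the question.
CREATE TABLE IF NOT EXISTS order_item_weak (
    order_id    integer NOT NULL REFERENCES orders (order_id),
    item_number integer NOT NULL,
    PRIMARY KEY (order_id, item_number)
);

INSERT INTO orders (order_id) VALUES (42);

-- Compute max + 1 inside the INSERT itself. The aggregate always returns
-- exactly one row, so COALESCE yields 1 for the first item of an order.
INSERT INTO order_item_weak (order_id, item_number)
SELECT 42, COALESCE(max(item_number), 0) + 1
FROM   order_item_weak
WHERE  order_id = 42;
```

If two sessions run the INSERT for the same order_id concurrently, both can compute the same number; with the composite primary key in place, one of them is expected to fail with a unique-key violation and would have to retry. Deleting the highest-numbered row also frees that number for reuse, which is the reassignment concern raised above.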

Edit – closely related (if not duplicate):

Best Answer

Option 1 is the least error-prone, simplest, and fastest.
Trigger solutions like options 2 and 3 are subject to subtle race conditions under concurrent write access.

Make item_number a serial column and also the PK for order_item in this case. Stick with the default values drawn from the underlying sequence and never update the column.

Auto increment SQL function

Create a multi-column index on (order_id, item_number) for performance of typical queries. (Might as well be UNIQUE, but does not have to be.) In a typical setup (order_id and item_number can both be plain integer), the multicolumn index happens to be just as small and fast as an index on just order_id:

Is a composite index also good for queries on the first field?
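
A minimal sketch of this setup in PostgreSQL follows. The parent table is named orders here because order is a reserved word; that name, the index name, and the column types are assumptions for illustration, not part of the original question.

```sql
-- Hypothetical parent table.
CREATE TABLE IF NOT EXISTS orders (
    order_id integer PRIMARY KEY
);

-- item_number is a serial column and the sole primary key.
CREATE TABLE IF NOT EXISTS order_item (
    item_number serial PRIMARY KEY,
    order_id    integer NOT NULL REFERENCES orders (order_id)
);

-- Multicolumn index for typical "all items of one order" lookups.
-- UNIQUE is optional here, since item_number is already unique on its own.
CREATE UNIQUE INDEX IF NOT EXISTS order_item_order_id_item_number_idx
    ON order_item (order_id, item_number);

-- Never supply item_number; let the column default draw from the sequence.
INSERT INTO orders (order_id) VALUES (1);
INSERT INTO order_item (order_id) VALUES (1), (1);
```

Because the values come from a sequence, concurrent inserts never collide, though gaps can appear after rolled-back transactions or deletes.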

(Like I commented:) Typically, the only important role of an item number is to be unique (and immutable). If you need a stable sort order among items, you might just rely on the serial value of item_number. Be aware that those numbers are not necessarily in order of transaction commits. It may be useful to add the transaction timestamp current_timestamp (or possibly statement_timestamp() or clock_timestamp()) to the row. Depends on requirements and access patterns.

You can add a VIEW for the human eye, with item-numbers per order_id starting from 1, dynamically generated with row_number(), ordered by above criteria. But operate with the unique, immutable item_number internally.

Create unique sequence per set of rows sharing the same foreign key value
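
The view described above could be sketched as follows. The view name order_item_display and the column name display_number are made up for this example, and the table definition is only a minimal stand-in so the snippet can run on its own.

```sql
-- Minimal stand-in table (assumed layout) so the view can be created on its own.
CREATE TABLE IF NOT EXISTS order_item (
    item_number serial PRIMARY KEY,
    order_id    integer NOT NULL
);

-- Per-order numbering starting from 1, computed at query time.
-- Ordering relies on the serial value; a timestamp column could be
-- added to the ORDER BY if the requirements call for it.
CREATE VIEW order_item_display AS
SELECT order_id,
       row_number() OVER (PARTITION BY order_id
                          ORDER BY item_number) AS display_number,
       item_number
FROM   order_item;

-- Items of one order as a person would expect to see them: 1, 2, 3, ...
SELECT * FROM order_item_display WHERE order_id = 1;
```

The display numbers shift when a row is deleted, which is why they are only suitable for presentation and the immutable item_number remains the key used internally.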

Related Solutions

MySQL – Auto increment value needs to be changed inside trigger

I don't think you can get the new value of auto_id in a BEFORE insert trigger. This will do what you want (if I have understood correctly):

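-- DELIMITER is needed in the mysql command-line client so the trigger body
-- is not cut off at the first semicolon; other tools may not require it.
DELIMITER $$

CREATE TRIGGER innodb_seqno_trigger
   BEFORE INSERT ON my_table
   FOR EACH ROW
BEGIN
  -- Local variable instead of a session-wide user variable.
  DECLARE newseqno INT;

  -- Highest seqno already used for this id (NULL if there is none yet).
  SELECT MAX(seqno) INTO newseqno
  FROM my_table
  WHERE id = NEW.id;

  -- Start at 1 for the first row of an id, otherwise increment.
  SET NEW.seqno = COALESCE(newseqno + 1, 1);
END$$

DELIMITER ;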


But have you thought about what will happen if two separate connections/threads/transactions try to insert rows with the same id at the same time?

Multiple foreign keys with shared columns for weak entities

Both solutions, T and T-alternate, are considered denormalizations. A denormalization reshapes a data model that is normally in third normal form into a structure that is more convenient for select queries. The correct data already exists in CC and CD; T or T-alternate duplicates it. With duplicate (denormalized) data, you have to make sure that the copy cannot get out of sync with its parent.

I think the first option is the better one. The constraints are simple and straightforward. There is no check constraint on multiple fields with nulls. You need to make sure that any insert into CC or CD also inserts into T. After-insert triggers on CC and CD would automatically insert A,B,C,null into T or A,B,null,D into T.

One thing to plan for is that a parent has two children, each with many rows per parent. Let's assume a row in P has 3 rows in CC and 0 in CD. If you write a query over P inner joining to CC and CD, it shows zero rows. A left outer join (parent on the left) shows 3 rows.

Let's assume CC has 3 rows and CD has 4 rows. Both inner join and outer join queries would show 3*4 = 12 rows.

Let's assume a third set of data. CC is the main child table with 20 rows per P. CD is a descriptive text field with 3 rows per P. You could reduce the result to 20 rows per P with an aggregation function like Oracle 11g's LISTAGG over CD. This combines the D descriptive field into a comma-separated list such as "Planes, Trains, Automobiles".
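
The row multiplication and the aggregation fix can be reproduced with a small PostgreSQL sketch. The table layouts below are simplified assumptions (a single key column a, instead of the original A,B), the sample values are invented, and string_agg is used here in place of Oracle's LISTAGG.

```sql
-- Simplified, hypothetical versions of P, CC and CD.
CREATE TEMP TABLE p  (a integer PRIMARY KEY);
CREATE TEMP TABLE cc (a integer REFERENCES p, c integer);
CREATE TEMP TABLE cd (a integer REFERENCES p, d text);

INSERT INTO p  VALUES (1);
INSERT INTO cc SELECT 1, g FROM generate_series(1, 3) AS g;   -- 3 rows
INSERT INTO cd VALUES (1, 'Planes'), (1, 'Trains'),
                      (1, 'Automobiles'), (1, 'Boats');       -- 4 rows

-- Joining both children multiplies the rows: 3 * 4 = 12.
SELECT count(*) FROM p JOIN cc USING (a) JOIN cd USING (a);

-- Collapsing CD to one row per parent first keeps one row per CC row (3 here).
SELECT a, cc.c, d_list.descriptions
FROM   p
JOIN   cc USING (a)
LEFT   JOIN (
         SELECT a, string_agg(d, ', ' ORDER BY d) AS descriptions
         FROM   cd
         GROUP  BY a
       ) AS d_list USING (a);
```

Aggregating in a subquery before the join, rather than after it, avoids building the 12-row intermediate result in the first place.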

Having CCA and CDA fields in T-alternate increases the chances of data getting out of sync, since CCA might not equal CDA when something goes wrong in the code. T-alternate is not the standard way of approaching this problem. The first option does have some challenges, but it is simpler, more standard, and more likely to hold up in production systems.
