PostgreSQL – How to Use a Foreign Key

Tags: application-design, database-design, foreign-key, postgresql

I'm creating the following table:

CREATE TABLE fund_identifier
(
    id BIGSERIAL PRIMARY KEY NOT NULL,
    identifier TEXT NOT NULL
);
CREATE UNIQUE INDEX IF NOT EXISTS fund_identifier_pkey ON fund_identifier (id);
CREATE UNIQUE INDEX IF NOT EXISTS fund_identifier_identifier_uindex ON fund_identifier (identifier);

However, I'm not sure what the best practice is for referencing this table from another one. Should the foreign key reference the identifier value itself, like this:

    CREATE TABLE security_papers (
      id                                        BIGSERIAL,
      fund_identifier                           TEXT,
      as_of_date                                DATE,
          ...
      CONSTRAINT security_papers_fund_identifier_fk FOREIGN KEY (fund_identifier) REFERENCES fund_identifier (identifier)
    );

Or, should I use its id? Like this:

    CREATE TABLE security_papers (
      id                                        BIGSERIAL,
      fund_identifier_id                        BIGINT,
      as_of_date                                DATE,
          ...
      CONSTRAINT security_papers_fund_identifier_fk FOREIGN KEY (fund_identifier_id) REFERENCES fund_identifier (id)
    );

My guess is that the second approach is the right one as far as normalization rules go. On the logic side, though, my application would need to look up the id in fund_identifier first, before persisting anything to the security_papers table. Right?

But if I go with the first approach, this would be delegated to the database, making my application logic easier to implement.

Please let me know your thoughts, and if I'm missing some concept here. Thank you!

Best Answer

Your table seems to have a perfectly good candidate key (identifier), yet you also create a surrogate key. Why? (I do not hold to the rule that all tables must have a surrogate key.)

However, once you create a surrogate key, that is generally the field used to reference rows in that table. Even when importing outside data that contains the text value in identifier, it is converted to the key value when stored.

In other words, almost without exception, the only place you will find the text identifier values will be in the identifier field of the fund_identifier table. This eliminates ambiguous data and simplifies maintenance.
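As a rough sketch of what this looks like in practice, the tables below reuse the names from the question, leaving out the columns the question elides. The sample value 'FUND-A' and the date are made up for illustration. The lookup the question worries about can be folded into the INSERT itself, so the application does not need a separate round trip:

```sql
-- Parent table: the text identifier is stored only here.
CREATE TABLE fund_identifier (
    id         BIGSERIAL PRIMARY KEY,
    identifier TEXT NOT NULL UNIQUE
);

-- Child table: references the surrogate key, not the text value.
CREATE TABLE security_papers (
    id                 BIGSERIAL PRIMARY KEY,
    fund_identifier_id BIGINT NOT NULL REFERENCES fund_identifier (id),
    as_of_date         DATE
);

-- Hypothetical sample row.
INSERT INTO fund_identifier (identifier) VALUES ('FUND-A');

-- Resolve the text identifier to its key inside the INSERT.
INSERT INTO security_papers (fund_identifier_id, as_of_date)
SELECT id, DATE '2020-01-31'
FROM fund_identifier
WHERE identifier = 'FUND-A';
```

One thing to watch with this pattern: if no fund_identifier row matches, the INSERT ... SELECT inserts zero rows without raising an error, so the application may want to check the affected row count.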

Related Solutions

Database Normalization – Foreign Key vs Varchar Field

The design choices you describe are not directly related to normalization.

I agree there should be a lookup table.

I think an OrderStatusID value would increase redundancy. The status (text) value presumably already satisfies many of the qualities of a good key: unique, stable, narrow, familiar to users, etc. Referential integrity can be applied to VARCHAR columns, of course! Each application that uses the key can assign it an enum as required and would be responsible for mapping enum values to status (text) values. This would presumably make the lookup table a single-column, 'all-key' table (and therefore would satisfy 6NF, the highest normal form ;)

[If OrderStatusID is an attribute in the Order table then it would not be in 6NF but, as I say, I don't think you are actually asking about normalization at all.]
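A minimal sketch of such a single-column lookup table, assuming hypothetical table names (order_status, orders), column widths, and status values that are not given in the original question:

```sql
-- All-key lookup table: the status text is itself the primary key.
CREATE TABLE order_status (
    status VARCHAR(20) PRIMARY KEY
);

INSERT INTO order_status (status)
VALUES ('pending'), ('shipped'), ('cancelled');  -- hypothetical values

-- The referencing table stores the text directly; the foreign key
-- still guarantees that only known statuses can be used.
CREATE TABLE orders (
    order_id BIGINT PRIMARY KEY,
    status   VARCHAR(20) NOT NULL REFERENCES order_status (status)
);

INSERT INTO orders (order_id, status) VALUES (1, 'pending');    -- accepted
-- INSERT INTO orders (order_id, status) VALUES (2, 'unknown'); -- rejected: no such status
```

With this layout, reading an order's status needs no join, at the cost of storing the text in every row.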

PostgreSQL – insert/update violates foreign key constraints

There are a few problems with your tables. I'll try to address the foreign keys first, since your question asked about them :)

But before that, we should realize that the two sets of tables (the first three you created and the second set, which you created after dropping the first set) are the same. Of course, the definition of Table3 in your second attempt has syntax and logical errors, but the basic idea is:

CREATE TABLE table3 (   
  "ID" bigint NOT NULL DEFAULT '0',   
  "DataID" bigint DEFAULT NULL,   
  "Address" numeric(20) DEFAULT NULL,   
  "Data" bigint DEFAULT NULL,
   PRIMARY KEY ("ID"),   
   FOREIGN KEY ("DataID") REFERENCES Table1("DataID") on delete cascade on update cascade,   
   FOREIGN KEY ("Address") REFERENCES Table2("Address") on delete cascade on update cascade
);

This definition tells PostgreSQL roughly the following: "Create a table with four columns; one will be the primary key (PK), the others can be NULL. If a new row is inserted, check DataID and Address: if they contain a non-NULL value (say 27856), then check Table1 for DataID and Table2 for Address. If there is no such value in those tables, then return an error." This last point is what you saw first:

ERROR: insert or update on table "Table3" violates foreign key constraint "Table3_DataID_fkey"
DETAIL: Key (DataID)=(27856) is not present in table "Table1".

It's that simple: if there is no row in Table1 where DataID = 27856, then you can't insert that row into Table3.

If you need that row, you should first insert a row into Table1 with DataID = 27856, and only then try to insert into Table3. If this is not what you want, please describe in a few sentences what you want to achieve, and we can help with a good design.
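The required ordering can be sketched as follows. The two demo tables are simplified, hypothetical stand-ins for Table1 and Table3 (the real Table1 definition is not shown here), keeping only the columns needed to show the check:

```sql
-- Simplified stand-in for the parent table (assumed shape).
CREATE TABLE table1_demo (
    "DataID" bigint PRIMARY KEY
);

-- Simplified stand-in for the child table.
CREATE TABLE table3_demo (
    "ID"     bigint PRIMARY KEY,
    "DataID" bigint REFERENCES table1_demo ("DataID")
                 ON DELETE CASCADE ON UPDATE CASCADE
);

-- This would fail with a foreign key violation: no parent row exists yet.
-- INSERT INTO table3_demo ("ID", "DataID") VALUES (1, 27856);

-- Insert the parent first, then the child.
INSERT INTO table1_demo ("DataID") VALUES (27856);
INSERT INTO table3_demo ("ID", "DataID") VALUES (1, 27856);

-- A NULL in the referencing column is not checked against the parent.
INSERT INTO table3_demo ("ID", "DataID") VALUES (2, NULL);
```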

And now about the other problems.

You define your PKs as

CREATE TABLE all_your_tables (
    "ID" bigint NOT NULL DEFAULT '0',
    [...]
    PRIMARY KEY ("ID"),
    [...]
);

A primary key means that all the values in it are different from each other, that is, the values are UNIQUE. If you give a static DEFAULT (like '0') to a UNIQUE column, the second row inserted without an explicit value collides with the first, and you will run into bad surprises all the time. This is what you got in your third error message.

Furthermore, '0' is a string literal, not a number (bigint or numeric in your case). Simply use 0 instead (or don't use a default at all, as I wrote above).
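If the goal of the default was just to have the key filled in automatically, one common alternative is to let PostgreSQL generate it. The sketch below assumes PostgreSQL 10 or later (for identity columns; bigserial is the older equivalent) and uses a hypothetical table name:

```sql
-- Hypothetical table: the key is generated, so no static DEFAULT is needed.
CREATE TABLE pk_demo (
    "ID"   bigint GENERATED ALWAYS AS IDENTITY PRIMARY KEY,
    "Data" bigint
);

-- Each insert gets a fresh "ID"; no duplicate-key collision on the default.
INSERT INTO pk_demo ("Data") VALUES (10);
INSERT INTO pk_demo ("Data") VALUES (20);
```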

And a last point (I may be wrong here): in Table2, your Address field is set to numeric(20). At the same time, it is the PK of the table. The column name and the data type suggest that this address can change in the future. If this is true, then it is a very bad choice for a PK. Think about the following scenario: you have an address '1234567890454', which has a child in Table3 like

ID        DataID           Address             Data
123       3216547          1234567890454       654897564134569

Now that address happens to change to something else. How do you make your child row in Table3 follow its parent to the new address? (There are solutions for this, but they can cause much confusion.) If this is your case, add an ID column to your table, which will not contain any information from the real world; it will simply serve as an identification value (that is, ID) for an address.
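A sketch of that suggestion, with hypothetical table and column names (address_demo, table3_addr_demo, address_id) and a made-up new address value. It assumes PostgreSQL 10 or later for the identity column:

```sql
-- The address gets a meaningless surrogate key; the real-world value
-- stays unique but is free to change.
CREATE TABLE address_demo (
    id      bigint GENERATED ALWAYS AS IDENTITY PRIMARY KEY,
    address numeric(20) NOT NULL UNIQUE
);

-- The child row points at the surrogate key, not at the address value.
CREATE TABLE table3_addr_demo (
    "ID"       bigint PRIMARY KEY,
    address_id bigint REFERENCES address_demo (id),
    "Data"     bigint
);

INSERT INTO address_demo (address) VALUES (1234567890454);

INSERT INTO table3_addr_demo ("ID", address_id, "Data")
SELECT 123, id, 654897564134569
FROM address_demo
WHERE address = 1234567890454;

-- Changing the address touches only the parent row; the child row
-- keeps pointing at the same id.
UPDATE address_demo
SET address = 9999999999999          -- hypothetical new address
WHERE address = 1234567890454;
```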
