There are a few problems with your tables. I'll try to address the foreign keys first, since your question asked about them :)
But before that, we should realize that the two sets of tables (the first three you created and the second set, which you created after dropping the first set) are the same. Of course, the definition of Table3 in your second attempt has syntax and logical errors, but the basic idea is:
CREATE TABLE table3 (
"ID" bigint NOT NULL DEFAULT '0',
"DataID" bigint DEFAULT NULL,
"Address" numeric(20) DEFAULT NULL,
"Data" bigint DEFAULT NULL,
PRIMARY KEY ("ID"),
FOREIGN KEY ("DataID") REFERENCES Table1("DataID") on delete cascade on update cascade,
FOREIGN KEY ("Address") REFERENCES Table2("Address") on delete cascade on update cascade
);
This definition tells PostgreSQL roughly the following: "Create a table with four columns, one will be the primary key (PK), the others can be NULL. If a new row is inserted, check DataID and Address: if they contain a non-NULL value (say 27856), then check Table1 for DataID and Table2 for Address. If there is no such value in those tables, then return an error." This last point is what you saw first:
ERROR: insert or update on table "Table3" violates foreign key constraint "Table3_DataID_fkey"
DETAIL: Key (DataID)=(27856) is not present in table "Table1".
So, simply put: if there is no row in Table1 where DataID = 27856, then you can't insert that row into Table3.
If you need that row, you should first insert a row into Table1 with DataID = 27856, and only then try to insert into Table3. If this is not what you want, please describe in a few sentences what you want to achieve, and we can help with a good design.
And now about the other problems.
You define your PKs as
CREATE all_your_tables (
first_column NOT NULL DEFAULT '0',
[...]
PRIMARY KEY ("ID"),
A primary key means that all the items in it are different from each other, that is, the values are UNIQUE. If you give a static DEFAULT (like '0') to a UNIQUE column, only the first row inserted without an explicit value can succeed; every later one collides with the existing 0. This is what you got in your third error message.
Furthermore, '0' is a string literal, not a number (bigint or numeric in your case). PostgreSQL will coerce it to the column type here, but it is clearer to simply write 0 instead (or don't use a default at all, as I wrote above).
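Here is a small sketch of the difference. The table names are made up for the demonstration, and bigserial is my suggestion for a generated key, not something taken from your schema; on PostgreSQL 10 or later an identity column would be an alternative.

```sql
-- Hypothetical table with a static default on the primary key.
CREATE TABLE demo_static_default (
    "ID"   bigint NOT NULL DEFAULT 0,
    "Data" bigint,
    PRIMARY KEY ("ID")
);

INSERT INTO demo_static_default ("Data") VALUES (10);  -- OK, "ID" becomes 0
-- INSERT INTO demo_static_default ("Data") VALUES (20);
-- would fail with a duplicate key error, because "ID" would be 0 again.

-- Hypothetical table whose key default comes from a sequence instead.
CREATE TABLE demo_generated_key (
    "ID"   bigserial PRIMARY KEY,
    "Data" bigint
);

INSERT INTO demo_generated_key ("Data") VALUES (10);
INSERT INTO demo_generated_key ("Data") VALUES (20);   -- gets a fresh "ID"
```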
And a last point (I may be wrong here): in Table2, your Address field is set to numeric(20). At the same time, it is the PK of the table. The column name and the data type suggest that this address can change in the future. If this is true, then it is a very bad choice for a PK. Think about the following scenario: you have an address '1234567890454', which has a child in Table3 like
ID   DataID   Address        Data
123  3216547  1234567890454  654897564134569
Now that address happens to change to something else. How do you make your child row in Table3 follow its parent to the new address? (There are solutions for this, but they can cause much confusion.) If this is your case, add an ID column to your table. It will not contain any information from the real world; it will simply serve as an identification value (that is, an ID) for an address.
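A rough sketch of that layout follows. All table and column names are hypothetical, the new address value is made up, and bigserial is just one way to generate the ID.

```sql
-- Hypothetical parent table: the surrogate key identifies the row,
-- the real-world address is ordinary (but still unique) data.
CREATE TABLE addresses (
    address_id bigserial PRIMARY KEY,
    address    numeric(20) NOT NULL UNIQUE
);

-- Hypothetical child table: it references the surrogate key only.
CREATE TABLE readings (
    id         bigserial PRIMARY KEY,
    address_id bigint NOT NULL REFERENCES addresses (address_id),
    data       bigint
);

INSERT INTO addresses (address) VALUES (1234567890454);

INSERT INTO readings (address_id, data)
SELECT address_id, 654897564134569
FROM addresses
WHERE address = 1234567890454;

-- The address changes: one UPDATE on the parent row is enough,
-- and the child rows do not need to be touched.
UPDATE addresses
SET address = 9999999999999
WHERE address = 1234567890454;
```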
Your indexes are fine for the two types of queries you mentioned.
This query will be satisfied by traversing the clustered index on the primary key...
[...] WHERE participant_id = x AND question_id = y AND given_answer_id = z;
...and this one is satisfied by the index on 'question_id':
[...] WHERE question_id = x;
The output of EXPLAIN SELECT is not telling you what you think it is telling you, because the value shown in rows is an estimate of the number of rows the server will need to consider, not the actual rows it will examine. For InnoDB these are based on index statistics.
rows
The rows column indicates the number of rows MySQL believes it must examine to execute the query.
For InnoDB tables, this number is an estimate, and may not always be exact.
— http://dev.mysql.com/doc/refman/5.5/en/explain-output.html#explain_rows
The optimizer gathers information about different possible query plans, and chooses the one with the lowest cost. The information shown in EXPLAIN is the information the optimizer gathered about the plan it selected.
When type is ref and key is not NULL, this means that the name listed in the key column is the name of the index that the optimizer has chosen to use to find the desired rows, so your query plan looks exactly as it should.
Note, sometimes you will see Using index in the Extra column, and a lot of people assume that this means an index is being used, or that no index is being used when it doesn't appear. Neither is correct. Using index describes a special case called a "covering index", where the query can be answered from the index alone without reading the table rows. It does not indicate whether an index is being used to locate the rows of interest.
It's possible that running ANALYZE [LOCAL] TABLE would cause the numbers in rows shown by EXPLAIN to differ, but this is a simple query and selecting this index is an obvious choice for the optimizer to make, so ANALYZE TABLE is unlikely to make any actual difference in performance.
It is possible, however, that your overall performance might see some marginal improvement with an occasional OPTIMIZE [LOCAL] TABLE, because you are not inserting rows in primary key order (as would be the case with an auto_increment primary key). On large tables this can be time-consuming because it builds a new copy of the table, and, again, I wouldn't expect any significant change.
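For reference, the statements discussed above look like this in MySQL. The table name answers, the index name, and the column types are assumptions on my part; only the three column names come from your queries. I deliberately show no EXPLAIN output, since the estimates depend on your data.

```sql
-- Hypothetical table matching the two query shapes above.
CREATE TABLE answers (
    participant_id  INT NOT NULL,
    question_id     INT NOT NULL,
    given_answer_id INT NOT NULL,
    PRIMARY KEY (participant_id, question_id, given_answer_id),
    KEY idx_question (question_id)
) ENGINE=InnoDB;

-- Show the plan the optimizer selected; `rows` is only an estimate.
EXPLAIN SELECT * FROM answers WHERE question_id = 42;

-- Refresh the index statistics the estimate is based on,
-- then compare the plan again.
ANALYZE TABLE answers;
EXPLAIN SELECT * FROM answers WHERE question_id = 42;

-- Rebuild the table; this can take a long time on large tables.
OPTIMIZE TABLE answers;
```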
Best Answer
Use bigint values generated by a single sequence. Numbers are quite easy to say over the telephone.
Forget about the requirement of enforcing database-wide uniqueness. Unless someone manually messes with the data, the sequence will guarantee it. The performance cost of enforcing such a requirement by database means would greatly outweigh its usefulness.
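A minimal sketch of the single-sequence approach, assuming PostgreSQL; the sequence, table, and column names are invented for illustration.

```sql
-- One sequence feeds the primary keys of every table.
CREATE SEQUENCE global_id_seq;

CREATE TABLE customer (
    id   bigint PRIMARY KEY DEFAULT nextval('global_id_seq'),
    name text NOT NULL
);

CREATE TABLE invoice (
    id          bigint PRIMARY KEY DEFAULT nextval('global_id_seq'),
    customer_id bigint NOT NULL REFERENCES customer (id)
);

-- Both rows draw their id from the same sequence, so as long as nobody
-- supplies ids by hand, no value appears in more than one table.
INSERT INTO customer (name) VALUES ('Alice');

INSERT INTO invoice (customer_id)
SELECT id FROM customer WHERE name = 'Alice';
```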
An alternative would be to have a unique alphabetic prefix for each table that you store in a "table of tables". Then each table has its own sequence, and you generate the primary key in a BEFORE INSERT trigger by concatenating the table's prefix with the sequence value. That would be slightly more expensive, but would lead to more pronounceable names.