Ok, I am making a lot of assumptions (INT instead of VARCHAR(50) being one of them) with this answer, so feel free to correct me if needed. The problem with option B is that it introduces a new join to relate Users to Alerts without any real added benefit. If joining on the UserID, it is best to index the UserID, so you can utilize seeks for your joins.
For Option A, UserID will be the clustering key (index key for the clustered index) on the Users table. UserID will be a nonclustered index key on Alerts table. This will cost 16 bytes per Alert.
For Option B, UserID will be the clustering key on the Users table. UserId will probably be the clustering key in UserMap too, to make joining more efficient. UserKey (assuming this is an INT) would then be a nonclustered index key on the Alerts table. This will cost 4 bytes per Alert. And 20 bytes per UserMap.
Looking at the big picture, one relationship, for Option A, costs 16 bytes of storage, and involves 1 join operation. Whereas, one relationship, for Option B, costs 24 bytes of storage, and involves 2 join operations.
Furthermore, there are a possibility of 340,282,366,920,938,000,000,000,000,000,000,000,000 uniqueidentifiers and only 4,294,967,296 INTs. Implementing a uniqueidentifier to INT map for a this type of relationship could cause unexpected results when you start reusing INTs.
The only reason for creating this type map table, is if you plan on creating a Many to Many relationship between Users and Alerts.
Taking all of this into consideration, I would recommend Option A.
I hope this helps,
Matt
From your question, I'm not completely sure if you're inserting multiple values per query, but you definitely should be. With MyISAM as the engine, there should not be a performance difference between the index being UNIQUE
or PRIMARY
; MyISAM doesn't treat them differently in this case. If you were using InnoDB, however, there would be a difference since it stores the data in primary key order. If you don't need the id
column, removing it and making domain
the primary key would help performance.
Changing the collation should help since ascii
is much simpler than utf8
, but you might want to use ascii_general_ci
instead of ascii_bin
since domain names are case-insensitive.
One other way to do the queries would be to get the number of rows, INSERT DELAYED
, flush the delayed writes, and then get the new row count. The difference in the counts would be the same as the affected rows. However, I don't think this would be significantly faster, but it would make the process more complex.
Best Answer
The two tables
person
andpraktikant
in your example are two distinct tables. Inserting rows intopraktikant
does not insert rows inperson
, butis effectively the same as
Look at the
EXPLAIN
output for both queries to confirm that.Each of these tables has its own constraints and indexes, which don't affect the other table.
So the answers to your questions would be:
The tables are completely independent, except that
praktikant
must have all columns ofperson
querying
person
will include rows frompraktikant
That question is not quite clear, but I assume you can answer it yourself from the above. Locks are taken on the rows of each table independently.
Not at all. Also, a constraint on one table does not affect the other table. Specifically, you cannot guarantee uniqueness of entries across the two tables.
There are few real-world uses. Before the advent of declarative partitioning in v10, it was used to implement partitioning in PostgreSQL.