Sql-server – Update many-to-many relationship table

performancesql serversql-server-2008-r2

I have three tables: task, taskmembers, members. Each task may have up to five people associated to it. Currently, when someone updates a task I delete all the current members and insert them as a new record. Otherwise, I have to figure out if the person is still a member of the task and, if not, delete them. Also, I have to delete the people no longer associated with the task and update the people still associated to the task.

My concern is that I'm currently using an int as my primary key within the members table. I know MS SQL int values can go up to 2,147,483,647 before overflowing, but it's still a concern of mine. Also, is deleting all the records a good practice? Thank you in advance.

Best Answer

The first thing you'll want to do is make sure there is an alternate key (usually via a unique constraint) defined in the taskmembers table, that consists of the task id and member id. This key will drive all data manipulation, and the surrogate key won't be involved at all.

Once you have that key in place, I suggest using the MERGE statement to perform data manipulation. This will avoid having to delete a subset of data and re-insert it. Instead, by matching the source and target data based on the alternate key, incremental insert/updates can take place. The MERGE statement also allows you to delete from the target where records don't exist in the source (i.e., a member of the task was removed).

The DELETE/INSERT pattern is really wasteful of resources because of the extra amount of logging involved, and the unnecessary locks that are required while the data manipulation occurs. If this is a "hot" table, this kind of pattern can easily lead to deadlocks (if not using snapshot isolation), and as you're finding out, wasted key space usage. With the MERGE pattern, new keys would only be required when members are actually added to tasks.

Related Solutions

SQL Server – Best Practices for Safely Deleting a Database Permanently

You also want to make sure of the datetime stamps of every table. Search for any metadata in the system for every table, order such a list by datetime last updated, and display the output in desc order by datetime. You could also check the table size for even the slight change in size.

For example, in MySQL 5.x, you have information_schema.tables which looks like this:

mysql> desc information_schema.tables;
+-----------------+---------------------+------+-----+---------+-------+
| Field           | Type                | Null | Key | Default | Extra |
+-----------------+---------------------+------+-----+---------+-------+
| TABLE_CATALOG   | varchar(512)        | NO   |     |         |       |
| TABLE_SCHEMA    | varchar(64)         | NO   |     |         |       |
| TABLE_NAME      | varchar(64)         | NO   |     |         |       |
| TABLE_TYPE      | varchar(64)         | NO   |     |         |       |
| ENGINE          | varchar(64)         | YES  |     | NULL    |       |
| VERSION         | bigint(21) unsigned | YES  |     | NULL    |       |
| ROW_FORMAT      | varchar(10)         | YES  |     | NULL    |       |
| TABLE_ROWS      | bigint(21) unsigned | YES  |     | NULL    |       |
| AVG_ROW_LENGTH  | bigint(21) unsigned | YES  |     | NULL    |       |
| DATA_LENGTH     | bigint(21) unsigned | YES  |     | NULL    |       |
| MAX_DATA_LENGTH | bigint(21) unsigned | YES  |     | NULL    |       |
| INDEX_LENGTH    | bigint(21) unsigned | YES  |     | NULL    |       |
| DATA_FREE       | bigint(21) unsigned | YES  |     | NULL    |       |
| AUTO_INCREMENT  | bigint(21) unsigned | YES  |     | NULL    |       |
| CREATE_TIME     | datetime            | YES  |     | NULL    |       |
| UPDATE_TIME     | datetime            | YES  |     | NULL    |       |
| CHECK_TIME      | datetime            | YES  |     | NULL    |       |
| TABLE_COLLATION | varchar(32)         | YES  |     | NULL    |       |
| CHECKSUM        | bigint(21) unsigned | YES  |     | NULL    |       |
| CREATE_OPTIONS  | varchar(255)        | YES  |     | NULL    |       |
| TABLE_COMMENT   | varchar(2048)       | NO   |     |         |       |
+-----------------+---------------------+------+-----+---------+-------+
21 rows in set (0.01 sec)

The column UPDATE_TIME records the last time any INSERT, UPDATE, or DELETE was last applied to the table. You could run queries like these to find out when each database was last accessed:

Last time a table was accessed in each database:

SELECT table_schema,MAX(update_time) last_accessed
FROM information_schema.tables
WHERE table_schema NOT IN ('information_schema','mysql')
AND update_time IS NOT NULL
GROUP BY table_schema;

Last time a table was accessed in any database:

SELECT MAX(update_time) last_accessed FROM information_schema.tables
WHERE table_schema NOT IN ('information_schema','mysql');

Last 10 dates a table was accessed:

SELECT * FROM
(SELECT * FROM
(SELECT last_accessed,COUNT(1) access_count
FROM (SELECT DATE(update_time) last_accessed
FROM information_schema.tables
WHERE table_schema NOT IN ('information_schema','mysql')
AND update_time IS NOT NULL) A
GROUP BY last_accessed) AA
ORDER BY last_accessed DESC) AAA
LIMIT 10;

These are just a few examples of how to get such metadata from MySQL. I'm sure Oracle and SQL Server have similar or better methods.

Once you are sure of how often or seldom a database (or schema) is accessed, you should manually dump/export aged databases along with copies of the schema itself apart from the data. Please excuse that my answer is not DB agnostic. SQLServer and Oracle DBAs should voice their answers here as well, since the concept of a schema being a collection within a database instance is blurred in MySQL but very strictly followed in SQLServer and Oracle.

Sql-server – Nested queries or temp (#) tables in SQL Server

The SQLCat team ideas and articles are sometimes pitched at edge cases that most folk will never see given their load or database size.

Saying that, the ideas they give there are common techniques to improve performance (eg change OR to UNION)

Temp tables have some issues around stored procedure compilation etc, but don't confuse these with table variables (the SQLCat articles mentions this). Also, temp tables should be local not global to separate processes don't affect each other

Personally, I use temp tables quite often to break queries down: but not all the time. If I can do it in one SQL statement that runs well enough (for it's frequency of use) then I'll use that.

In your case, I'd identify a few problem queries and see if using temp tables suits these better. If you can't see any problem queries then do nothing.

Best Answer

Related Solutions

SQL Server – Best Practices for Safely Deleting a Database Permanently

Sql-server – Nested queries or temp (#) tables in SQL Server

Related Question