Sql-server – Will a MERGE between two tables ever be faster than a DROP and SELECT INTO on the second table

data synchronizationmergeperformancequery-performancesql serversql-server-2016

Is there a scenario where it's faster for synchronizing the data from TableA to TableB by using a MERGE query as opposed to dropping TableB and doing a SELECT * INTO TableB FROM TableA?

Best Answer

Will a MERGE between two tables ever be faster than a DROP and SELECT INTO on the second table?

I think, in general, I'd say probably not. But it depends on a lot of factors.

Bulk Loading (minimal logging)

SELECT...INTO will benefit from bulk load optimizations. The INSERT portion of the MERGE query might benefit from these optimizations, but the UPDATE portion won't. So the MERGE statement will likely result in more transaction log writes, making it slower.

DROP vs DELETE

SELECT...INTO will also benefit from the fact that DROP TABLE is a metadata operation, and thus will "remove" all the rows very quickly. The DELETE portion of the MERGE query will be logged normally, resulting in more transaction log writes, making it slower.

Concurrency

SELECT...INTO will benefit from reduced locking / blocking / concurrency issues, because it will have exclusive access to the target table. The MERGE statement will potentially have to deal with lock escalation, blocking by other processes, etc, unless you provide TABLOCK hints.

Index Maintenance

The MERGE statement also has to deal with index maintenance (you mentioned there is a clustered index and a nonclustered index on the target table in this scenario), whereas the SELECT...INTO does not. To mitigate this, you could potentially drop the indexes and recreate them after the MERGE has run.

Additional thoughts

All of this somewhat depends on what the breakdown (in terms of # of rows) is between INSERTs, UPDATEs, and DELETEs in the MERGE statement, but in general I would expect the SELECT...INTO case to be quicker unless the MERGE statement has been designed very carefully.

Some related information that might be interesting to you and this situation:

Sidebar: in a comment, you mentioned:

I'm asking because in my case I have to synchronize about 300 million records, and with my testing of only 5 million records, the SELECT INTO took about 30 seconds, the MERGE took about 4 minutes. :(

To compare "apples to apples" make sure to include the time it takes to create the two indexes you mentioned for the SELECT...INTO case:

In my specific case, there's actually no indexes on TableB at the time of the SELECT INTO, but a single unique clustered (PK) and a single non-clustered index is created on it on after the data is inserted. If a MERGE query was used instead, then both tables would have unique clustered indexes and non-clustered indexes on them.

Related Solutions

Sql-server – Should an INSERT cause an exclusive lock on a foreign key

Here's my first recommendation to remove the deadlock (and this is a very common cause with the same resolution every time). Your RoutePlan_Save procedure currently has this logic:

IF EXISTS (SELECT ... WHERE key = @key)
  UPDATE ... WHERE key = @key;
ELSE
  INSERT(key) VALUES(@key);

So you actually have two queries here that require locks on the table, regardless of whether you are ultimately going to perform an insert or update. Stop doing that! What is the point of checking if a row exists, and then in a separate query, firing an update statement that again has to check if the row exists? This requires more resources, not less, than just trying to update the row not knowing whether it exists. So do this instead:

BEGIN TRANSACTION;
UPDATE ... WHERE key = @key;
IF @@ROWCOUNT = 0
BEGIN
  INSERT(key) VALUES(@key);
END
COMMIT TRANSACTION;

You can also isolate the other read-only processes by using a different isolation level that doesn't require them to wait for write operations to complete. NOLOCK / READ UNCOMMITTED is a popular one, but READ COMMITTED SNAPSHOT - assuming you can take the hit on tempdb - is way better IMHO. There are other alternatives, too. I happened to blog about this just yesterday:

Bad habits : Putting NOLOCK everywhere

Mysql – SELECT ROWS WITH SIMILAR ID AND MERGE THE FILED VALUES IN ONE

You can use group concat for this. example

mysql> select  * from dummy;
+----+-------+
| id | name  |
+----+-------+
|  1 | fiest |
|  1 | fiest |
|  1 | sec   |
+----+-------+
3 rows in set (0.00 sec)

mysql> select group_concat(id) from dummy group by id;
+------------------+
| group_concat(id) |
+------------------+
| 1,1,1            |
+------------------+
1 row in set (0.00 sec)