Sql-server – SQL Server Merge vs MySQL Insert On Duplicate Key performance comparison

merge, MySQL, performance, sql server

I've been searching a lot for this kind of comparison or some sort of performance balance.

What I do know so far is this:

MERGE is a T-SQL command that is already implemented in SQL Server. As far as I know, it performs very well, since it uses something "kind of like" an INNER JOIN hash mapping for inserting and updating, but I've had some issues when also deleting in the same MERGE statement on a table with a CLUSTERED INDEX. Besides that, it's very fast and lets me apply comparison clauses before updating or inserting, so it's very flexible for me.
In SQL Server, some specific scenarios are better served by separate UPDATE and INSERT statements, and I wouldn't argue with that. In my short experience, I use MERGE by default as a standard for my code.
Now, as a DBA, I'm facing a new challenge: I have to manage MySQL servers as well, so I ended up looking for MERGE-like behavior in MySQL to improve query performance. So far, I've found nothing similar except INSERT … ON DUPLICATE KEY UPDATE. Still, I have some performance questions, since I'm not sure how it behaves on the server, how it works, or whether an alternative pair of statements would work better or faster.
Looking around in MySQL, I found UPDATE + INSERT IGNORE, UPDATE + INSERT, INSERT … REPLACE, and so on.
The MySQL documentation is a bit confusing when I try to determine whether I can use other clauses. For example, in a MERGE statement I can use AND (TARGET.COLUMN_X > 'VALUE'):
```
MERGE _TABLE A_ AS TARGET
USING _TABLE B_ AS SOURCE
ON (TARGET.KEY = SOURCE.KEY)
WHEN MATCHED AND (TARGET.COLUMN_X > 'VALUE') THEN
  UPDATE SET
    TARGET.COLUMN_A = SOURCE.COLUMN_A
...
```
I can't find how to do this in MySQL.

CONSIDERATIONS

I need to handle this in a way that gives the best run time and is performance friendly.

My setup:

- Percona MySQL
- A txt file that needs to be uploaded to a table every month, with new data and some changes (this is why I'm looking for a MERGE-like statement)
- InnoDB MySQL engine
- Relational database tables, so I can't delete or truncate the target table because all of them are related.

Best Answer

IODKU is the best of the options in MySQL. It works something like this:

1. Use some UNIQUE key (possibly the clustered, unique, PRIMARY KEY) to locate the row to modify.
2. If no such row, perform an INSERT.
3. If such a row, perform an UPDATE.

You can't get faster than that.
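
The steps above can be seen with a single-row statement. This is only a minimal sketch: the `Target` table, its columns, and the values are assumptions made for illustration, not taken from the question.

```sql
-- Hypothetical table; names and types are assumptions for illustration.
CREATE TABLE Target (
    id INT NOT NULL PRIMARY KEY,   -- the UNIQUE key used to locate the row
    a  VARCHAR(20),
    x  INT
);

-- No row with id = 1 exists yet, so this performs an INSERT.
INSERT INTO Target (id, a, x) VALUES (1, 'first', 10)
    ON DUPLICATE KEY UPDATE a = VALUES(a), x = VALUES(x);

-- A row with id = 1 now exists, so the same statement shape performs an UPDATE.
INSERT INTO Target (id, a, x) VALUES (1, 'second', 20)
    ON DUPLICATE KEY UPDATE a = VALUES(a), x = VALUES(x);

-- The table still holds a single row for id = 1, now carrying the second set of values.
SELECT id, a, x FROM Target;
```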

Note that step 1 will cache everything that is needed for #2 or #3 in the buffer_pool. (Well, OK, secondary indexes are handled in a 'delayed' way via the "Change Buffer".)

Further note that the statement is atomic, whereas your 2-statement alternatives need to be in a transaction.
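
To illustrate the difference, here is a rough sketch of one of the 2-statement alternatives wrapped in a transaction, next to the single-statement form. The `Target` table and its values are hypothetical. Note that INSERT IGNORE also downgrades some other errors to warnings, so this is a sketch of the pattern rather than a recommendation.

```sql
-- Hypothetical table; names and types are assumptions for illustration.
CREATE TABLE Target (
    id INT NOT NULL PRIMARY KEY,
    a  VARCHAR(20)
);

-- Two-statement alternative: both statements must succeed or fail together,
-- so they are wrapped in a transaction.
START TRANSACTION;
UPDATE Target SET a = 'new' WHERE id = 1;              -- touches the row if it exists
INSERT IGNORE INTO Target (id, a) VALUES (1, 'new');   -- adds the row if it does not
COMMIT;

-- Single-statement form: no explicit transaction is needed for atomicity.
INSERT INTO Target (id, a) VALUES (1, 'new')
    ON DUPLICATE KEY UPDATE a = VALUES(a);
```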

Keep in mind that REPLACE is DELETE (0, 1, or possibly multiple rows, if you have multiple unique keys), then INSERT. Note that the AUTO_INCREMENT (if used) value is thrown away and a new one is created.
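
A small sketch of that difference, assuming a hypothetical `Demo` table with an AUTO_INCREMENT primary key and a separate UNIQUE column:

```sql
-- Hypothetical table; names and types are assumptions for illustration.
CREATE TABLE Demo (
    id   INT NOT NULL AUTO_INCREMENT PRIMARY KEY,
    code VARCHAR(10) NOT NULL UNIQUE,
    a    INT
);

INSERT INTO Demo (code, a) VALUES ('x', 1);
SELECT id, code, a FROM Demo;   -- note the generated id

-- REPLACE deletes the row that collides on the UNIQUE key, then inserts a new one,
-- so the row comes back with a newly generated id.
REPLACE INTO Demo (code, a) VALUES ('x', 2);
SELECT id, code, a FROM Demo;

-- IODKU updates the existing row in place, so its id stays the same.
INSERT INTO Demo (code, a) VALUES ('x', 3)
    ON DUPLICATE KEY UPDATE a = VALUES(a);
SELECT id, code, a FROM Demo;
```

Because REPLACE really deletes the old row, it can matter when other tables reference the id through foreign keys, as in the setup described in the question.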

As for that messy query, it can probably be done with GREATEST() and/or IF() and/or VALUES(). Assuming that you are trying to merge multiple rows in, you need the IODKU+SELECT syntax:

INSERT INTO Target (`key`, a)            -- `key` is a reserved word, so it is quoted
    SELECT s.`key`, s.a                  -- take the new values from Source
        FROM Source AS s
        JOIN Target AS t ON t.`key` = s.`key`
        WHERE t.x > 'value'
    ON DUPLICATE KEY UPDATE
        a = VALUES(a);
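
A variation using IF(), as mentioned above, might look like the sketch below. Unlike a query restricted to rows that already exist in Target, this one also inserts rows that have no match, and only overwrites `a` when the existing row passes the condition. The table definitions, the `id` column standing in for the key, and the threshold `10` are all assumptions for illustration.

```sql
-- Hypothetical tables; names and types are assumptions for illustration.
CREATE TABLE Target (
    id INT NOT NULL PRIMARY KEY,
    a  VARCHAR(20),
    x  INT
);
CREATE TABLE Source (
    id INT NOT NULL PRIMARY KEY,
    a  VARCHAR(20)
);

INSERT INTO Target (id, a)
    SELECT s.id, s.a
        FROM Source AS s
    ON DUPLICATE KEY UPDATE
        -- Overwrite a only when the existing row satisfies the condition;
        -- otherwise assign the current value back, leaving the row unchanged.
        a = IF(Target.x > 10, VALUES(a), Target.a);
```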

Since I don't know what MERGE does, I can't give you all the details.

Related Solutions

Sql-server – SQL Server 2008 R2 MERGE statement to replace single INSERT AND UPDATE statement combined

I haven't done any comparative testing of the two (yet), nor have I seen any articles on the topic. There is an Optimizing MERGE Statement Performance article on TechNet, but it doesn't include any comparisons with the update/insert syntax.

I can however suggest an improvement over your original syntax which eliminates the IF EXISTS lookup:

UPDATE 
    dbo.tblCustomer
SET 
    CustomerName = @CustomerName
WHERE
    CustomerID = @CustomerID;

IF (@@ROWCOUNT = 1)
BEGIN
    SELECT @New_ID = @CustomerID;
END
ELSE
BEGIN
    INSERT 
        dbo.tblCustomer
        (CustomerName)
    VALUES
        (@CustomerName);

    SELECT @New_ID = SCOPE_IDENTITY();
END
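
For comparison, a MERGE version of the same logic might look roughly like the sketch below. It assumes `dbo.tblCustomer` has an identity column `CustomerID` and a `CustomerName` column; the variable types, the sample values, and the `@Result` table variable are hypothetical.

```sql
-- Hypothetical parameter values and types, for illustration only.
DECLARE @CustomerID   int           = 1,
        @CustomerName nvarchar(100) = N'Example';
DECLARE @Result TABLE (CustomerID int);

MERGE dbo.tblCustomer WITH (HOLDLOCK) AS target
USING (SELECT @CustomerID AS CustomerID, @CustomerName AS CustomerName) AS source
    ON target.CustomerID = source.CustomerID
WHEN MATCHED THEN
    UPDATE SET CustomerName = source.CustomerName
WHEN NOT MATCHED BY TARGET THEN
    INSERT (CustomerName) VALUES (source.CustomerName)
-- Capture the affected key, whether the row was updated or inserted.
OUTPUT inserted.CustomerID INTO @Result (CustomerID);

SELECT CustomerID AS New_ID FROM @Result;
```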

You may also be interested in Mythbusting: Concurrent Update/Insert Solutions, which includes some examples of MERGE usage.

Sql-server – How to delete only related records in a multi-key MERGE in SQL Server

This is the separate DELETE operation I had in mind:

DELETE m
FROM dbo.Mapping AS m
WHERE EXISTS 
  (SELECT 1 FROM @Values WHERE LeftID = m.LeftID)
AND NOT EXISTS 
  (SELECT 1 FROM @Values WHERE LeftID = m.LeftID AND RightID = m.RightID);

As I outline here, for a left anti-semi join, the NOT EXISTS pattern will often outperform the LEFT JOIN / NULL pattern (but you should always test).
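
For reference, the LEFT JOIN / NULL form of the same DELETE could be sketched as follows, so the two plans can be compared side by side. The `@Values` declaration and its column types are assumptions, and `dbo.Mapping` is assumed to have `LeftID` and `RightID` columns.

```sql
-- Hypothetical declaration; the real @Values definition is not shown in the original.
DECLARE @Values TABLE (LeftID int NOT NULL, RightID int NOT NULL);

DELETE m
FROM dbo.Mapping AS m
LEFT JOIN @Values AS v
    ON  v.LeftID  = m.LeftID
    AND v.RightID = m.RightID
WHERE v.LeftID IS NULL          -- no exact (LeftID, RightID) match in @Values
  AND EXISTS                    -- but the LeftID itself is present in @Values
      (SELECT 1 FROM @Values AS x WHERE x.LeftID = m.LeftID);
```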

Not sure if your overall goal is clarity or performance, so only you can judge if this will work out better for your requirements than the NOT MATCHED BY SOURCE option. You'll have to look at the plans qualitatively, and the plans and/or runtime metrics quantitatively, to know for sure.

If you expect your MERGE command to protect you from race conditions that would happen with multiple independent statements, you'd better make sure that is true by changing it to:

MERGE dbo.Mapping WITH (HOLDLOCK) AS target

(From Dan Guzman's blog post.)

Personally, I would do all of this without MERGE, because there are unresolved bugs, among other reasons. And Paul White seems to recommend separate DML statements as well.

And here's why I added a schema prefix: you should always reference objects by schema, when creating, affecting, etc.
