Mysql – which one is faster: “insert on duplicate update” OR “update with case”

mysql-8.0update

Table:

CREATE TABLE t5 (
 col1 VARCHAR(36) NOT NULL PRIMARY KEY, 
 col2 int(11) DEFAULT NULL, 
 col3 int(11) DEFAULT NULL, 
 UNIQUE (col2)
);

Records already inside:

insert into t5(col1, col2, col3) values 
('872c18ba-35ae-11e9-91a5-88d7f65253d0', 1, 1), 
('823c3d3a-35ae-11e9-91a5-88d7f65253d0', 2, 2);

Which of these two queries are expected to perform better:

Query-1:

insert into t5(col1, col2, col3) values 
 ('872c18ba-35ae-11e9-91a5-88d7f65253d0', 2, 888), 
 ('823c3d3a-35ae-11e9-91a5-88d7f65253d0', NULL, 999), 
on duplicate key update 
 col2=if(values(col2) IS NULL, col2, values(col2)), 
 col3=if(values(col3) IS NULL, col3, values(col3));

Query-2:

update t5 set 
 col2=(case col1 when '872c18ba-35ae-11e9-91a5-88d7f65253d0' then 2 END), 
 col3=(case col1 when '872c18ba-35ae-11e9-91a5-88d7f65253d0' then 888 when '823c3d3a-35ae-11e9-91a5-88d7f65253d0' then 999 END) 
where col1 in ('872c18ba-35ae-11e9-91a5-88d7f65253d0', '823c3d3a-35ae-11e9-91a5-88d7f65253d0');

Please take note in query-2, the statement is tailored from client side by pruning the NULL value first.

Condition of the table and the frequency the above query is performed as below:

number of records is in hundreds of thousand
query is performed once every x-seconds, each with ~150 keys

The objective is to reduce the I/O by not hammering the db with one single query on loop, as well as preventing a deadlock due to query taken too long time (yes, there was some deadlock earlier)

Best Answer

col2=if(values(col2) IS NULL, col2, values(col2)) is equal to col2=COALESCE(values(col2), col2).
If ODKU causes UNIQUE (col2) violation, the whole query will fail (see fiddle, first query).
The queries are not equivalent (second query will set col2=NULL for 1st record, see fiddle).

fiddle

Finally:

which faster: “insert on duplicate update” OR “update with case”

If no errors in queries and data INSERT .. ODKU is faster.

Related Solutions

Sql-server – Way to update many (>3000) records with unique data, using a character limited web form

The SET can be made a bit simpler:

SET info4 = 'BNID: ' +
  CASE guestid
    WHEN 12345 THEN '917546553'
    WHEN 67890 THEN 'B04695465'
      [etc, etc.]
                          --- no ELSE here
  END 

WHERE info4 IS NULL       --- and this is needed

Saves 6 characters per case.

If you can create a table (even a temporary one) you could also use:

--- run once
CREATE TABLE #tmp
( guestid INT NOT NULL
, i4 VARCHAR(200)
, PRIMARY KEY (guestid)
) ;

--- as many times as needed
INSERT INTO #tmp VALUES
(12345,'917546553'),
...
(67890,'B04695465');

and then:

UPDATE guest
  SET info4 = 'BNID: ' + i4
FROM guest
  JOIN #tmp
    ON #tmp.guestid = guest.guestid
WHERE guest.info4 IS NULL

MySQL Performance – ON DUPLICATE KEY UPDATE vs. UPDATE

I know it is not easy getting an execution plan for an update from MySQL since it only provides those on SELECT statements. But the clue might be in the order in which the records are updated, the evaluation of a WHERE which contains a IN with a large amount of static data, as well as the amount of connected reads and writes, intermediate caching, associated with that.

The statement

UPDATE humans_we_respect SET expressed_interest=1 WHERE id IN (1,...,10000);

is a type of statement we try to avoid when updating larger databases since the parser seems to go wild on them from time to time. IN ( a,b,c,...,ZZZZ ) for me has become a coding style only suitable for very small item numbers in the IN data. I am working on a open source project where I frequently run into what I call "remote minded join", the second half usually looks exactly like your problem.

SELECT id FROM all_our_customers WHERE happytospam=1 AND LENGTH(email) > 6;
...
Storing result on client side as string like 
LOOP over results
$all_ids += ",$next_result";
END_LOOP
$all_ids = SUBSTRING($all_ids,1); 
ending up with a string like 
"1,2,3,4,5,8,10,100,1000,...,100000" in $all_ids
...
UPDATE humans_we_respect SET expressed_interest=1 WHERE id IN ( $all_ids )

While the first part usually executes lightning fast, the second part takes forever, which is what you describe as well. These queries can usually be sped up EXTREMELY by rewriting them as:

UPDATE humans_we_respect,all_our_customers 
SET humans_we_respect.expressed_interest=1 
WHERE all_our_customers.id = humans_we_respect.id 
AND all_our_customers.happytospam=1 
AND LENGTH(all_our_customers.email) > 6

We also used

UPDATE humans_we_respect 
SET expressed_interest=1 
WHERE id IN ( 
SELECT id 
FROM all_our_customers 
WHERE happytospam=1 
AND LENGTH(email) > 6 
)

which performed better than the original but not as good as my suggested version.

This all is assuming that you use proper indexes with primary on ID and combined multi column indexes where multiple columns are frequently used together or have a good significance together and are usually present in your queries.

The Clue is that large amounts of static values in IN clauses increase the execution time nearly exponentially on queries with many matching records, since they basically do NOT USE ANY INDEXES OR OPTIMIZATIONS and usually end up in full table scans, in which IMHO the execution will examine every record/line in your table by comparing it with every item in your IN() list ONE by ONE.

The statement like

INSERT INTO humans_we_respect (id) VALUES (1),...,(10000) ON DUPLICATE KEY UPDATE expressed_interest=1;

however is using the index to locate the record and then updates it, even so not intended for this use, it will run much better due to the use of a index on ID if there is one and will only do a index-lookup for the record rather than thousands of comparisons! Working directly with two tables might however be faster if the list of IDs is derived from another table on the same server, there are more optimizations that can be used plus you do not have to transfer data out-of and into the mysql server process.

Just as some extra info:

UPDATE humans_we_respect SET expressed_interest=1 WHERE id='1' OR id='2' OR ...

is a good technique to optimize IN() queries with very low counts of elements since it will create a parallel index query for every element, which is great for the first elements but will decrease in performance with more elements a lot, and at some point hit the limit of the parser for optimization (IMHO it might be 255 elements in one query) at which point it will grind to a snails pace again...

Best Answer

Related Solutions

Sql-server – Way to update many (>3000) records with unique data, using a character limited web form

MySQL Performance – ON DUPLICATE KEY UPDATE vs. UPDATE

Related Question