First, you have conditions duplicated in the ON
and the WHERE
clause. That is not needed, the condition are only needed once, preferably in the ON
clause as they are used for the join of the two tables. But that is not what's causing the slow execution (and the lost connections).
The reasons for being slow are three:
you are joining a 63K with a 363K table without any index to help. That is the major issue.
the columns used in the join do not have same type, length, character set and collation. This is another major issue, worse than the first. Even if you add indexes, but keep the same types, the indexes will not be of much help. More seriously, it's a question if you can / are allowed to change the types or the applications depend on the existing types.
you are updating potentially the whole table. This is rather minor as the table is small, only 63K rows. We'll see if that is still a problem, after we fix the major issues.
The first thing that needs addressing is the types (+ lengths + charsets + collations) of the two columns. Assuming that you are allowed to change the types of the table order
, you should change the types (both varchar(255) COLLATE utf8_unicode_ci DEFAULT NULL
) to match the types of the geo
table (one varchar(50)
, the other char(2)
, both CHARSET=latin1
, unknown/default collation.)
This will be tricky and perhaps error prone, so I suggest you first take a backup of the whole database, or just the table orders
, before you attempt to change anything.
If I were you, I'd then first (before changing anything), check the data in these two columns and whether their maximum length is less than 50 and 2 respectively, to be sure that the truncation of the length will not change any valuable data.
I'd also check the collation of the cities_extended.city
and cities_extended.state_code
columns, it's probably the default latin1_swedish_ci
.
(Of course these two steps above can be skipped - but only if you don't really care about the orders
table and you can recreate it from scratch if needed.)
After everything is double checked, we can proceed with the actual changing of the types:
ALTER TABLE ProcurementPortal.orders
MODIFY COLUMN oCity
varchar(50)
CHARSET 'latin1'
-- COLLATE 'latin1_swedish_ci'
DEFAULT NULL,
MODIFY COLUMN oState
char(2)
CHARSET 'latin1'
-- COLLATE 'latin1_swedish_ci'
DEFAULT NULL ;
(the COLLATE 'latin1_swedish_ci'
are probably not needed, the defaults will be used, so they are commented out.)
If that goes well and there are no errors, we can proceed with the final step, which is to add the indexes:
Add two indexes, one on each table:
ALTER TABLE ProcurementPortal.orders
ADD INDEX state_city_ix
(oState, oCity) ;
ALTER TABLE ProcurementPortal.cities_extended
ADD INDEX state_city_ix
(state_code, city) ;
After that (it will take a while but you only need to do this once), you can proceed with the update statement.
If it's still taking too long, I'd consider making the update in batches (of say 1K-5K rows per time). But with so small tables, I don't think you need that.
Best Answer
Assuming that your situation looks similar to this:
The tables now contain the following rows:
The update could look something like ...
After executing the UPDATE, the PRODUCTS table contains:
See dbfiddle here.
ADDITIONAL questions and answers:
1) The code that you wrote have OD.Quantity instead of Orders.Quantity. How does the program know that OD.Quantity is Orders.Quantity ?
In my example, the "quantity" of an ordered product is stored in the OrderDetails table. (The ORDERS table does not contain a quantity).
2) You use O instead of Orders. How does the program know that O is Orders ?
Table aliases. When you write something like SELECT ... FROM mytable M ... , then the letter M can be used as a "short name" / alias for the table. Usually, this makes queries easier to code (and read).
3) When I replace your code using Orders, and Order_Details instead of O and OD I got an error. Why is that ?
I don't know - as you did not put any examples on dbfiddle (or into the question itself). However, if you replace all Os and ODs with the full table names, it should work. See dbfiddle here. Notice that there is a WHERE clause for the UPDATE, too (this is needed, otherwise all rows in the PRODUCTS table would be updated).