I am trying to extract information from strings that are presented in a key-value format, with the keys and values being separated by commas. I want to extract the values associated with certain keys and add them to dedicated columns in my table.
Some notes on the data:
- The keys I am interested in are connected, as in keyB relates to keyA;
- in some cases keyB or keyA may not exist
- If keyB doesn't exist, but keyA is something specific, then I can set the value for keyB anyway.
I have a solution that does what I want ([db-fiddle]), but it is painfully slow (9.6 hours) and I can't help think that there must be a better way as I've not been in this DB game long.
For info,
The table has ~8.2M rows and is hosted on AWS RDS on a t3.large DB (virtual CPUs = 2, Memory = 8.0GB).
Some pointers on where I can improve this are much appreciated.
Best Answer
You only need a single INSERT and a single UPDATE for the main table. The whole default detection and populating of the "variables" (columns) can be done while inserting the data:
The final update can also handle the "unknown" value directly, no need to run three updates:
One way to speed that up, is to create an index on
main_tags (id)
You can actually get rid of the temp table completely and do everything in a single statement. This might or might not be faster, but I think it's worth trying: