Sorry about the title, I wasn't sure how best to word this. My situation is this:
I have two tables:
paper_author [paper_id, author_id, author_name]
authors [author_id, author_name]
Right now, the author_name
column in paper_author
is NULL
. I would like to select the author name from the authors
table and insert it into the author_name
column in the paper_author
table. I've written the following query:
UPDATE paper_author pa SET author_name = (SELECT author_name FROM authors a WHERE a.author_id = pa.author_id);
Which I think will do what I want, but this is taking a very long time to run (days). For reference, the table paper_author
has ~900M rows and the table authors
has ~200M rows. The author_id
is the primary key in the authors
table. author_id
has an index in the table paper_author
. The rows in paper_author
are not unique on author_id
(i.e. each author_id
may appear multiple times associated with different papers).
Is there a more efficient way to write this? Have I missed something?
Best Answer
An update of that magnitude is inevitably going to take a substantial amount of time, no matter how good your configuration is or how powerful the hardware. It's possible there might be room for improvement, but there are not enough details to go on here. For a start, it would be interesting to know the output of:
You might want to consider an alternative strategy along the lines of breaking up the update into smaller batches. That will give you a better idea of how long the process might last, and provide better control over your system resources.