Your second option is far cleaner and will perform well enough to make it worth it. The alternative is to build gigantic queries, which will be quite a pain to plan and execute. In general you are better off letting PostgreSQL do the work here. I have found that updates on tens of thousands of rows done the way you describe perform adequately, but there is one important thing to avoid doing.
The way to do it is to use a select (and, where needed, a join) in your delete:
DELETE FROM foo WHERE id IN (SELECT id FROM rows_to_delete);
Under no circumstances should you do the following with a large table:
DELETE FROM foo WHERE id NOT IN (SELECT id FROM rows_to_keep);
This will usually cause a nested loop anti-join, which makes performance rather problematic. If you end up having to go that route, do this instead:
DELETE FROM foo
WHERE id IN (SELECT f.id
             FROM foo f
             LEFT JOIN rows_to_keep d ON f.id = d.id
             WHERE d.id IS NULL);
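Another spelling the planner handles well is an anti-join via NOT EXISTS; this is my own sketch against the same placeholder tables:
-- Sketch: the same anti-join spelled with NOT EXISTS (assumes the
-- placeholder tables foo and rows_to_keep from above).
DELETE FROM foo
WHERE NOT EXISTS (SELECT 1 FROM rows_to_keep d WHERE d.id = foo.id);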
PostgreSQL is usually pretty good at avoiding bad plans, but there are still cases, particularly involving outer joins, where the difference between a good and a bad plan is large.
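When in doubt, check the plan before running anything destructive; a minimal check on the placeholder tables above would be:
EXPLAIN
DELETE FROM foo
WHERE id NOT IN (SELECT id FROM rows_to_keep);
-- Compare the output against the plan for the IN / LEFT JOIN forms above
-- before running the real delete.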
This is wandering a little far afield, but I figure it is worth mentioning because of how easy it is to go from IN to NOT IN and watch query performance tank.
First of all, you would not need a subquery in your statement. The bare function does the job:
INSERT INTO testschema.requestinfo (requestid, reportid, amount)
VALUES (currval('testschema.requests_requestid_seq'), 10, 40)
, (currval('testschema.requests_requestid_seq'), 15, 29)
, (currval('testschema.requests_requestid_seq'), 40, 4);
Next, currval() only makes sense if you have (implicitly or explicitly) called nextval() for the same sequence earlier in your session. If you have not called nextval() for any other sequence in the meantime, you can simplify with lastval().
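For illustration only: the sequence state would typically come from an earlier insert into the parent table, where a serial or identity default calls nextval() implicitly. The column name below is my guess; only the sequence name comes from the question:
-- Hypothetical earlier statement: a serial/identity default on requestid
-- calls nextval('testschema.requests_requestid_seq') implicitly.
INSERT INTO testschema.requests (requestname) VALUES ('example');
-- After this, currval('testschema.requests_requestid_seq') and lastval()
-- both return the generated requestid.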
INSERT INTO testschema.requestinfo (requestid, reportid, amount)
VALUES (lastval(), 10, 40)
, (lastval(), 15, 29)
, (lastval(), 40, 4);
Either way, if you have many rows like this, look to @dezso's answer for a way to avoid typing the function repeatedly.
currval() and lastval() are volatile functions:
SELECT proname, provolatile
FROM pg_proc
WHERE proname IN ('currval', 'lastval');
Result:
 proname | provolatile
---------+-------------
 currval | v
 lastval | v
That means they will still be evaluated for every row when written that way.
To also avoid repeated execution of the function, move it to a separate CTE or subquery and CROSS JOIN it to your input values:
INSERT INTO testschema.requestinfo (requestid, reportid, amount)
SELECT *  -- column order: lastval first, then the VALUES columns
FROM lastval()  -- or currval('testschema.requests_requestid_seq')
   , (VALUES (10, 40), (15, 29), (40, 4)) AS v(reportid, amount);
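For completeness, the CTE variant mentioned above could look like this (a sketch; the aliases are mine):
-- lastval() is evaluated once in the CTE, then joined to every input row.
WITH seq AS (SELECT lastval() AS requestid)
INSERT INTO testschema.requestinfo (requestid, reportid, amount)
SELECT seq.requestid, v.reportid, v.amount
FROM seq
CROSS JOIN (VALUES (10, 40), (15, 29), (40, 4)) AS v(reportid, amount);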
Best Answer
Using strace, I can see that the files are only actually read the first time an import is done in any given session (there may be exceptions, like if the effective user id changes within a session). There may be some initialization which is re-done every time the import is run; I don't know how to investigate that. I don't think you can boost the loading above the session level.
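If you want to reproduce this kind of tracing, one starting point (assuming shell access to the server host) is to find the backend PID for your session and attach strace to that process:
-- The PID of the server backend handling the current session; strace can
-- then be attached from a shell, e.g. strace -p <pid>.
SELECT pg_backend_pid();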