Your second option is far cleaner and will perform well enough to make it worth it. The alternative is to build gigantic queries, which will be quite a pain to plan and execute. In general you are better off letting PostgreSQL do the work here. I have found that updates on tens of thousands of rows done the way you describe perform adequately, but there is one important thing to avoid doing.
The way to do it is to use a select (and, where needed, a join) in your delete:
DELETE FROM foo WHERE id IN (SELECT id FROM rows_to_delete);
Under no circumstances should you do the following with a large table:
DELETE FROM foo WHERE id NOT IN (SELECT id FROM rows_to_keep);
This will usually cause a nested loop anti-join, which makes performance rather problematic. If you end up having to go that route, do this instead:
DELETE FROM foo
WHERE id IN (SELECT f.id
             FROM foo f
             LEFT JOIN rows_to_keep d ON f.id = d.id
             WHERE d.id IS NULL);
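Another spelling the planner handles well is an anti-join via NOT EXISTS; this is my own sketch against the same placeholder tables:
-- Sketch: the same anti-join spelled with NOT EXISTS (assumes the
-- placeholder tables foo and rows_to_keep from above).
DELETE FROM foo
WHERE NOT EXISTS (SELECT 1 FROM rows_to_keep d WHERE d.id = foo.id);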
PostgreSQL is usually pretty good at avoiding bad plans, but there are still cases, particularly involving outer joins, where the difference between a good and a bad plan is large.
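When in doubt, check the plan before running anything destructive; a minimal check on the placeholder tables above would be:
EXPLAIN
DELETE FROM foo
WHERE id NOT IN (SELECT id FROM rows_to_keep);
-- Compare the output against the plan for the IN / LEFT JOIN forms above
-- before running the real delete.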
This is wandering a little far afield, but I figure it is worth mentioning because of how easy it is to go from IN to NOT IN and watch query performance tank.
First of all, you would not need a subquery in your statement. The bare function does the job:
INSERT INTO testschema.requestinfo (requestid, reportid, amount)
VALUES (currval('testschema.requests_requestid_seq'), 10, 40)
, (currval('testschema.requests_requestid_seq'), 15, 29)
, (currval('testschema.requests_requestid_seq'), 40, 4);
Next, currval() only makes sense if you have (implicitly or explicitly) called nextval() for the same sequence earlier in your session. If you have not called nextval() for any other sequence in the meantime, you can simplify with lastval().
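For illustration only: the sequence state would typically come from an earlier insert into the parent table, where a serial or identity default calls nextval() implicitly. The column name below is my guess; only the sequence name comes from the question:
-- Hypothetical earlier statement: a serial/identity default on requestid
-- calls nextval('testschema.requests_requestid_seq') implicitly.
INSERT INTO testschema.requests (requestname) VALUES ('example');
-- After this, currval('testschema.requests_requestid_seq') and lastval()
-- both return the generated requestid.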
INSERT INTO testschema.requestinfo (requestid, reportid, amount)
VALUES (lastval(), 10, 40)
, (lastval(), 15, 29)
, (lastval(), 40, 4);
Either way, if you have many rows like this, look to @dezso's answer for a way to avoid typing the function repeatedly.
currval() and lastval() are volatile functions:
SELECT proname, provolatile
FROM pg_proc
WHERE proname IN ('currval', 'lastval');
Result:
 proname | provolatile
---------+-------------
 currval | v
 lastval | v
That means they will still be evaluated for every row when written that way.
To also avoid repeated execution of the function, move it to a separate CTE or subquery and CROSS JOIN it to your input values:
INSERT INTO testschema.requestinfo (requestid, reportid, amount)
SELECT *  -- column order: lastval first, then the VALUES columns
FROM lastval()  -- or currval('testschema.requests_requestid_seq')
   , (VALUES (10, 40), (15, 29), (40, 4)) AS v(reportid, amount);
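For completeness, the CTE variant mentioned above could look like this (a sketch; the aliases are mine):
-- lastval() is evaluated once in the CTE, then joined to every input row.
WITH seq AS (SELECT lastval() AS requestid)
INSERT INTO testschema.requestinfo (requestid, reportid, amount)
SELECT seq.requestid, v.reportid, v.amount
FROM seq
CROSS JOIN (VALUES (10, 40), (15, 29), (40, 4)) AS v(reportid, amount);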
Best Answer
Using strace, I can see that the files are only actually read the first time an import is done in any given session (there may be exceptions, like if the effective user id changes within a session). There may be some initialization which is re-done every time the import is run; I don't know how to investigate that. I don't think you can boost the loading above the session level.
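If you want to reproduce this kind of tracing, one starting point (assuming shell access to the server host) is to find the backend PID for your session and attach strace to that process:
-- The PID of the server backend handling the current session; strace can
-- then be attached from a shell, e.g. strace -p <pid>.
SELECT pg_backend_pid();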