Postgresql – What exactly does PostgreSQL (or other databases) do internally when you “prepare” a query rather than just running it directly

optimizationpostgresqlprepared-statement

When I face a huge number of repeated INSERTs in a loop, I tend to first create a "prepare skeleton query" prior to the loop, and in the loop, I simply "execute" this prepared query and send it all the values. I heard long ago, and can understand in some abstract sense, that this is more optimized than just having the loop with a normal parameterized query in it.

However, I don't understand what exactly PG is doing that makes it so much faster. If it even is so much faster. I frankly have never done any real benchmarks to compare the two different methods.

Also, shouldn't PG be "warmed up" to a query once you repeat it several (let alone many) times in very short succession, and perhaps do the same thing as I do manually, but internally, when I just use a normal parameterized query?

I find myself constantly second-guessing what the database is up to internally. I have no real grasp of how "smart" it is. I fear that I'm doing a lot of things which are meaningless because it's already taken care of internally by its smart code.

Maybe "preparing" and "executing" is an archaic practice which has no real benefit these days?

It sounds like PG is basically allocating resources to "prepare" for upcoming huge amounts of similar INSERT queries, but I don't understand what exactly it would do differently compared to just executing them one by one. I also don't understand what the point would be of a non-INSERT prepared query/statement.

PS: Just to clear up any confusion: I never use non-parameterized queries, whether they are prepared or not. A lot of people confuse "parameterized queries" and "prepared statements". Even I called the latter "prepared queries" in this question…

Best Answer

Actually, the difference between a parameterized statement and a prepared statement is not that big in PostgreSQL: the former is internally an unnamed prepared statement, and the main difference is that it is automatically deallocated when the next unnamed prepared statement comes along.

When you prepare a statement, PostgreSQL remembers that statement in your current session (nothing is shared between sessions).

For the first five executions, PostgreSQL will generate a custom query plan that takes the parameter values into account. After that, PostgreSQL will generate a generic plan that is independent of the query parameters, and if that generic plan is estimated to be no more expensive than the previous custom plans, it will be used from that point on.

For insert statement like you describe, PostgreSQL will always switch to the generic plan. The main advantage there is that the query doesn't have to be planned any more. With simple statements, planning time can exceed execution time, so the savings can be considerable.

PostgreSQL only caches query plans for prepared statements and statements in PL/pgSQL functions.

The other advantage of prepared (or parameterized) statements is that they avoid the danger of SQL injection.

If you have to insert a lot of rows, using COPY would be even faster.