PostgreSQL – Logging Collector Blocking Auto Explain Output

explainlogspostgresql

Postgres 10.4

Postgres 9.6

Problem on both versions

I want to collect postgresql slow queries plan, so I added to my postgresql.conf setting from documentation

session_preload_libraries = 'auto_explain'

auto_explain.log_min_duration = '5s'

in the end of config

But also we use

logging_collector = true

and if logging_collector is on, no explain in logs

If logging_collector commented, I can see plan of queries
How can I use both options?

UPDATE:
We had about 10 queries runs more than 5 seconds, but finally with grep we found only one output in log with explain instead of all plans
How it could be only one record was logged? neither all plans, no no plans only one explain

Best Answer

Works for me.

Could it be that the other long queries were run over existing connections (perhaps with a connection pooler) for which the session_preload_libraries has not taken effect?

Related Solutions

Help to understand explain plan in Oracle

Explain plan does not tell you what is actually the most costly "operation". The "Cost" column is a guess - it is a value estimated by optimizer. So is "Cardinality" column and "Bytes" column. http://docs.oracle.com/cd/B28359_01/server.111/b28274/ex_plan.htm#i18300

In your example, your optimizer tells you: I decide to use this plan because I guess that looping would cost about 5,483. And I hope this would be the most costly part of the execution, but I can't guarantee this.

The same applies recursively to all the depths of the tree.

If you go in-depth to the lowest levels (that is by intuition most-looped, most-executed levels) you see that the operation that especially sticks out, both in terms of expected cost and expected number of elements, is the

6 INDEX RANGE SCAN INDEX RAIDPIDAT.IDX_HISTORY_STATE_TABLE_1TPALM Cost: 662  Cardinality: 102,068

So, optimizer guessed that optimal execution of this query is to loop a lot around a poor workhorse RAIDPIDAT.IDX_HISTORY_STATE_TABLE_1TPALM. I really cannot see which part of your query directly relates to it, but I suspect t1.data_tratado condition. And, again, I cannot see if it is really the most costly part.

I'll try to translate the syntax of loops in the explain plan to procedural pseudo-code:

/* begin step 13 (by "step 13" I mean a line that reads "   13 NESTED LOOPS") */
  /* begin step 7 */
    do step 5
    myresult = rows from step 5
    for each row from myresult {
       do step 6
       for each row from step 6 {
           join to a row from myresult the matching row from step 6
       }
    }
  /* end step 7 */
  for each row from myresult {
     do step 12
     for each row from step 12 {
         join to a row from myresult the matching row from step 12
     }
  }
/* end step 13 */
return myresult

Seems complicated, but really aim of each "nested loop" is to create a join (a single table made of two tables) in the most naive way, a loop-inside-a-loop.

PostgreSQL – Inconsistent Statistics on JSONB Column with Btree Index

Currently (version 9.6), Postgres does not have any statistics about the internals of document types like json, jsonb, xml or hstore. (There has been discussion whether and how to change that.) Instead, the Postgres query planner uses constant default frequency estimates (like you observed).

However, there are separate statistics for functional indexes like your idx_test_btree. The manual has this tip for you:

Tip: Although per-column tweaking of ANALYZE frequency might not be very productive, you might find it worthwhile to do per-column adjustment of the level of detail of the statistics collected by ANALYZE. Columns that are heavily used in WHERE clauses and have highly irregular data distributions might require a finer-grain data histogram than other columns. See ALTER TABLE SET STATISTICS, or change the database-wide default using the default_statistics_target configuration parameter.

Also, by default there is limited information available about the selectivity of functions. However, if you create an expression index that uses a function call, useful statistics will be gathered about the function, which can greatly improve query plans that use the expression index.

The volume of statistics gathered depends on general setting of default_statistics_target, which can be overruled with a per-column setting. The setting for the column automatically covers depending indexes.

The default setting of 100 is conservative. For your test with 1M rows, if data distribution is uneven, it may help to increase it substantially. Checking on this once more I found you can actually tweak the statistics target per index column with ALTER INDEX, which is currently not documented. See related discussion on pgsql-docs.

ALTER TABLE idx_test_btree ALTER int4 SET STATISTICS 2000;  -- max 10000, default 100

Default names for index columns are not exactly intuitive, but you can look it up with:

SELECT attname FROM pg_attribute WHERE attrelid = 'idx_test_btree'::regclass

Should result in the type name int4 as index column name for your case.

The best setting for STATISTICS depends on several factors: data distribution, data type, update frequency, characteristics of typical queries, ...

Internally, this sets the value of pg_attribute.attstattarget, and the exact meaning of this is (per documentation):

For scalar data types, attstattarget is both the target number of "most common values" to collect, and the target number of histogram bins to create.

Then run ANALYZE if you don't want to wait for autovacuum to kick in:

ANALYZE test_data;

You must ANALYZE the table, since you cannot ANALYZE indexes directly. Check with (before and after if you want to verify the effect):

SELECT * FROM pg_statistic WHERE starelid = 'idx_test_btree'::regclass;

Try your query again ...

Best Answer

Related Solutions

Help to understand explain plan in Oracle

PostgreSQL – Inconsistent Statistics on JSONB Column with Btree Index

Related Question