The bad plan is probably the culmination of many problems, which means there are many ways of tackling it. My guess is that the combination of problems causes two plans to look falsely close to each other in cost, and then the difference in memory settings (probably effective_cache_size) between master and slave is the straw that breaks the camel's back and tips the master over to using the wrong one.
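A quick way to confirm that the two servers really run with different settings is to check the value in a session on each one:

-- Run on both master and slave and compare the output.
SHOW effective_cache_size;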
The simplest starting point for tackling it may be this one:
" -> Index Scan using idx_own_spas on spatial_artifact this_ (cost=0.43..23351.70 rows=6841 width=604) (actual time=0.037..2.119 rows=618 loops=1)"
" Index Cond: (propietari = 7649)"
This is simple because the scan runs to completion and has only a single index qualification, and yet the estimate is off by a factor of 10. Since your statistics are up to date, the problem must be an inadequate default_statistics_target. You can increase that globally, or increase it just for the column propietari using SET STATISTICS, and then re-analyze the table. If this estimation problem were fixed, the master would almost certainly switch to the faster plan, as it would appear even faster yet.
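For the per-column route, a minimal sketch (the target of 1000 is an illustrative value; the default is 100):

-- Raise the statistics target for just this column, then re-analyze
-- so the planner sees the finer-grained statistics.
ALTER TABLE spatial_artifact ALTER COLUMN propietari SET STATISTICS 1000;
ANALYZE spatial_artifact;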
Another problem is that it thinks rows meeting the criteria ((group_id IS NULL) AND (propietari = 7649)) are distributed randomly over the values of this_.id, which they apparently are not. That causes it to think the LIMIT clause will kick in much sooner than it actually does, making the slow plan look faster than it will be. Unfortunately, there is nothing simple you can do to improve this estimate. But if you create the right index, you can provide a third plan which actually is much faster, and also looks much faster, than either of the two current ones.
That index would be:
CREATE INDEX idx_own_spas
ON spatial_artifact
USING btree
(propietari,id)
WHERE group_id IS NULL;
This new index can replace the existing similar one that lacks the id column, so it should not take up much additional space or maintenance overhead. Note that the old index already holds the name idx_own_spas, so you need to drop it (or pick a different name) before the CREATE INDEX above can succeed; one non-blocking way to do the swap is sketched below.
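A possible replacement sequence, assuming the hypothetical interim name idx_own_spas_new (CREATE INDEX CONCURRENTLY avoids blocking writes while the index builds):

-- Build the new index alongside the old one, without blocking writes.
CREATE INDEX CONCURRENTLY idx_own_spas_new
    ON spatial_artifact (propietari, id)
    WHERE group_id IS NULL;
-- Once it is built and valid, swap it in for the old index.
DROP INDEX CONCURRENTLY idx_own_spas;
ALTER INDEX idx_own_spas_new RENAME TO idx_own_spas;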
Finally, you can change effective_cache_size within each session, so it would be easy to change this value on the master and on the slave to see if that is truly what is tipping the plan over from the good plan to the bad one. But even if it is, that still doesn't tell you what to do about it: just because a certain value causes a bad planner choice for one query doesn't mean it is set improperly for the system as a whole. So hopefully one of the first two methods will work.
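A session-level experiment could look like this ('4GB' is a placeholder; use the value from the other server's configuration):

-- Try the slave's setting on the master for this session only.
SET effective_cache_size = '4GB';
-- ... re-run EXPLAIN (ANALYZE) on the problem query here ...
RESET effective_cache_size;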
That time is spent by pgAdmin packing and rendering the data; it is not the time Postgres takes to execute the query. Why are you fetching 250,000 rows into pgAdmin? If you need to export the table to a plain-text file (like a CSV with a header) you can execute this query:
COPY (
    SELECT
        mobile_state_message.latitude,
        mobile_state_message.longitude
    FROM
        public.mobile_state_message
    WHERE
        topic_name = 'MyCar'
)
TO '/path/to/file.csv' WITH (FORMAT CSV, HEADER);
Remember that the server writes the file as the postgres user, so /path/to/ must be owned (or at least writable) by that user; run chown -R postgres /path/to/ if needed.
This way you can use whatever tool you like to analyze the table (if you need to).
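If you have no filesystem access on the database server, psql's client-side \copy meta-command is a possible alternative: it takes the same options as COPY but writes the file on the machine where psql runs, so no special ownership is needed (note that \copy must be written on a single line):

\copy (SELECT latitude, longitude FROM public.mobile_state_message WHERE topic_name = 'MyCar') TO 'file.csv' WITH (FORMAT CSV, HEADER)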
It is a very broad question, even more so as you give absolutely no details about your setup and kind of database (volume, type of queries, active connections, size of RAM, dedicated server or not, etc.).
You can start by enabling PostgreSQL to log slow queries; see the log_min_duration_statement setting in the configuration. A minimal sketch of enabling it from a superuser session (the 500 ms threshold is an illustrative value; equivalently, set it in postgresql.conf):
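-- Log every statement that runs longer than 500 ms.
ALTER SYSTEM SET log_min_duration_statement = '500ms';
SELECT pg_reload_conf();  -- reload the configuration so the setting takes effect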
That will give you historical data that you can then analyze and maybe correlate with other things (like the items in @Vérace's list). You will then have various tools to help, as described at https://wiki.postgresql.org/wiki/Logging_Difficult_Queries: