I manage an application that has what I would call a medium sized workload (20,000-30,000 records inserted per day with around 2GB of data total). We're on RDS Postgres 9.6 with a db.m4.xlarge (4 vCPU and 16 GB of RAM) instance (I'm way over provisioned, I know). Everything has been running smoothly for months with the average CPU utilization under 10%.
Yesterday I got an escalation from the application support that our CPU has spiked above 80% and since then I haven't gotten it to go back down under 50%.
This is the graph of the last 4 days of CPU Utilization
There's been no spike in usage over this same time.
I've looked into long running queries through out the day using
SELECT max(now() - xact_start) FROM pg_stat_activity
WHERE state IN ('idle in transaction', 'active');
The largest amount I've seen is 1.5 seconds
We have about 40 connections but most are inactive.
This same thing happened a few months ago and we moved to our now way overprovisioned server to hopefully not have it happen again. So now I'm trying to figure out the root cause.
Any help would be appreciated.
Best Answer
Hi It's not about just not the number of number of session or queries which are running.
Below will help you to understand how to troubleshoot this problem
Download the logs from RDS console for this instance to generate the Pgbader report. Below is the format which is most suitable to generate the logs
Download the pgbadger and use below command to generate the report. https://github.com/darold/pgbadger
Match your CLoudwatch or RDS CPU time stamp with Pgbagder report to see how many queries are getting executed
Try to find the query which may be eating a lot of CPU. pgbadger -q /postgresql*.log -f stderr -b "2019-07-17 08:55:19" -e "2019-07-22 07:30:07" -o pgbadger.html
I may be able to give specific guidance if you can hare more data.
Cheers, Aj http://www.postgresql-blog.com