MariaDB instance in cloud hit 95% CPU utilisation for 30 mins

mariadbmonitoringperformance

I've got MariaDB instance in the cloud which I put live about 2 months ago. I'm new to the DBA business and I got the impression the people at the company who I run it through are learning on the hoof as well.

All is going well except that I saw CPU utilisation spiked on Tuesday to 95% for now apparent reason.

I requested performance stats after a user complained of a long running query. It turns out they were doing regular hourly queries on unindexed fields and the table scans involved suddenly took a lot longer. I am now looking at indexing this column so they can run their queries faster.

I'd like to know whether their queries could be the cause of the CPU utilisation, or whether whatever caused the CPU utilisation had just grabbed all the cache and caused their table scanning query to suddenly take far longer.

What should I be doing to capture information that will show what is going on and why the CPU utilisation spiked, and if it's a concern?

Here's the 5 days chart of CPU utilisation: enter image description here

and the 1 day chart
enter image description here

and the only other chart that correlated with it:

enter image description here

and

enter image description here

The scales on the chart are probably the key, right?

The other charts provided (database connections, write IOPS, disk queue depth) didn't show any correlation.

I'm also puzzled why the normal pattern has totally changed. The usage is around a million queries per day, and has been for 2 months now.

Best Answer

(Not yet an "Answer", but debugging help:)

High CPU almost always means a SELECT needs optimizing. To find out which query:

  • Run SHOW FULL PROCESSLIST; when the naughty query is running (10:00?). Run it a few times; hopefully, you can spot some query with a large "Time" or perhaps it is being run repeatedly.
  • Or us the SlowLog to find it. Here are tips on how to get the info and present it to us here for help.

You show heavy I/O also. That could be a query that is doing a table scan. Again, find the query and ask for help.

Meanwhile, refresh graphs, and add one for "slow queries", if it exists.