I have some queries for you regarding table sizes that you can run in MySQL during these spikes
Database size in terms of StorageEngine (MB)
SELECT IFNULL(B.engine,'Total') "Storage Engine", CONCAT(LPAD(REPLACE(FORMAT(
B.DSize/POWER(1024,pw),3),',',''),17,' '),' ',SUBSTR(' KMGTP',pw+1,1),'B') "Data Size",
CONCAT(LPAD(REPLACE(FORMAT(B.ISize/POWER(1024,pw),3),',',''),17,' '),' ',
SUBSTR(' KMGTP',pw+1,1),'B') "Index Size",CONCAT(LPAD(REPLACE(FORMAT(B.TSize/
POWER(1024,pw),3),',',''),17,' '),' ',SUBSTR(' KMGTP',pw+1,1),'B') "Table Size"
FROM (SELECT engine,SUM(data_length) DSize,
SUM(index_length) ISize,SUM(data_length+index_length) TSize FROM information_schema.tables
WHERE table_schema NOT IN ('mysql','information_schema','performance_schema') AND
engine IS NOT NULL GROUP BY engine WITH ROLLUP) B,(SELECT 2 pw) A ORDER BY TSize;
Database size in terms of Databases (MB)
SELECT DBName,CONCAT(LPAD(FORMAT(SDSize/POWER(1024,pw),3),17,' '),' ',
SUBSTR(' KMGTP',pw+1,1),'B') "Data Size",
CONCAT(LPAD(FORMAT(SXSize/POWER(1024,pw),3),17,' '),' ',
SUBSTR(' KMGTP',pw+1,1),'B') "Index Size",
CONCAT(LPAD(FORMAT(STSize/POWER(1024,pw),3),17,' '),' ',
SUBSTR(' KMGTP',pw+1,1),'B') "Total Size" FROM (SELECT
IFNULL(DB,'All Databases') DBName,SUM(DSize) SDSize,SUM(XSize) SXSize,
SUM(TSize) STSize FROM (SELECT table_schema DB,data_length DSize,
index_length XSize,data_length+index_length TSize FROM information_schema.tables
WHERE table_schema NOT IN ('mysql','information_schema','performance_schema')) AAA
GROUP BY DB WITH ROLLUP) AA,(SELECT 2 pw) BB ORDER BY (SDSize+SXSize);
Database size in terms of Database/StorageEngine (MB)
SELECT IF(ISNULL(B.table_schema)+ISNULL(B.engine)=2,"Storage for All Databases",
IF(ISNULL(B.table_schema)+ISNULL(B.engine)=1,CONCAT("Storage for ",B.table_schema),
CONCAT(B.engine," Tables for ",B.table_schema))) Statistic,CONCAT(LPAD(REPLACE(FORMAT(
B.DSize/POWER(1024,pw),3),',',''),17,' '),' ',
SUBSTR(' KMGTP',pw+1,1),'B') "Data Size",CONCAT(LPAD(REPLACE(FORMAT(
B.ISize/POWER(1024,pw),3),',',''),17,' '),' ',SUBSTR(' KMGTP',pw+1,1),'B') "Index Size",
CONCAT(LPAD(REPLACE(FORMAT(B.TSize/POWER(1024,pw),3),',',''),17,' '),' ',
SUBSTR(' KMGTP',pw+1,1),'B') "Table Size" FROM (SELECT table_schema,engine,
SUM(data_length) DSize,SUM(index_length) ISize,SUM(data_length+index_length) TSize
FROM information_schema.tables WHERE table_schema NOT IN
('mysql','information_schema','performance_schema') AND engine IS NOT NULL
GROUP BY table_schema,engine WITH ROLLUP) B,(SELECT 2 pw) A ORDER BY TSize;
Pay attention to certain markers
- Innodb_buffer_pool_pages_dirty
- Innodb_data_reads
- Innodb_data_writes
I recommend downloading MySQL Administrator (I know, it's old but I still you it for quick and dirty "I WANNA SEE STATS NOW" moments of day) and set it up. I customized my own graphs to watch the size of the InnoDB Buffer Pool and its dirty pages. You could also just use the Connection Health tab.
Best Answer
Not having the full table definition, I'm going to take a stab at what it might be. Based on the screenshot in your question, the table could look something like this:
And just for funsies, lets throw some random sample data in there:
And now we can run our query against that data:
And we can see that this query does a Clustered Index Scan:
I don't know for sure that your query is doing a Clustered Index scan, but that is going to be my guess. We're only looking for one row--so ideally, we would be able to do a seek to that one row (or, at least fewer rows). The
OR
ed criteria makes that a little bit trickier, so having a single index that always gets us the exact row is a bit challenging. Instead, we can just try to get close.We can create an index on
TaskSchedulerID
(we always filter on that!), andDateLastUpdated
(we always sort on that!), andItemStatus
(we filter on that, but in the OR, so it's a little trickier for SQL to search on that):Huzzah! It's doing a seek!
That should make the query pretty darned fast... but what's that little warning symbol?
That's the "get close" part I was talking about. In my random sampling, the index seek was able to seek to 280 rows, then did some residual I/O to find the one row it needed that fit the critera. Based on the index I created, it is going to seek to the supplied
@TaskSchedulerID
, then start running along theDateLastUpdated
column (remember, that's theORDER BY
on the query, then do "residual I/O" to figure out thatItemStatus
criteria.Depending on your data, that might be "good enough" to fix your performance issues. Or you might need to be a bit more clever. Since I don't know your actual schema, indexing, or data distribution, I can't hazard a better guess.
But really, you just need an index that will make your query go a lot faster, so that you use fewer DTUs.