I have a table with over 60 million rows. Snapshots taken at multiple times a day for all databases across all instances. I am trying to reduce the rows in my select by only returning 1 snapshot per database/instance per day.
My current query using (SQL Server 2012):
SELECT instance_name ,
DU.database_name ,
DU.collection_time ,
DU.dbsize ,
DU.logsize ,
DU.reservedpages ,
DU.usedpages
FROM [core].[snapshots_internal] AS [SI]
LEFT JOIN [core].[source_info_internal] AS [SII] ON [SI].[source_id] = [SII].[source_id]
INNER JOIN [core].[snapshot_timetable_internal] AS [STI] ON [SI].[snapshot_time_id] = [STI].[snapshot_time_id]
LEFT JOIN [snapshots].[disk_usage] AS [DU] ON [SI].[snapshot_id] = [DU].[snapshot_id]
WHERE instance_name LIKE '%\DEV'
AND DU.[database_name] NOT LIKE 'tempdb'
AND DU.[database_name] NOT LIKE 'master'
AND DU.[database_name] NOT LIKE 'model'
AND DU.[database_name] NOT LIKE 'msdb'
GROUP BY SII.instance_name
ORDER BY SII.instance_name ,
DU.database_name ,
DU.collection_time;
Returns the following:
2015-08-09 23:30:02.0000000 +00:00 665600 64960 484170
2015-08-10 05:30:02.0000000 +00:00 665600 64960 484170
2015-08-10 11:30:02.0000000 +00:00 665600 64960 484170
2015-08-10 17:30:02.0000000 +00:00 665600 64960 484170
2015-08-10 23:30:04.0000000 +00:00 665600 64960 484170
2015-08-11 05:30:03.0000000 +00:00 665600 64960 484170
2015-08-11 11:30:02.0000000 +00:00 665600 64960 484170
2015-08-11 12:30:28.0000000 +00:00 665600 64960 484170
2015-08-11 17:30:03.0000000 +00:00 665600 64960 484170
2015-08-11 23:30:02.0000000 +00:00 665600 64960 484170
2015-08-12 05:30:02.0000000 +00:00 665600 64960 484170
2015-08-12 11:30:02.0000000 +00:00 665600 64960 484170
I would like to obtain:
2015-08-09 23:30:02.0000000 +00:00 665600 64960 484170
2015-08-10 23:30:04.0000000 +00:00 665600 64960 484170
2015-08-11 23:30:02.0000000 +00:00 665600 64960 484170
2015-08-12 11:30:02.0000000 +00:00 665600 64960 484170
After some research on other stack posts I was able to find similar questions but the responses still had me returning multiple rows per day. So any suggestions is greatly appreciated.
EX of something i tried:
select collection_time, du.snapshot_id, du.database_name
from snapshots.disk_usage du
where du.collection_time in
(
select max(du.collection_time)
from snapshots.disk_usage du
group by collection_time
)
Best Answer
Quite often you can use row_number() to select only certain rows of the data. so maybe something like this would work: