SQL Server – How to Select 1 Row Per Day for Each Database

sql serversql-server-2012t-sql

I have a table with over 60 million rows. Snapshots taken at multiple times a day for all databases across all instances. I am trying to reduce the rows in my select by only returning 1 snapshot per database/instance per day.

My current query using (SQL Server 2012):

SELECT  instance_name ,
        DU.database_name ,
        DU.collection_time ,
        DU.dbsize ,
        DU.logsize ,
        DU.reservedpages ,
        DU.usedpages
FROM    [core].[snapshots_internal] AS [SI]
LEFT JOIN [core].[source_info_internal] AS [SII] ON [SI].[source_id] = [SII].[source_id]
INNER JOIN [core].[snapshot_timetable_internal] AS [STI] ON [SI].[snapshot_time_id] = [STI].[snapshot_time_id]
LEFT JOIN [snapshots].[disk_usage] AS [DU] ON [SI].[snapshot_id] = [DU].[snapshot_id]
WHERE   instance_name LIKE '%\DEV'
        AND DU.[database_name] NOT LIKE 'tempdb'
        AND DU.[database_name] NOT LIKE 'master'
        AND DU.[database_name] NOT LIKE 'model'
        AND DU.[database_name] NOT LIKE 'msdb'
GROUP BY SII.instance_name
ORDER BY SII.instance_name ,
         DU.database_name ,
         DU.collection_time;

Returns the following:

2015-08-09 23:30:02.0000000 +00:00  665600  64960   484170
2015-08-10 05:30:02.0000000 +00:00  665600  64960   484170
2015-08-10 11:30:02.0000000 +00:00  665600  64960   484170
2015-08-10 17:30:02.0000000 +00:00  665600  64960   484170
2015-08-10 23:30:04.0000000 +00:00  665600  64960   484170
2015-08-11 05:30:03.0000000 +00:00  665600  64960   484170
2015-08-11 11:30:02.0000000 +00:00  665600  64960   484170
2015-08-11 12:30:28.0000000 +00:00  665600  64960   484170
2015-08-11 17:30:03.0000000 +00:00  665600  64960   484170
2015-08-11 23:30:02.0000000 +00:00  665600  64960   484170
2015-08-12 05:30:02.0000000 +00:00  665600  64960   484170
2015-08-12 11:30:02.0000000 +00:00  665600  64960   484170

I would like to obtain:

2015-08-09 23:30:02.0000000 +00:00  665600  64960   484170
2015-08-10 23:30:04.0000000 +00:00  665600  64960   484170
2015-08-11 23:30:02.0000000 +00:00  665600  64960   484170
2015-08-12 11:30:02.0000000 +00:00  665600  64960   484170

After some research on other stack posts I was able to find similar questions but the responses still had me returning multiple rows per day. So any suggestions is greatly appreciated.

EX of something i tried:

select collection_time, du.snapshot_id, du.database_name 
from   snapshots.disk_usage du 
where du.collection_time in 
(
   select max(du.collection_time) 
   from snapshots.disk_usage du 
   group by collection_time
)

Best Answer

Quite often you can use row_number() to select only certain rows of the data. so maybe something like this would work:

select * from (
  SELECT  
    instance_name ,
    DU.database_name ,
    DU.collection_time ,
    DU.dbsize ,
    DU.logsize ,
    DU.reservedpages ,
    DU.usedpages,
    row_number() over (
      partition by instance_name, DU.database_name, convert(date, DU.collection_time)
      order by DU.collection_time desc) as RN
    ...
) X
WHERE RN = 1
ORDER BY instance_name ,
         database_name ,
         collection_time;