SQL Server – Selecting Distinct Rows but Counting All Rows

sql server, sql-server-2012

I have the following table:

----------------------------------------------
| ID | interestingData |      timestamp      |
----------------------------------------------
|  1 |       400       | 2016-01-23 17:01:00 |
----------------------------------------------
|  1 |       400       | 2016-01-24 17:01:00 |
----------------------------------------------
|  1 |       350       | 2016-01-25 17:01:00 |
----------------------------------------------
|  2 |       23        | 2016-01-23 17:01:00 | 
----------------------------------------------
|  2 |       34        | 2016-01-24 17:01:00 | 
----------------------------------------------
|  2 |       12        | 2016-01-25 17:01:00 | 
----------------------------------------------

The primary key is (ID, timestamp). I'm trying to write a query that gives me the unique IDs and the latest interestingData value for which interestingData exceeds a threshold. That would, of course, be done with:

SELECT DISTINCT ID
FROM table
WHERE interestingData > threshold
ORDER BY timestamp DESC;

However, I also want the count of every occurrence where interestingData exceeded the threshold. My results table would ideally look like this:

------------------------------------------------------
| ID | interestingData | timestamp           | count |    
------------------------------------------------------
|  1 |        350      | 2016-01-25 17:01:00 |   3   |
------------------------------------------------------

That is the result I would expect for a threshold of 300. I am aware that if you want to pair something distinct with a set of data, a left outer join is usually in order, but I'm not entirely sure how to go about it. This is the closest I have come so far:

SELECT DISTINCT ID
FROM table t1
LEFT OUTER JOIN table t2 ON t1.ID = t2.ID
WHERE interestingData > 300
ORDER BY timestamp DESC

This gets me the distinct IDs and pairs them with the rest of the data as I need, but it makes no provision for returning the other columns of the result, let alone the count.

Best Answer

If you want the interestingData and timestamp from the same row (the most recent row that exceeds the threshold), and if you want to include all rows that exceed the threshold even if some rows for that ID don't meet the threshold, then:

;WITH x AS 
(
  SELECT ID, interestingData, [timestamp], 
    [count] = COUNT(1) OVER (PARTITION BY ID),
    rn = ROW_NUMBER() 
      OVER (PARTITION BY ID ORDER BY [timestamp] DESC)
  FROM dbo.tablename
  WHERE interestingData > 300
)
SELECT ID, interestingData, [timestamp], [count]
  FROM x
  WHERE rn = 1;
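
To try the query without touching a real table, the following is a minimal sketch that loads the sample rows from the question into a table variable. The names @readings and @threshold are hypothetical stand-ins (the question does not give the real table name), and datetime2(0) is an assumed column type.

```sql
-- Hypothetical table variable standing in for dbo.tablename.
DECLARE @readings TABLE
(
  ID              int          NOT NULL,
  interestingData int          NOT NULL,
  [timestamp]     datetime2(0) NOT NULL,  -- assumed type
  PRIMARY KEY (ID, [timestamp])
);

-- Sample rows copied from the question.
INSERT @readings (ID, interestingData, [timestamp]) VALUES
  (1, 400, '2016-01-23 17:01:00'),
  (1, 400, '2016-01-24 17:01:00'),
  (1, 350, '2016-01-25 17:01:00'),
  (2,  23, '2016-01-23 17:01:00'),
  (2,  34, '2016-01-24 17:01:00'),
  (2,  12, '2016-01-25 17:01:00');

DECLARE @threshold int = 300;

;WITH x AS
(
  SELECT ID, interestingData, [timestamp],
    -- number of rows per ID that passed the WHERE filter
    [count] = COUNT(1) OVER (PARTITION BY ID),
    -- 1 = most recent qualifying row for this ID
    rn = ROW_NUMBER()
      OVER (PARTITION BY ID ORDER BY [timestamp] DESC)
  FROM @readings
  WHERE interestingData > @threshold
)
SELECT ID, interestingData, [timestamp], [count]
  FROM x
  WHERE rn = 1;
-- Returns one row: ID 1, interestingData 350, 2016-01-25 17:01:00, count 3.
-- ID 2 does not appear because none of its rows exceed 300.
```

Because the WHERE clause is applied before the window functions are evaluated, both the count and the row numbering only see rows above the threshold.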

Also, try to avoid data types and/or reserved keywords as column names. timestamp is not a great choice because (a) it's not very meaningful and (b) it requires square brackets in a lot of scenarios.

Related Solutions

Does Detach/Attach or Offline/Online Clear Buffer Cache for Database?

I initially thought you were on to something here. My working assumption was that perhaps the buffer pool isn't flushed immediately, since doing so requires "some work", and why bother until the memory is required? But...

Your test is flawed.

What you're seeing in the buffer pool is the pages read as a result of re-attaching the database, not the remains of the previous instance of the database.

"And we can see that the buffer pool was not totally blown away by the detach/attach. Seems like my buddy was wrong. Does anyone disagree or have a better argument?"

Yes. You're interpreting "physical reads 0" as meaning there were no physical reads:

Table 'DatabaseLog'. Scan count 1, logical reads 782, physical reads 0, read-ahead reads 768, lob logical reads 94, lob physical reads 4, lob read-ahead reads 24.

As described on Craig Freedman's blog, the sequential read-ahead mechanism tries to ensure that pages are in memory before they're requested by the query processor, which is why you see a zero or lower-than-expected physical read count reported.

"When SQL Server performs a sequential scan of a large table, the storage engine initiates the read ahead mechanism to ensure that pages are in memory and ready to scan before they are needed by the query processor. The read ahead mechanism tries to stay 500 pages ahead of the scan."

None of the pages required to satisfy your query were in memory until read-ahead put them there.
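
One way to check this for yourself is to count the database's cached pages immediately before and after the scan. The following is only a sketch: it assumes the test database is named AdventureWorks (the output above mentions only the DatabaseLog table, so the database name is a guess) and that you have VIEW SERVER STATE permission. The SELECT * scan is likewise a stand-in for whatever query the original test ran.

```sql
-- Assumption: the database is called AdventureWorks; substitute your own name.
DECLARE @db sysname = N'AdventureWorks';

-- Pages of this database currently in the buffer pool.
-- Run this right after the detach/attach, before touching any table.
SELECT COUNT(*) AS cached_pages_before
  FROM sys.dm_os_buffer_descriptors
  WHERE database_id = DB_ID(@db);

-- Report logical, physical and read-ahead reads for the scan.
SET STATISTICS IO ON;

SELECT * FROM AdventureWorks.dbo.DatabaseLog;  -- hypothetical scan query

SET STATISTICS IO OFF;

-- Pages cached after the scan; the difference was read from disk,
-- whether it is reported as physical reads or as read-ahead reads.
SELECT COUNT(*) AS cached_pages_after
  FROM sys.dm_os_buffer_descriptors
  WHERE database_id = DB_ID(@db);
```

If the "before" count is small and the "after" count has grown by roughly the read-ahead figure, the pages came from disk during the scan rather than surviving the detach/attach.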

Why online/offline results in a different buffer pool profile warrants a little more idle investigation. @MarkSRasmussen might be able to help us out with that next time he visits.

SQL Server – How to Query Transfers for Single Source to Single Destination

I don't know all of your source data (or why there isn't any type of unique constraint that would prevent full-on duplicates or a source with multiple destinations), but given only the sample data supplied:

;WITH s AS 
(
  -- first let's eliminate duplicates
  SELECT DISTINCT Source, Destination 
    FROM dbo.MyTable
)
SELECT Source, Destination
FROM s
WHERE NOT EXISTS
(
  SELECT 1 FROM s AS d WHERE 

  -- eliminate chains in either direction:
    d.Destination = s.Source OR d.Source = s.Destination

  -- eliminate any source with multiple destinations:
    OR (d.Source = s.Source AND d.Destination <> s.Destination)

  -- eliminate any destination with more than one source
    OR (d.Destination = s.Destination AND d.Source <> s.Source)
);
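
The sample data from the original question is not reproduced here, so the following sketch uses made-up rows (all values are hypothetical) to exercise each of the elimination rules. A table variable named @MyTable stands in for dbo.MyTable, and varchar(10) is an assumed column type.

```sql
-- Hypothetical stand-in for dbo.MyTable with invented rows.
DECLARE @MyTable TABLE
(
  Source      varchar(10) NOT NULL,
  Destination varchar(10) NOT NULL
);

INSERT @MyTable (Source, Destination) VALUES
  ('A', 'B'), ('A', 'B'),  -- full duplicate of a clean 1:1 transfer
  ('C', 'D'), ('D', 'E'),  -- chain: D is both a destination and a source
  ('F', 'G'), ('F', 'H'),  -- one source, two destinations
  ('I', 'K'), ('J', 'K');  -- two sources, one destination

;WITH s AS
(
  SELECT DISTINCT Source, Destination
    FROM @MyTable
)
SELECT Source, Destination
FROM s
WHERE NOT EXISTS
(
  SELECT 1 FROM s AS d WHERE
    d.Destination = s.Source OR d.Source = s.Destination
    OR (d.Source = s.Source AND d.Destination <> s.Destination)
    OR (d.Destination = s.Destination AND d.Source <> s.Source)
);
-- Returns one row: Source A, Destination B.
```

Only the A to B pair survives: the duplicate is collapsed by DISTINCT, and every other row matches at least one of the three conditions inside NOT EXISTS.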

