Sql-server – How to select * if sets of data in two columns matches >= 4 times

sql serversql-server-2012

I'm trying to pull information on repeat dispatches and can't seem to come up with a simple solution to pull my data for a report.

I have the following table:

tblMaster15

 ID | Equipment
----------------
  1 |   Sink
----------------
  2 |   Grill

And another table with location info that I have to join on using ID:

tblMaster15_ABData

 ID | Unit | Region
--------------------
  1 | 600  |  6000
--------------------
  2 | 601  |  7000

And I'm trying to produce a list of the ID, Unit, and Equipment but only when the Unit and Equipment are the same more than 3 times in a given time period.

I've tried the following but it seems to be giving me everything, not just where the count is > 3.

select   m.mrid, a.unit, a.Region, m.Equipment
from     MASTER36 m
join     MASTER36_ABDATA a
on       m.mrid = a.mrid
, (select   a.unit, m.Equipment
   from     MASTER36 m
   join     MASTER36_ABDATA a
   on       m.mrid = a.mrid
   group by a.unit, m.Equipment 
   having   count(*) > 3
  ) T2
where    a.unit = t2.unit
and      m.Equipment = t2.Equipment 
and      mrSUBMITDATE between '2/7/17' and '3/7/17'
and      mrstatus <> 'DELETED'
and      mrstatus <> 'Canceled'
and      m.Equipment  <> 'BLDNG

I did not think you would need status or submit date. The status is closed or open. Submit date in this format: 2017-01-01 00:50:24.000

I am using SQL Server 2012.

Best Answer

You are counting the occurrences of Unit and Equipment across the entire dataset, not specifically within the period, in which some combinations may well occur more than 3 times while others fewer times or never.

Therefore, you need to apply the same date range filter to the derived table to make the results reflect your requirements:

select   m.mrid, a.unit, a.Region, m.Equipment
from     MASTER36 m
join     MASTER36_ABDATA a
on       m.mrid = a.mrid
join     (select   a.unit, m.Equipment
           from     MASTER36 m
           join     MASTER36_ABDATA a
           on       m.mrid = a.mrid
           where    mrSUBMITDATE between '2/7/17' and '3/7/17'
           group by a.unit, m.Equipment 
           having   count(*) > 3
         ) t2
on       a.unit = t2.unit
and      m.Equipment = t2.Equipment 
where    mrSUBMITDATE between '2/7/17' and '3/7/17'
and      mrstatus <> 'DELETED'   -- these two can be inlined:
and      mrstatus <> 'Canceled'  -- mrstatus not in ('DELETED', 'Canceled')
and      m.Equipment  <> 'BLDNG';

Note: as you can see, I have also rewritten the join to the derived table to use the explicit join syntax, for consistency.

You could also try this alternative which references each table only once and does not repeat any part of the logic:

select   mrid, unit, Region, Equipment
from     (
           select   m.mrid, a.unit, a.Region, m.Equipment, mrStatus,
                    cnt = count(*) over (partition by a.unit, m.Equipment)
           from     MASTER36 m
           join     MASTER36_ABDATA a
           where    mrSUBMITDATE between '2/7/17' and '3/7/17'
           and      m.Equipment  <> 'BLDNG'
         ) as derived
where    cnt > 3
and      mrstatus not in ('DELETED', 'Canceled');

Instead of counting rows with a normal aggregate function, the query uses a window aggregate function. A window aggregate function lets you return aggregate results along with non-aggregate data.

In the above query you can see that part of the logic is applied directly to the joined tables, at the scope where the rows are counted. Other filters are applied at the outer level, to the derived table, so as not to affect the results of count(*), thus matching the logic of the original query.

The filter on count(*) is also applied outside the derived table, because computed columns cannot be filtered on in the same scope where they are defined and window functions in particular cannot be used in a WHERE clause.

As a final note, please consider these points for maintainability and clarity of your queries: