Sql-server – How to check distinct set of column values against multiple sets

sql server

I am trying to group multiple rows; aggregate distinct values in a column (for the group); and then check if that aggregate set is a subset of some other set. I spent several hours researching and testing different queries, but in the end I am still stuck!

I have put together a test example that I hope illustrates the setup:

create table testtable (
      BusinessDate date
    , GroupId varchar(10)
    , TypeCode varchar(10)
); 

insert into testtable values ('2019-10-15','ABC','T1');
insert into testtable values ('2019-10-15','ABC','T1');  -- TypeCode is not unique for a GroupId
insert into testtable values ('2019-10-15','ABC','T2');
insert into testtable values ('2019-10-15','DEF','T1');
insert into testtable values ('2019-10-15','GHI','T1');
insert into testtable values ('2019-10-15','GHI','T2');
insert into testtable values ('2019-10-15','GHI','T3');
insert into testtable values ('2019-10-15','JKL','T2');
insert into testtable values ('2019-10-15','JKL','T2');
insert into testtable values ('2019-10-15','JKL','T4');

The SuperSets that I want to test against are defined like this (I would prefer to encode these details in the query itself, but could put them in a table if absolutely necessary:

      select 'Type1' as ClassificationType, t.TypeCode from ( values ('T1'),('T2'),('T3') ) t(TypeCode)
union select 'Type2' as ClassificationType, t.TypeCode from ( values ('T1'),('T3') ) t(TypeCode)
union select 'Type3' as ClassificationType, t.TypeCode from ( values ('T2'),('T4'),('T5'),('T6') ) t(TypeCode)

I would like to write a query that extracts unique BusinessDate, GroupId, and TypeCode. It then checks the set of TypeCode (for each BusinessDate, GroupId) against each of the SuperSets defined above (Type1, Type2, Type3). If all the TypeId are found in a SuperSet, it returns that Type.

The SuperSet can have more values than in the set – that's OK. But all the values in the set must be contained in that SuperSet for it to be a match.

The desired output, manually generated for illustration:

      select '2019-10-15' as BusinessDate, 'ABC' as GroupId, 'Type1' as ClassificationType
union select '2019-10-15' as BusinessDate, 'DEF' as GroupId, 'Type1' as ClassificationType
union select '2019-10-15' as BusinessDate, 'GHI' as GroupId, 'Type1' as ClassificationType
union select '2019-10-15' as BusinessDate, 'JKL' as GroupId, 'Type3' as ClassificationType

I expect I would need to use INTERSECT to check the subset contains, but I am not sure how exactly. And I am not sure how to generalize the query to check for all GroupId at the same time.

I'd be grateful for some direction, please!

Here's a horrible query I wrote with all kinds of duplicate code and just checking for one GroupId:

-- variables to allow easy change of inputs for testing
declare @businessdate date = '2019-10-15';
declare @groupid varchar(10) = 'JKL' --'GHI' --'DEF' --'ABC'


select

  @businessDate BusinessDate
, @groupId GroupId

, case

    when (

        select count(*) from
            (
                select distinct TypeCode
                from testtable
                where BusinessDate = @businessdate
                and GroupId = @groupid  
            ) a
        )

        =

        (
            select count(*) from (

                -- duplicate from above
                select distinct TypeCode
                from testtable
                where BusinessDate = @businessdate
                and GroupId = @groupid 

                INTERSECT

                -- DEFINITION OF TYPE 1
                select t.TypeCode
                from ( values ('T1'),('T2'),('T3') ) t(TypeCode)

            ) b
    ) then 'Type1'

    when (

        select count(*) from
            (
                select distinct TypeCode
                from testtable
                where BusinessDate = @businessdate
                and GroupId = @groupid  
            ) a
        )

        =

        (
            select count(*) from (

                -- duplicate from above
                select distinct TypeCode
                from testtable
                where BusinessDate = @businessdate
                and GroupId = @groupid 

                INTERSECT

                -- DEFINITION OF TYPE 2
                select t.TypeCode
                from ( values ('T1'),('T3') ) t(TypeCode)

            ) b
    ) then 'Type2'

    when (

        select count(*) from
            (
                select distinct TypeCode
                from testtable
                where BusinessDate = @businessdate
                and GroupId = @groupid  
            ) a
        )

        =

        (
            select count(*) from (

                -- duplicate from above
                select distinct TypeCode
                from testtable
                where BusinessDate = @businessdate
                and GroupId = @groupid 

                INTERSECT

                -- DEFINITION OF TYPE 3
                select t.TypeCode
                from ( values ('T2'),('T4'),('T5'),('T6') ) t(TypeCode)

            ) b
    ) then 'Type3'

end as ClassificationType

Best Answer

I think you want to pursue it like this:

    with part1 as (
    select groupid, count(distinct typecode) as typecount from testtable
    group by groupid),

    part2 as (
    select groupid, count(distinct typecode) as typecount
    from yourdata
    group by groupid)

    select p1.groupid, p1.typecount
    from part1 p1
    inner join part2 p2 on p1.groupid = p2.groupid and p1.typecount = p2.typecount

Try rewriting your query as 2 datasets, then see if they match.