SQL Server – Generate Unique ID Based on Multiple Rows Criteria

sql serversql-server-2008-r2t-sql

I've got the following scenario, I've got a table like this

| pid | itemId |
|-----|--------|
|123  | item1  |
|123  | item2  |
|234  | item1  |
|234  | item3  |
|456  | item1  |
|456  | item2  |
|567  | item1  |
|567  | item2  |
|567  | item3  |

I need to get a unique integer id based on the unique combination of items. So if two persons have chosen the same items, the unique combination should be the same number.

For example:

| pid | itemId | comb |
|-----|--------|------|
|123  | item1  |  1   |
|123  | item2  |      |
|234  | item1  |  2   |
|234  | item3  |      |
|456  | item1  |  1   |
|456  | item2  |      |
|567  | item1  |  3   |
|567  | item2  |      |
|567  | item3  |      |

The number of possible combinations at the moment are around 300. So when filtering it might be easier to filter by just one number in one column than filtering with multiple criteria.

Best Answer

One way to approach this problem is to concatenate all of the item values together for each PID and to assign a value to them using the DENSE_RANK window function. Unfortunately, SQL Server does not make it easy to do string concatenation at an aggregate level until STRING_AGG() in SQL Server vNext. Since you're on SQL Server 2008 I'm going to use the FOR XML path method of string aggregation. An explanation of that method along with other ways of doing it can be found in Grouped Concatenation in SQL Server.

Data prep using your sample data:

CREATE TABLE #X165719 (
PID INTEGER NOT NULL,
ITEMID VARCHAR(10) NOT NULL
);

INSERT INTO #X165719
VALUES 
(123, 'item1'),
(123, 'item2'),
(234, 'item1'),
(234, 'item3'),
(456, 'item1'),
(456, 'item2'),
(567, 'item1'),
(567, 'item2'),
(567, 'item3');

First let's do the string aggregation. One implementation is as follows:

SELECT 
  PID
, ITEMID
, (
    SELECT ',' + SUB.ITEMID
    FROM #X165719 SUB
    WHERE SUB.PID = t1.PID
    ORDER BY SUB.ITEMID
    FOR XML PATH (''), TYPE).value('.', 'varchar(max)') all_items
FROM #X165719 t1

You should pick a delimiter that doesn't appear in your source data if possible. I used a comma. Here is what the result set looks like:

╔═════╦════════╦════════════════════╗
║ PID ║ ITEMID ║     all_items      ║
╠═════╬════════╬════════════════════╣
║ 123 ║ item1  ║ ,item1,item2       ║
║ 123 ║ item2  ║ ,item1,item2       ║
║ 234 ║ item1  ║ ,item1,item3       ║
║ 234 ║ item3  ║ ,item1,item3       ║
║ 456 ║ item1  ║ ,item1,item2       ║
║ 456 ║ item2  ║ ,item1,item2       ║
║ 567 ║ item1  ║ ,item1,item2,item3 ║
║ 567 ║ item2  ║ ,item1,item2,item3 ║
║ 567 ║ item3  ║ ,item1,item2,item3 ║
╚═════╩════════╩════════════════════╝

Now we need to assign a different number for each unique value in the all_items column. One way to accomplish this is with the DENSE_RANK function. Quoting from BOL:

Returns the rank of rows within the partition of a result set, without any gaps in the ranking. The rank of a row is one plus the number of distinct ranks that come before the row in question.

The final query is:

SELECT 
  t2.PID
, t2.ITEMID
, DENSE_RANK() OVER (ORDER BY all_items) COMB
FROM
(
    SELECT 
      PID
    , ITEMID
    , (
        SELECT ',' + SUB.ITEMID
        FROM #X165719 SUB
        WHERE SUB.PID = t1.PID
        ORDER BY SUB.ITEMID
        FOR XML PATH (''), TYPE).value('.', 'varchar(max)') all_items
    FROM #X165719 t1
) t2;

The final result set is:

╔═════╦════════╦══════╗
║ PID ║ ITEMID ║ COMB ║
╠═════╬════════╬══════╣
║ 123 ║ item1  ║    1 ║
║ 123 ║ item2  ║    1 ║
║ 456 ║ item1  ║    1 ║
║ 456 ║ item2  ║    1 ║
║ 567 ║ item1  ║    2 ║
║ 567 ║ item2  ║    2 ║
║ 567 ║ item3  ║    2 ║
║ 234 ║ item1  ║    3 ║
║ 234 ║ item3  ║    3 ║
╚═════╩════════╩══════╝

Related Solutions

SQL Server – How to Dynamically Generate Rows Based on Column Value

This is an example of when a "Number" table becomes very handy! If you are not familiar with this concept, read this blog post from Adam Machanic: You REQUIRE a Numbers Table

The numbers table is used to join to the Appointment Slots column (in this case) to duplicate each row AppointmentSlots number of times.

-- A simple recreation of your CTE data...
CREATE  TABLE #AppointmentPeriods
(
    AppointmentLocation CHAR(1),
    AppointmentStartTime TIME,
    AppointmentEndTime TIME,
    AppointmentSlotsAvailable SMALLINT
)

INSERT  INTO #AppointmentPeriods VALUES
    ('A', '09:00', '09:15', 2),
    ('A', '09:15', '09:30', 2),
    ('A', '09:30', '09:45', 2),
    -- ....
    ('A', '16:30', '16:45', 2),
    ('A', '16:45', '17:00', 2),
    ('B', '09:00', '09:15', 1),
    ('B', '09:15', '09:30', 1),
    ('B', '09:30', '09:45', 1),
    -- ....
    ('B', '16:30', '16:45', 1),
    ('B', '16:45', '17:00', 1)


-- Numbers table
CREATE  TABLE #Numbers (NumberValue SMALLINT)

INSERT  INTO #Numbers
-- TOP value should be changed so it is greater than the
-- maximum number of potential appointment slots any location can have
SELECT  TOP 20
        ROW_NUMBER() OVER (ORDER BY object_id)
FROM    sys.objects


SELECT  #AppointmentPeriods.*, NumberValue AS AppointmentSlotNumber
FROM    #AppointmentPeriods
        INNER JOIN #Numbers
            ON AppointmentSlotsAvailable >= NumberValue

Query output:

Location | StartTime | EndTime | AppointmentSlots | SlotNumber
A        | 09:00     | 09:15   | 2                | 1
A        | 09:00     | 09:15   | 2                | 2
A        | 09:15     | 09:30   | 2                | 1
A        | 09:15     | 09:30   | 2                | 2
B        | 16:30     | 16:45   | 1                | 1
B        | 16:45     | 17:00   | 1                | 1

PS: There are many ways to generate a numbers table: SO: What is the best way to create and populate a numbers table?

Sql-server – Queries for unique names from a site table using multiple criteria

1 Select unique site names which have been referred to at some point.

Get rows from site where site_id exists in referrals. This assumes site name is unique in site. If not you can add a distinct.

select s.name
from dbo.site as s
where s.deleted = 0 and
      exists (
             select *
             from dbo.referrals as r
             where s.site_id = r.site_id
             )

2 Select unique site names and how many times they have been referred to.

Join site to referrals and count the number of rows grouped by site name.

select s.name, 
       count(*) as times_referred
from dbo.site as s
  inner join dbo.referrals as r
    on s.site_id = r.site_id
where s.deleted = 0
group by s.name

3 Select unique site names and the last date they were referred to.

Add a join to calls and get the max call_date for each site name.

select s.name, 
       count(*) as times_referred,
       max(c.call_date) as last_call_date
from dbo.site as s
  inner join dbo.referrals as r
    on s.site_id = r.site_id
  inner join dbo.calls as c
    on r.call_id = c.call_id
where s.deleted = 0 and
      c.deleted = 0
group by s.name

4 Select unique site names which have never been referred to.

Same as first query only you get the sites that does not exist in referrals.

select s.name
from dbo.site as s
where s.deleted = 0 and
      not exists (
                 select *
                 from dbo.referrals as r
                 where s.site_id = r.site_id
                 )

Best Answer

Related Solutions

SQL Server – How to Dynamically Generate Rows Based on Column Value

Sql-server – Queries for unique names from a site table using multiple criteria

Related Question