Sql-server – the best way to check for matching sets of rows

sql serversql-server-2008-r2

I was watching this video yesterday on how unlikely it is to shuffle a deck of cards the same way twice:

https://youtu.be/czSdEhtXwm4?t=300

And I thought to myself "My SAN adminsitrator still has some hair left."

So I built a quick deck of cards:

        declare @card int
        create table deck (card int)

        set @card = 1
        while @card <= 52
        begin
        insert into deck (card)
        values (@card)
        set @card += 1
        end

Shuffled them up:

        declare @shuffleID bigint
        create table shuffles (shuffleID bigint, position int, card int)

        select @shuffleID = isnull(max(shuffleid),0)+ 1 from shuffles

        while @shuffleID <= 10000

        begin

        insert into shuffles (shuffleID, position, card)
        select @shuffleID, rank() over (order by newid()), card from deck
        order by newid()

        set @shuffleID += 1

        end

Then saw how closely I could match them:

        select s1.shuffleID, s2.shuffleID, count(1) as match_count from shuffles s1
        inner join shuffles s2 on s1.position = s2.position and s1.card = s2.card 
            and s1.shuffleID != s2.shuffleID
        group by  s1.shuffleID, s2.shuffleID
        order by count(1) desc

The whole script looks like this:

        if object_id('dbo.deck') is not null begin drop table deck end
        if object_id('dbo.shuffles') is not null begin drop table shuffles end

        create table deck (card int)
        create table shuffles (shuffleID bigint, position int, card int)
        CREATE NONCLUSTERED INDEX [huehuehue]
        ON [dbo].[shuffles] ([position],[card])
        INCLUDE ([shuffleID])

        declare @card int,
        @shuffleID bigint,
        @sql int
        set @card = 1
        while @card <= 52
        begin
        insert into deck (card)
        values (@card)
        set @card += 1
        end

        select @shuffleID = isnull(max(shuffleid),0)+ 1 from shuffles

        while @shuffleID <= 1000

        begin

        insert into shuffles (shuffleID, position, card)
        select @shuffleID, rank() over (order by newid()), card from deck
        order by newid()

        set @shuffleID += 1

        end

        select s1.shuffleID, s2.shuffleID, count(1) as match_count from shuffles s1
        inner join shuffles s2 on s1.position = s2.position and s1.card = s2.card 
            and s1.shuffleID != s2.shuffleID
        group by  s1.shuffleID, s2.shuffleID
        order by count(1) desc

The highest match I've got out of 1,000 shuffles is 8.

Question: What would be the fastest way to shuffle a deck of cards the same way twice?

Best Answer

What is the best way to check for matching sets of rows?

Generally, inner join on a key

Probability of of the shuffling being the exact same?

A big number as described in a few places

1 in 806,581,751,709,438,785,716,606,368,564,037,669,752,895,054,408,832,778,240,000,000,000

What would be the fastest way to shuffle a deck of cards the same way twice?

One way would be to just use a single table. I use a tally table for the initial insert to speed that up, and also use two CTE's for the shuffle. Is it the fastest? Not sure, but it's pretty quick!

drop table deck;
create table deck ([card] int primary key, pos int);

WITH
    E1(N) AS (select 1 from (values (1),(1),(1),(1),(1),(1),(1),(1),(1),(1))dt(n)),
    E2(N) AS (SELECT 1 FROM E1 a, E1 b), --10E+2 or 100 rows
cteTally(N) AS 
    (
        SELECT  ROW_NUMBER() OVER (ORDER BY (SELECT NULL)) FROM E2
    )  

insert into deck (card,pos)
select   card = N
        ,pos = rank() over (order by newid())
from cteTally where N < 53;

;with shuffle1 as(
select card 
       ,pos =rank() over (order by newid())
from deck),

shuffle2 as(
select card 
       ,pos =rank() over (order by newid())
from deck)

select s1.card, s1.pos as s1, s2.pos as s2
from shuffle1 s1 
inner join shuffle2 s2 on s1.card = s2.card
where s1.pos = s2.pos

Takes about 4 milliseconds on my machine for the initial and follow on shuffles, on average.

STATISTICS TIME FOR ENTIRE SCRIPT

SQL Server parse and compile time: 
   CPU time = 0 ms, elapsed time = 0 ms.

 SQL Server Execution Times:
   CPU time = 0 ms,  elapsed time = 2 ms.

 SQL Server Execution Times:
   CPU time = 15 ms,  elapsed time = 2 ms.
SQL Server parse and compile time: 
   CPU time = 0 ms, elapsed time = 4 ms.

 SQL Server Execution Times:
   CPU time = 0 ms,  elapsed time = 1 ms.

(52 row(s) affected)
SQL Server parse and compile time: 
   CPU time = 0 ms, elapsed time = 4 ms.

(1 row(s) affected)

 SQL Server Execution Times:
   CPU time = 0 ms,  elapsed time = 0 ms.

Totals