I want to find all the groups in a table that have duplicate group version.
A group can have multiple group versions.
Each group version can have multiple members.
A group 'version' is defined by grpid
and changeDate
.
A group consider a duplicate if ALL the members (userid
, pct
and hobby
) in one group version matches another group version within the same group.
https://dbfiddle.uk/?rdbms=mysql_5.7&fiddle=84eb81a1a71dcee9ad3d0bd91f56120a
The groups
table:
id | grpid | changeDate | userid | pct | hobby | group version* |
---|---|---|---|---|---|---|
1 | 1 | 2020-01-01 | 1 | 1 | 1 | 1 |
2 | 1 | 2020-01-02 | 1 | 1 | 2 | 2 |
3 | 1 | 2020-01-03 | 1 | 1 | 1 | 3 |
4 | 2 | 2020-01-01 | 1 | 0.5 | 1 | 4 |
5 | 2 | 2020-01-01 | 2 | 0.5 | 2 | 4 |
6 | 2 | 2020-01-02 | 1 | 0.5 | 1 | 5 |
7 | 2 | 2020-01-02 | 2 | 0.5 | 3 | 5 |
8 | 3 | 2020-01-01 | 1 | 0.5 | 1 | 6 |
9 | 3 | 2020-01-01 | 2 | 0.5 | 2 | 6 |
10 | 3 | 2020-01-02 | 1 | 0.4 | 1 | 7 |
11 | 3 | 2020-01-02 | 2 | 0.6 | 2 | 7 |
12 | 4 | 2020-01-01 | 1 | 0.6 | 1 | 8 |
13 | 4 | 2020-01-01 | 2 | 0.4 | 2 | 8 |
14 | 4 | 2020-01-02 | 1 | 0.6 | 1 | 9 |
15 | 4 | 2020-01-02 | 2 | 0.4 | 2 | 9 |
16 | 5 | 2020-01-01 | 1 | 0.2 | 2 | 10 |
17 | 5 | 2020-01-01 | 2 | 0.5 | 1 | 10 |
18 | 5 | 2020-01-01 | 3 | 0.3 | 2 | 10 |
19 | 6 | 2020-01-01 | 1 | 0.3 | 2 | 11 |
20 | 6 | 2020-01-01 | 2 | 0.5 | 1 | 11 |
21 | 6 | 2020-01-01 | 3 | 0.2 | 2 | 11 |
22 | 6 | 2020-02-01 | 1 | 0.2 | 2 | 12 |
23 | 6 | 2020-02-01 | 2 | 0.5 | 1 | 12 |
24 | 6 | 2020-02-01 | 3 | 0.3 | 2 | 12 |
25 | 6 | 2020-03-01 | 1 | 0.3 | 2 | 13 |
26 | 6 | 2020-03-01 | 2 | 0.3 | 1 | 13 |
27 | 6 | 2020-03-01 | 3 | 0.4 | 2 | 13 |
28 | 7 | 2020-01-01 | 1 | 0.3 | 2 | 14 |
29 | 7 | 2020-01-01 | 2 | 0.5 | 1 | 14 |
30 | 7 | 2020-01-01 | 3 | 0.2 | 2 | 14 |
31 | 7 | 2020-02-01 | 1 | 0.3 | 2 | 15 |
32 | 7 | 2020-02-01 | 2 | 0.5 | 1 | 15 |
33 | 7 | 2020-02-01 | 3 | 0.2 | 2 | 15 |
34 | 7 | 2020-03-01 | 1 | 0.3 | 2 | 16 |
35 | 7 | 2020-03-01 | 2 | 0.3 | 1 | 16 |
36 | 7 | 2020-03-01 | 3 | 0.4 | 2 | 16 |
37 | 8 | 2020-02-01 | 1 | 0.3 | 1 | 17 |
38 | 8 | 2020-03-01 | 1 | 0.3 | 1 | 18 |
39 | 8 | 2020-03-01 | 3 | 0.4 | 2 | 18 |
*Unique group version number just for visualization.
Result should be:
grpid
1
4
7
Explanation:
- grpid 1 – there are 3 group versions (of 1 member) – 1 and 3 duplicate because userid, pct and hobby equal
- grpid 2 – there are 2 group versions (of 2 members) – not duplicate because hobby not equal between 5 and 7
- grpid 3 – there are 2 group versions (of 2 members) – not duplicate because pct is different among all members
- grpid 4 – there are 2 group versions (of 2 members) – all members are duplicate because userid, pct and userid equal
- grpid 5 – there is only one group of 3 members – not duplicate
- grpid 6 – there are 3 group versions (of 3 members) – not duplicate – the pct changed for each member in the group between versions
- grpid 7 – there are 3 group versions (of 3 members) – duplicate because userid, pct and userid equal between 28-30 and 31-33
- grpid 8 – there are 2 group versions one w/ one member and one with 2 members – not duplicate because there is another member in that group
I am using MySQL 5.7.
Best Answer
Hopefully this should provide an answser:
This implements "common" relational division:
Result:
This implements exact relational division, which results in even more complicated code. Your case, where the groupings/versions are determined by two columns
(grpid, changeDate)
makes it look like even more complicated.Tested in dbffdle.uk
Query 1:
Query 2 that uses MySQL's
GROUP_CONCAT
function: