MySQL-How to find that value in a column is subset with all other values in same column multiple time

MySQLstored-procedures

I have a table named as Team where table contain 3 column (ID,Team,Task). Its datatypes are (INT,VARCHAR(255),INT). Below you will find a sample table where I would like to perform the query on column team.

   -----------------------------
   |ID|        Team      |Task |
   -----------------------------
   |1 |team1,team2,team5 |23   |
   |2 |team3,team2       |34   |
   |3 |team1,team5,team3 |45   |
   |4 |team2             |2    |
   |5 |team1,team3       |65   |
   |6 |team2,team5       |4    |
   |7 |team1             |34   |
   |8 |team5             |67   |
   ----------------------------- 

My question is that i like to check whether value in column team is subset of any of the value in same column team (It should not check with it own value in the column team), if that value comes out to be true then i would like to delete that row which has that value.It should run for all values of column team.

OUTPUT table will be like this

   ---------------------------
   |ID|    Team          |Task|
   ---------------------------
   |1 |team1,team2,team5 |23  |
   |2 |team3,team2       |34  |
   |3 |team1,team5,team3 |45  |
   ----------------------------

As you can see in team5 is subset of {team2,team5},{team1,team5,team3} and {team1,team2,team5}, Same with team1 ,{team2,team5} ,{team1,team3}, team2 they will also get removed because they are subset of some value in column team.

If there is any other approach to do this then do tell me. In initial Table i used group_concat function on it to get all values related to particular ID.

Best Answer

As DBNull pointed out, you are experiencing this challenge because the Team column is breaking First Normal Form. The values in that column should be atomic.

You have two options here. The first is to follow DBNull's recommendation of having the teams registered in their own Team table and having a separate TeamGroup table to store Team Groups. While you're at it you could throw in the third table to store the Tasks that the Team Groups are bound to (trusting they are bound to be changed from time to time and you'd like to know that history).

The second option is to write a script that does pretty much what you'd be doing in the first option except it reflects the change back to the table in the de-normalized format.

This isn't trivial. You'll need to loop over each row, split the column into the atomic team names and then query the table using a FIND_IN_SET() for each team name plus ensuring not to match the row your cursor is pointing to. Every time the table changes you'll need to re-run the query.

Here is some SQL that demonstrates normalizing the data. It has foreign keys for data consistency. The task IDs are not stored in the team_group table in this example instead a separate table exists which allows for multiple tasks.

CREATE TABLE `team` (
  `id` int(10) unsigned NOT NULL AUTO_INCREMENT,
  `name` varchar(256) NOT NULL COMMENT 'Name of Team',
  `updated_at` timestamp NOT NULL DEFAULT CURRENT_TIMESTAMP ON UPDATE CURRENT_TIMESTAMP,
  PRIMARY KEY (`id`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8;

CREATE TABLE `team_group` (
  `id` int(10) unsigned NOT NULL AUTO_INCREMENT,
  `name` varchar(256) NOT NULL DEFAULT '' COMMENT 'Optional Name of Team Group',
  `updated_at` timestamp NOT NULL DEFAULT CURRENT_TIMESTAMP ON UPDATE CURRENT_TIMESTAMP,
  PRIMARY KEY (`id`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8 COMMENT='a team group can be one or more teams';

CREATE TABLE `team_group_map` (
  `id` int(10) unsigned NOT NULL AUTO_INCREMENT,
  `team_id` int(10) unsigned NOT NULL,
  `group_id` int(10) unsigned NOT NULL,
  `updated_at` timestamp NOT NULL DEFAULT CURRENT_TIMESTAMP ON UPDATE CURRENT_TIMESTAMP,
  PRIMARY KEY (`id`),
  UNIQUE KEY `unq_team_group` (`team_id`,`group_id`),
  KEY `group_id` (`group_id`),
  CONSTRAINT `team_group_map_ibfk_1` FOREIGN KEY (`team_id`) REFERENCES `team` (`id`) ON DELETE CASCADE,
  CONSTRAINT `team_group_map_ibfk_2` FOREIGN KEY (`group_id`) REFERENCES `team_group` (`id`) ON DELETE CASCADE
) ENGINE=InnoDB DEFAULT CHARSET=utf8 COMMENT='Binds teams to a team group';

CREATE TABLE `task` (
  `id` mediumint(8) unsigned NOT NULL AUTO_INCREMENT,
  `name` varchar(256) NOT NULL,
  `updated_at` timestamp NOT NULL DEFAULT CURRENT_TIMESTAMP ON UPDATE CURRENT_TIMESTAMP,
  PRIMARY KEY (`id`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8;

CREATE TABLE `task_group_map` (
  `id` int(10) unsigned NOT NULL AUTO_INCREMENT,
  `task_id` mediumint(8) unsigned NOT NULL,
  `group_id` int(10) unsigned NOT NULL,
  `updated_at` timestamp NOT NULL DEFAULT CURRENT_TIMESTAMP ON UPDATE CURRENT_TIMESTAMP,
  PRIMARY KEY (`id`),
  UNIQUE KEY `unq_group_task` (`group_id`,`task_id`),
  KEY `task_id` (`task_id`),
  CONSTRAINT `task_group_map_ibfk_1` FOREIGN KEY (`task_id`) REFERENCES `task` (`id`) ON DELETE CASCADE,
  CONSTRAINT `task_group_map_ibfk_2` FOREIGN KEY (`group_id`) REFERENCES `team_group` (`id`) ON DELETE CASCADE
) ENGINE=InnoDB DEFAULT CHARSET=utf8 COMMENT='Binds tasks to a team group';


-- Add a new team
INSERT INTO team (name) VALUES ('Team1');

-- Add a new group and add an existing team to it
INSERT INTO team_group (name) VALUES ('Euro east group');
-- Get ID for this group
SELECT id FROM team_group WHERE name = 'Euro east group';
-- Get team's ID
SELECT id FROM team WHERE name = 'Team1';
-- Add team to the group (team_id = 1, group_id = 1)
INSERT INTO team_group_map (team_id, group_id) VALUES (1, 1);

-- Add a task and set it for a group
INSERT INTO task (name) VALUES ('Win Finals');
-- Get ID for this task
SELECT id FROM task WHERE name = 'Win Finals';
-- Get team's ID
SELECT id FROM team WHERE name = 'Team1';
-- Add task to the group (task_id = 1, group_id = 1)
INSERT INTO task_group_map (task_id, group_id) VALUES (1, 1);

-- Select the teams and group IDs
SELECT team.name, group_id
FROM team_group_map
INNER JOIN team ON (team.id = team_id);

-- Select the teams and group names
SELECT team.name, group.name 
FROM team_group_map
INNER JOIN team ON (team.id = team_id)
INNER JOIN team_group AS `group` ON (group.id = group_id);

-- Select the groups, their task IDs and when the tasks were added to the group
SELECT group.name, task_id, map.updated_at
FROM task_group_map AS map
INNER JOIN team_group AS `group` ON (group.id = group_id);

-- Select the groups and their task Names + when tasks were added to group
SELECT group.name, task.name, map.updated_at
FROM task_group_map AS map
INNER JOIN task ON (task.id = task_id)
INNER JOIN team_group AS `group` ON (group.id = group_id);

-- Select the most recent task for a group
SELECT task.name, map.updated_at
FROM task_group_map AS map
INNER JOIN task ON (task.id = task_id)
WHERE group_id = 1
ORDER BY map.updated_at DESC
LIMIT 1;