MySQL-How to find that value in a column is subset with all other values in same column multiple time

MySQLstored-procedures

I have a table named as Team where table contain 3 column (ID,Team,Task). Its datatypes are (INT,VARCHAR(255),INT). Below you will find a sample table where I would like to perform the query on column team.

   -----------------------------
   |ID|        Team      |Task |
   -----------------------------
   |1 |team1,team2,team5 |23   |
   |2 |team3,team2       |34   |
   |3 |team1,team5,team3 |45   |
   |4 |team2             |2    |
   |5 |team1,team3       |65   |
   |6 |team2,team5       |4    |
   |7 |team1             |34   |
   |8 |team5             |67   |
   -----------------------------

My question is that i like to check whether value in column team is subset of any of the value in same column team (It should not check with it own value in the column team), if that value comes out to be true then i would like to delete that row which has that value.It should run for all values of column team.

OUTPUT table will be like this

   ---------------------------
   |ID|    Team          |Task|
   ---------------------------
   |1 |team1,team2,team5 |23  |
   |2 |team3,team2       |34  |
   |3 |team1,team5,team3 |45  |
   ----------------------------

As you can see in team5 is subset of {team2,team5},{team1,team5,team3} and {team1,team2,team5}, Same with team1 ,{team2,team5} ,{team1,team3}, team2 they will also get removed because they are subset of some value in column team.

If there is any other approach to do this then do tell me. In initial Table i used group_concat function on it to get all values related to particular ID.

Best Answer

As DBNull pointed out, you are experiencing this challenge because the Team column is breaking First Normal Form. The values in that column should be atomic.

You have two options here. The first is to follow DBNull's recommendation of having the teams registered in their own Team table and having a separate TeamGroup table to store Team Groups. While you're at it you could throw in the third table to store the Tasks that the Team Groups are bound to (trusting they are bound to be changed from time to time and you'd like to know that history).

The second option is to write a script that does pretty much what you'd be doing in the first option except it reflects the change back to the table in the de-normalized format.

This isn't trivial. You'll need to loop over each row, split the column into the atomic team names and then query the table using a FIND_IN_SET() for each team name plus ensuring not to match the row your cursor is pointing to. Every time the table changes you'll need to re-run the query.

Here is some SQL that demonstrates normalizing the data. It has foreign keys for data consistency. The task IDs are not stored in the team_group table in this example instead a separate table exists which allows for multiple tasks.



CREATE TABLE `team` (
  `id` int(10) unsigned NOT NULL AUTO_INCREMENT,
  `name` varchar(256) NOT NULL COMMENT 'Name of Team',
  `updated_at` timestamp NOT NULL DEFAULT CURRENT_TIMESTAMP ON UPDATE CURRENT_TIMESTAMP,
  PRIMARY KEY (`id`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8;

CREATE TABLE `team_group` (
  `id` int(10) unsigned NOT NULL AUTO_INCREMENT,
  `name` varchar(256) NOT NULL DEFAULT '' COMMENT 'Optional Name of Team Group',
  `updated_at` timestamp NOT NULL DEFAULT CURRENT_TIMESTAMP ON UPDATE CURRENT_TIMESTAMP,
  PRIMARY KEY (`id`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8 COMMENT='a team group can be one or more teams';

CREATE TABLE `team_group_map` (
  `id` int(10) unsigned NOT NULL AUTO_INCREMENT,
  `team_id` int(10) unsigned NOT NULL,
  `group_id` int(10) unsigned NOT NULL,
  `updated_at` timestamp NOT NULL DEFAULT CURRENT_TIMESTAMP ON UPDATE CURRENT_TIMESTAMP,
  PRIMARY KEY (`id`),
  UNIQUE KEY `unq_team_group` (`team_id`,`group_id`),
  KEY `group_id` (`group_id`),
  CONSTRAINT `team_group_map_ibfk_1` FOREIGN KEY (`team_id`) REFERENCES `team` (`id`) ON DELETE CASCADE,
  CONSTRAINT `team_group_map_ibfk_2` FOREIGN KEY (`group_id`) REFERENCES `team_group` (`id`) ON DELETE CASCADE
) ENGINE=InnoDB DEFAULT CHARSET=utf8 COMMENT='Binds teams to a team group';

CREATE TABLE `task` (
  `id` mediumint(8) unsigned NOT NULL AUTO_INCREMENT,
  `name` varchar(256) NOT NULL,
  `updated_at` timestamp NOT NULL DEFAULT CURRENT_TIMESTAMP ON UPDATE CURRENT_TIMESTAMP,
  PRIMARY KEY (`id`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8;

CREATE TABLE `task_group_map` (
  `id` int(10) unsigned NOT NULL AUTO_INCREMENT,
  `task_id` mediumint(8) unsigned NOT NULL,
  `group_id` int(10) unsigned NOT NULL,
  `updated_at` timestamp NOT NULL DEFAULT CURRENT_TIMESTAMP ON UPDATE CURRENT_TIMESTAMP,
  PRIMARY KEY (`id`),
  UNIQUE KEY `unq_group_task` (`group_id`,`task_id`),
  KEY `task_id` (`task_id`),
  CONSTRAINT `task_group_map_ibfk_1` FOREIGN KEY (`task_id`) REFERENCES `task` (`id`) ON DELETE CASCADE,
  CONSTRAINT `task_group_map_ibfk_2` FOREIGN KEY (`group_id`) REFERENCES `team_group` (`id`) ON DELETE CASCADE
) ENGINE=InnoDB DEFAULT CHARSET=utf8 COMMENT='Binds tasks to a team group';


-- Add a new team
INSERT INTO team (name) VALUES ('Team1');

-- Add a new group and add an existing team to it
INSERT INTO team_group (name) VALUES ('Euro east group');
-- Get ID for this group
SELECT id FROM team_group WHERE name = 'Euro east group';
-- Get team's ID
SELECT id FROM team WHERE name = 'Team1';
-- Add team to the group (team_id = 1, group_id = 1)
INSERT INTO team_group_map (team_id, group_id) VALUES (1, 1);

-- Add a task and set it for a group
INSERT INTO task (name) VALUES ('Win Finals');
-- Get ID for this task
SELECT id FROM task WHERE name = 'Win Finals';
-- Get team's ID
SELECT id FROM team WHERE name = 'Team1';
-- Add task to the group (task_id = 1, group_id = 1)
INSERT INTO task_group_map (task_id, group_id) VALUES (1, 1);

-- Select the teams and group IDs
SELECT team.name, group_id
FROM team_group_map
INNER JOIN team ON (team.id = team_id);

-- Select the teams and group names
SELECT team.name, group.name 
FROM team_group_map
INNER JOIN team ON (team.id = team_id)
INNER JOIN team_group AS `group` ON (group.id = group_id);

-- Select the groups, their task IDs and when the tasks were added to the group
SELECT group.name, task_id, map.updated_at
FROM task_group_map AS map
INNER JOIN team_group AS `group` ON (group.id = group_id);

-- Select the groups and their task Names + when tasks were added to group
SELECT group.name, task.name, map.updated_at
FROM task_group_map AS map
INNER JOIN task ON (task.id = task_id)
INNER JOIN team_group AS `group` ON (group.id = group_id);

-- Select the most recent task for a group
SELECT task.name, map.updated_at
FROM task_group_map AS map
INNER JOIN task ON (task.id = task_id)
WHERE group_id = 1
ORDER BY map.updated_at DESC
LIMIT 1;

Related Solutions

MySQL Database Schema for Volleyball

Like Gaius said, review the answers at How-to implement an entity with an unknown maximum number of attributes?

Rewriting for MySQL Syntax from the answer by Eric Humphrey at above link:

You could do this. It would allow good performance for normal duration games, while allowing you to also store long running games.

CREATE TABLE InningRuns (
    GameId INT NOT NULL AUTO_INCREMENT,
    Team CHAR(4) NOT NULL, --'Home','Away'
    Inning1 TINYINT, --Seeing how more than 255 runs are not really possible in an inning
    Inning2 TINYINT,
    [...],
    Inning9 TINYINT,
    ExtraInnings VARCHAR(2000), --Use to hold any runs in extra innings, store multiple values using some delimiter like comma (,), pipe (|), semicolon (;), or anything that is not part of the data.
    PRIMARY KEY (GameId, Team)
)

You could further normalize and have a row for each unique combination of game, team, & inning. This would allow you as many innings as the InningId datatype would allow.

CREATE TABLE InningRuns (
    InningRunId INT NOT NULL AUTO_INCREMENT,
    GameId INT NOT NULL,
    Team CHAR(4) NOT NULL, --'Home','Away'
    InningId TINYINT, --Seeing how more than 255 innings might be excessive
    Runs TINYINT,
    PRIMARY KEY (InningRunId),
    UNIQUE (GameId, Team, InningId)
)

Database Design – Best Logical Structure for Different Entity Categories

I recommend you to read about the dangers of the Entity-Attribute-Value pattern on this presentation by @Bill Karwin.

One of the solutions is storing NoSQL-like data in a serialized BLOB, if you do not need to read and write individual properties. That is, storing a key (product) as the primary key, and the variable property-value pairs all together in a single blob. This is a non-normalized structure, leaving the responsibility of the constraints to the application layer, but it works great for certain patterns (the same that usually work well with NoSQL solutions). Even if you need indexing, you can create inverted indexes on a separate table as needed.

This and other solutions can be seen on this presentation in the latest MySQL Conference, including when you should search for alternative database engines.

Best Answer

Related Solutions

MySQL Database Schema for Volleyball

Database Design – Best Logical Structure for Different Entity Categories

Related Question