I have a table named as Team where table contain 3 column (ID,Team,Task). Its datatypes are (INT,VARCHAR(255),INT)
. Below you will find a sample table where I would like to perform the query on column team.
-----------------------------
|ID| Team |Task |
-----------------------------
|1 |team1,team2,team5 |23 |
|2 |team3,team2 |34 |
|3 |team1,team5,team3 |45 |
|4 |team2 |2 |
|5 |team1,team3 |65 |
|6 |team2,team5 |4 |
|7 |team1 |34 |
|8 |team5 |67 |
-----------------------------
My question is that i like to check whether value in column team is subset of any of the value in same column team (It should not check with it own value in the column team), if that value comes out to be true then i would like to delete that row which has that value.It should run for all values of column team.
OUTPUT table will be like this
---------------------------
|ID| Team |Task|
---------------------------
|1 |team1,team2,team5 |23 |
|2 |team3,team2 |34 |
|3 |team1,team5,team3 |45 |
----------------------------
As you can see in team5 is subset of {team2,team5},{team1,team5,team3} and {team1,team2,team5}
, Same with team1 ,{team2,team5} ,{team1,team3}, team2
they will also get removed because they are subset of some value in column team.
If there is any other approach to do this then do tell me. In initial Table i used group_concat function on it to get all values related to particular ID.
Best Answer
As DBNull pointed out, you are experiencing this challenge because the Team column is breaking First Normal Form. The values in that column should be atomic.
You have two options here. The first is to follow DBNull's recommendation of having the teams registered in their own Team table and having a separate TeamGroup table to store Team Groups. While you're at it you could throw in the third table to store the Tasks that the Team Groups are bound to (trusting they are bound to be changed from time to time and you'd like to know that history).
The second option is to write a script that does pretty much what you'd be doing in the first option except it reflects the change back to the table in the de-normalized format.
This isn't trivial. You'll need to loop over each row, split the column into the atomic team names and then query the table using a FIND_IN_SET() for each team name plus ensuring not to match the row your cursor is pointing to. Every time the table changes you'll need to re-run the query.
Here is some SQL that demonstrates normalizing the data. It has foreign keys for data consistency. The task IDs are not stored in the team_group table in this example instead a separate table exists which allows for multiple tasks.