Mysql – Compare two columns and replace duplicates with nulls

MySQLmysql-5.6

I have a table like this:

id  | t1 | t2
---------------
 1  |  a |  b
 2  |  c |  a
 3  |  a |  e
 4  |  f |  g
 5  |  c |  c

I want to compare columns t1 and t2 with each other to remove matching values and get unique values in each columns, like this:

   t1 | t2
  -----------
    a |  b
    c |  null
 null |  e 
    f |  g
  null| null

It doesn't matter much which row to choose to show the value and which other ones would show nulls. For instance, as value a is also found in row 3 of the original dataset, the following output would also be valid:

   t1 | t2
  -----------
 null |  b
    c |  null
    a |  e 
    f |  g
  null| null

The end goal is to show all distinct values and no identical values across the two columns.

Best Answer

Without window functions this is pretty hard in MySQL. Since the two columns - t1 and t2 - store similar content and you only want disctinct values from both, it would be much easier to get the values in a single column:

select t1 as tx from t
union distinct
select t2 from t ;

I don't see any reason to have the original convoluted result, except if you want to keep info from other columns in the table and just remove duplicates from these 2 columns. And in that case, an UPDATE would make more sense.

Here is a method to get this result, anyway. It assumes that (id) has a UNIQUE constraint. Tested at dbfiddle.uk:

select t.id, gt1.tx as t1, gt2.tx as t2
from t
  left join
  ( select ut.tx, min(ut.id) as id
    from
      ( select id, t1 as tx from t
        union all
        select id, t2 from t
      ) ut 
    group by ut.tx
  ) as gt1
  on t.t1 = gt1.tx and t.id = gt1.id
  left join
  ( select ut.tx, min(ut.id) as id
    from
      ( select id, t1 as tx from t
        union all
        select id, t2 from t
      ) ut 
    group by ut.tx
  ) as gt2
  on t.t2 = gt2.tx and t.id = gt2.id and gt1.tx <> gt2.tx
 ;

Notice that in MariaDB (MySQL's first cousin), that has CTEs, the same query can be rewritten more compactly and more clearly:

with gt as 
  ( select ut.tx, min(ut.id) as id
    from
      ( select id, t1 as tx from t
        union all
        select id, t2 from t
      ) ut 
    group by ut.tx
  ) 
select t.id, gt1.tx as t1, gt2.tx as t2
from t
  left join gt as gt1
    on t.t1 = gt1.tx and t.id = gt1.id
  left join gt as gt2
    on t.t2 = gt2.tx and t.id = gt2.id and gt1.tx <> gt2.tx
 ;

Logically, both variations work the same way, though. This subselect:

  ( select ut.tx, min(ut.id) as id
    from
      ( select id, t1 as tx from t
        union all
        select id, t2 from t
      ) ut 
    group by ut.tx
  ) 

returns all distinct t1 and t2 values along with the ID of the first1 row where each is encountered. For your example, it produces this output:

tx   id
---  ---
a    1
b    1
c    2
e    3
f    4
g    4

The above set is joined against the original table twice, on t1 and on t2. In each case, where the original row's ID matches the subselect row's ID, the value is returned intact, because the match indicates that the row is the value's first occurrence; otherwise the value is replaced with a null.


1“First" in the order of ID.