MySQL – Compare two columns and replace duplicates with nulls

Tags: MySQL, mysql-5.6

I have a table like this:

id  | t1 | t2
---------------
 1  |  a |  b
 2  |  c |  a
 3  |  a |  e
 4  |  f |  g
 5  |  c |  c

I want to compare columns t1 and t2 with each other, remove matching values, and keep each distinct value only once across both columns, like this:

   t1 | t2
 -----------
    a | b
    c | null
 null | e
    f | g
 null | null

It doesn't matter much which row shows the value and which rows show nulls. For instance, since value a also appears in row 3 of the original dataset, the following output would also be valid:

   t1 | t2
 -----------
 null | b
    c | null
    a | e
    f | g
 null | null

The end goal is to show all distinct values and no identical values across the two columns.

Best Answer

Without window functions this is pretty hard in MySQL. Since the two columns, t1 and t2, store similar content and you only want distinct values from both, it would be much easier to get the values in a single column:

select t1 as tx from t
union distinct
select t2 from t ;

I don't see any reason to want the original convoluted result, unless you need to keep the other columns of the table and only remove duplicates from these two columns. In that case, an UPDATE would make more sense.

Here is a method to get this result, anyway. It assumes that (id) has a UNIQUE constraint. Tested at dbfiddle.uk:

select t.id, gt1.tx as t1, gt2.tx as t2
from t
  left join
  ( select ut.tx, min(ut.id) as id
    from
      ( select id, t1 as tx from t
        union all
        select id, t2 from t
      ) ut 
    group by ut.tx
  ) as gt1
  on t.t1 = gt1.tx and t.id = gt1.id
  left join
  ( select ut.tx, min(ut.id) as id
    from
      ( select id, t1 as tx from t
        union all
        select id, t2 from t
      ) ut 
    group by ut.tx
  ) as gt2
  on t.t2 = gt2.tx and t.id = gt2.id
     and (gt1.tx is null or gt1.tx <> gt2.tx)  -- null-safe: a NULL gt1.tx must not block the t2 match
 ;

Notice that in MariaDB (MySQL's first cousin), which has CTEs, the same query can be rewritten more compactly and more clearly:

with gt as 
  ( select ut.tx, min(ut.id) as id
    from
      ( select id, t1 as tx from t
        union all
        select id, t2 from t
      ) ut 
    group by ut.tx
  ) 
select t.id, gt1.tx as t1, gt2.tx as t2
from t
  left join gt as gt1
    on t.t1 = gt1.tx and t.id = gt1.id
  left join gt as gt2
    on t.t2 = gt2.tx and t.id = gt2.id
       and (gt1.tx is null or gt1.tx <> gt2.tx)  -- null-safe: a NULL gt1.tx must not block the t2 match
 ;

Logically, both variations work the same way, though. This subselect:

  ( select ut.tx, min(ut.id) as id
    from
      ( select id, t1 as tx from t
        union all
        select id, t2 from t
      ) ut 
    group by ut.tx
  )

returns all distinct t1 and t2 values along with the ID of the first¹ row where each is encountered. For your example, it produces this output:

tx   id
---  ---
a    1
b    1
c    2
e    3
f    4
g    4

The above set is joined against the original table twice, on t1 and on t2. In each case, where the original row's ID matches the subselect row's ID, the value is returned intact, because the match indicates that the row is the value's first occurrence; otherwise the value is replaced with a null.

¹ "First" in the order of ID.
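
As a quick way to check the logic described above, here is a minimal sketch that runs the same joins against the question's sample rows. It uses Python's built-in sqlite3 module with an in-memory database purely as a stand-in for MySQL (my assumption, chosen so the example is self-contained); the SQL is kept close to the answer's. The last join condition is written as `gt1.tx is null or gt1.tx <> gt2.tx`, because a plain `<>` against a NULL gt1.tx never evaluates to true and would also blank out a t2 value such as e in row 3. The `order by` clauses are added only to make the output deterministic.

```python
import sqlite3

# In-memory SQLite database as a stand-in for MySQL (assumption for portability).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (id INTEGER PRIMARY KEY, t1 TEXT, t2 TEXT)")
conn.executemany(
    "INSERT INTO t VALUES (?, ?, ?)",
    [(1, "a", "b"), (2, "c", "a"), (3, "a", "e"), (4, "f", "g"), (5, "c", "c")],
)

# Every distinct value from t1 and t2, with the lowest id where it occurs.
FIRST_SEEN = """
select ut.tx, min(ut.id) as id
from ( select id, t1 as tx from t
       union all
       select id, t2 from t ) ut
group by ut.tx
"""

first_seen = conn.execute(FIRST_SEEN + " order by tx").fetchall()

# Join the original table to that set twice: once for t1, once for t2.
# A value survives only in the row where it was first seen.
QUERY = f"""
select t.id, gt1.tx as t1, gt2.tx as t2
from t
  left join ({FIRST_SEEN}) as gt1
    on t.t1 = gt1.tx and t.id = gt1.id
  left join ({FIRST_SEEN}) as gt2
    on t.t2 = gt2.tx and t.id = gt2.id
       and (gt1.tx is null or gt1.tx <> gt2.tx)
order by t.id
"""

result = conn.execute(QUERY).fetchall()

for row in first_seen:
    print(row)
for row in result:
    print(row)
```

With the sample data, `first_seen` matches the tx/id listing shown earlier, and `result` matches the first of the two outputs the question accepts (Python's `None` stands for SQL NULL).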

Related Solutions

MySQL – Query to compare two subsets of data from the same table

I think you have to hack it a bit with a derived table, AKA an implicit temporary table, AKA a "subquery in the from clause."

We derive a table we'll call `t` containing each distinct (server,software) from gocore, then left join to gocore twice, once on tag = 'old' and once on tag = 'new'.

SELECT t.server, t.software, o.revision AS old_rev, n.revision AS new_rev
  FROM (SELECT DISTINCT server, software FROM gocore) t
  LEFT JOIN gocore o ON o.server = t.server AND o.software = t.software AND o.tag = 'old'
  LEFT JOIN gocore n ON n.server = t.server AND n.software = t.software AND n.tag = 'new';
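
To see what that query returns, here is a small sketch using Python's built-in sqlite3 module as a stand-in for MySQL. The column names come from the query above; the column types, the sample rows, and the added `ORDER BY` (there only to make the output order predictable) are assumptions for illustration.

```python
import sqlite3

# In-memory SQLite database as a stand-in for MySQL (assumption).
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE gocore (server TEXT, software TEXT, tag TEXT, revision TEXT)"
)
# Hypothetical sample rows: not every (server, software) pair has both tags.
conn.executemany(
    "INSERT INTO gocore VALUES (?, ?, ?, ?)",
    [
        ("srv1", "nginx", "old", "1.0"),
        ("srv1", "nginx", "new", "1.2"),
        ("srv1", "redis", "old", "5.0"),
        ("srv2", "nginx", "new", "1.2"),
    ],
)

QUERY = """
SELECT t.server, t.software, o.revision AS old_rev, n.revision AS new_rev
  FROM (SELECT DISTINCT server, software FROM gocore) t
  LEFT JOIN gocore o ON o.server = t.server AND o.software = t.software AND o.tag = 'old'
  LEFT JOIN gocore n ON n.server = t.server AND n.software = t.software AND n.tag = 'new'
 ORDER BY t.server, t.software
"""

result = conn.execute(QUERY).fetchall()
for row in result:
    print(row)
```

Each (server, software) pair appears once, with `None` (SQL NULL) wherever the old or the new revision is missing.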

MySQL loading NULLs in numeric columns

Very short answer: no new datatypes have been created to accommodate you.

While we are on this subject

Let's try plain SQL

USE test
DROP TABLE IF EXISTS numtest;
CREATE TABLE numtest
(
  id int not null auto_increment,
  xx decimal(10,3) default null,
  primary key (id)
);
INSERT INTO numtest (id) values (0),(0),(0),(0),(0);
SELECT * FROM numtest;

Does this work ???

mysql> USE test
Database changed
mysql> DROP TABLE IF EXISTS numtest;
Query OK, 0 rows affected (0.01 sec)

mysql> CREATE TABLE numtest
    -> (
    ->   id int not null auto_increment,
    ->   xx decimal(10,3) default null,
    ->   primary key (id)
    -> );
Query OK, 0 rows affected (0.03 sec)

mysql> INSERT INTO numtest (id) values (0),(0),(0),(0),(0);
Query OK, 5 rows affected (0.00 sec)
Records: 5  Duplicates: 0  Warnings: 0

mysql> SELECT * FROM numtest;
+----+------+
| id | xx   |
+----+------+
|  1 | NULL |
|  2 | NULL |
|  3 | NULL |
|  4 | NULL |
|  5 | NULL |
+----+------+
5 rows in set (0.00 sec)

mysql>

OK, fine. It works with plain SQL. But you are asking about LOAD DATA INFILE.

You brought up a post I answered : MySQL is inserting "" as 0 in decimal fields. How to stop that?

Let's see if that bug was addressed since it was submitted. I will try to duplicate the code in that bug that did not work.

First let's create that table from the bug report

mysql> USE test
Database changed
mysql> DROP TABLE IF EXISTS bug_repeat;
Query OK, 0 rows affected, 1 warning (0.00 sec)

mysql> CREATE TABLE bug_repeat
    -> (
    ->   name varchar(10),
    ->   price decimal(12,6)
    -> )
    -> ENGINE=MYISAM DEFAULT CHARSET=ascii COLLATE=ascii_bin;
Query OK, 0 rows affected (0.00 sec)

mysql> SHOW CREATE TABLE bug_repeat\G
*************************** 1. row ***************************
       Table: bug_repeat
Create Table: CREATE TABLE `bug_repeat` (
  `name` varchar(10) COLLATE ascii_bin DEFAULT NULL,
  `price` decimal(12,6) DEFAULT NULL
) ENGINE=MyISAM DEFAULT CHARSET=ascii COLLATE=ascii_bin
1 row in set (0.00 sec)

mysql>

Next, let's make some data

C:\>type C:\MySQLDBA\bug_test.txt
name,
name,0
,
name,6
name,2
name,
name,0
name,0
name,
name,0

C:\>

Let's run the LOAD DATA INFILE

mysql> load data local infile 'C:/MySQLDBA/bug_test.txt'
    -> into table test.bug_repeat
    -> fields terminated by ','
    -> lines terminated by '\n';
Query OK, 10 rows affected, 4 warnings (0.00 sec)
Records: 10  Deleted: 0  Skipped: 0  Warnings: 4

Ouch, what happened?

mysql> show warnings\G
*************************** 1. row ***************************
  Level: Warning
   Code: 1366
' for column 'price' at row 1lue: '
*************************** 2. row ***************************
  Level: Warning
   Code: 1366
' for column 'price' at row 3lue: '
*************************** 3. row ***************************
  Level: Warning
   Code: 1366
' for column 'price' at row 6lue: '
*************************** 4. row ***************************
  Level: Warning
   Code: 1366
' for column 'price' at row 9lue: '
4 rows in set (0.00 sec)

mysql> select * from bug_repeat;
+------+----------+
| name | price    |
+------+----------+
| name | 0.000000 |
| name | 0.000000 |
|      | 0.000000 |
| name | 6.000000 |
| name | 2.000000 |
| name | 0.000000 |
| name | 0.000000 |
| name | 0.000000 |
| name | 0.000000 |
| name | 0.000000 |
+------+----------+
10 rows in set (0.00 sec)

mysql>

What's the sql_mode ?

mysql> select @@sql_mode;
+------------------------+
| @@sql_mode             |
+------------------------+
| NO_ENGINE_SUBSTITUTION |
+------------------------+
1 row in set (0.00 sec)

mysql>

Let's blank out the sql_mode, truncate the table and reload

mysql> set sql_mode = '';
Query OK, 0 rows affected (0.00 sec)

mysql> select @@sql_mode;
+------------+
| @@sql_mode |
+------------+
|            |
+------------+
1 row in set (0.00 sec)

mysql> truncate table bug_repeat;
Query OK, 0 rows affected (0.00 sec)

mysql> load data local infile 'C:/MySQLDBA/bug_test.txt'
    -> into table test.bug_repeat
    -> fields terminated by ','
    -> lines terminated by '\n';
Query OK, 10 rows affected, 4 warnings (0.02 sec)
Records: 10  Deleted: 0  Skipped: 0  Warnings: 4

mysql> show warnings\G
*************************** 1. row ***************************
  Level: Warning
   Code: 1366
' for column 'price' at row 1lue: '
*************************** 2. row ***************************
  Level: Warning
   Code: 1366
' for column 'price' at row 3lue: '
*************************** 3. row ***************************
  Level: Warning
   Code: 1366
' for column 'price' at row 6lue: '
*************************** 4. row ***************************
  Level: Warning
   Code: 1366
' for column 'price' at row 9lue: '
4 rows in set (0.00 sec)

mysql>

Let's doctor the input file with \N, as the bug report did

C:\>type C:\MySQLDBA\bug_test.txt
name,\N
name,0
\N,\N
name,6
name,2
name,\N
name,0
name,0
name,\N
name,0

C:\>
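
If you would rather not doctor the file by hand, the same edit can be scripted. This is only a sketch: `doctor_line` is a hypothetical helper name, and it assumes a plain comma-separated file with no quoting or escaped commas. It also strips a trailing carriage return from each line. That part rests on my own assumption, not confirmed anywhere above, that the garbled warning text in the Windows runs comes from CRLF line endings in a file loaded with lines terminated by '\n'.

```python
def doctor_line(line: str) -> str:
    # Drop any trailing line ending (LF or CRLF), then replace every
    # empty field with \N, the marker LOAD DATA reads as NULL.
    fields = line.rstrip("\r\n").split(",")
    return ",".join(field if field else r"\N" for field in fields)


# The original test file from above, one string per line.
original = [
    "name,", "name,0", ",", "name,6", "name,2",
    "name,", "name,0", "name,0", "name,", "name,0",
]

doctored = [doctor_line(line) for line in original]
for line in doctored:
    print(line)
```

When writing the result back to disk, opening the output file with `open(path, "w", newline="\n")` keeps Python from turning the line endings back into CRLF on Windows.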

Let's repeat all of this with InnoDB

mysql> USE test
Database changed
mysql> DROP TABLE IF EXISTS bug_repeat;
Query OK, 0 rows affected (0.00 sec)

mysql> CREATE TABLE bug_repeat
    -> (
    ->   name varchar(10),
    ->   price decimal(12,6)
    -> )
    -> ENGINE=InnoDB;
Query OK, 0 rows affected (0.05 sec)

mysql> truncate table bug_repeat;
Query OK, 0 rows affected (0.05 sec)

mysql> load data local infile 'C:/MySQLDBA/bug_test.txt'
    -> into table test.bug_repeat
    -> fields terminated by ','
    -> lines terminated by '\n';
Query OK, 10 rows affected, 4 warnings (0.00 sec)
Records: 10  Deleted: 0  Skipped: 0  Warnings: 4

mysql> show warnings\G
*************************** 1. row ***************************
  Level: Warning
   Code: 1366
' for column 'price' at row 1lue: 'N
*************************** 2. row ***************************
  Level: Warning
   Code: 1366
' for column 'price' at row 3lue: 'N
*************************** 3. row ***************************
  Level: Warning
   Code: 1366
' for column 'price' at row 6lue: 'N
*************************** 4. row ***************************
  Level: Warning
   Code: 1366
' for column 'price' at row 9lue: 'N
4 rows in set (0.00 sec)

mysql> select * from bug_repeat;
+------+----------+
| name | price    |
+------+----------+
| name | 0.000000 |
| name | 0.000000 |
| NULL | 0.000000 |
| name | 6.000000 |
| name | 2.000000 |
| name | 0.000000 |
| name | 0.000000 |
| name | 0.000000 |
| name | 0.000000 |
| name | 0.000000 |
+------+----------+
10 rows in set (0.00 sec)

mysql>

What version of MySQL am I using ???

mysql> show global variables like 'version%';
+-------------------------+------------------------------+
| Variable_name           | Value                        |
+-------------------------+------------------------------+
| version                 | 5.6.22                       |
| version_comment         | MySQL Community Server (GPL) |
| version_compile_machine | x86_64                       |
| version_compile_os      | Win64                        |
+-------------------------+------------------------------+
4 rows in set (0.00 sec)

mysql>

What about Linux ???

$ cat /tmp/bug_test.txt
name,\N
name,0
\N,\N
name,6
name,2
name,\N
name,0
name,0
name,\N
name,0

$

Logging in to mysql and trying ...

mysql> create database test;
Query OK, 1 row affected (0.01 sec)

mysql> USE test
Database changed
mysql> DROP TABLE IF EXISTS bug_repeat;
Query OK, 0 rows affected, 1 warning (0.00 sec)

mysql> CREATE TABLE bug_repeat
    -> (
    ->   name varchar(10),
    ->   price decimal(12,6)
    -> )
    -> ENGINE=InnoDB;
Query OK, 0 rows affected (0.09 sec)

mysql> truncate table bug_repeat;
Query OK, 0 rows affected (0.04 sec)

mysql> load data local infile 'C:/MySQLDBA/bug_test.txt'
    -> into table test.bug_repeat
    -> fields terminated by ','
    -> lines terminated by '\n';
ERROR 2 (HY000): File 'C:/MySQLDBA/bug_test.txt' not found (Errcode: 2 - No such file or directory)
mysql> show warnings\G
Empty set (0.00 sec)

mysql> select * from bug_repeat;
Empty set (0.00 sec)

mysql> truncate table bug_repeat;
Query OK, 0 rows affected (0.04 sec)

mysql> load data local infile '/tmp/bug_test.txt'
    -> into table test.bug_repeat
    -> fields terminated by ','
    -> lines terminated by '\n';
Query OK, 10 rows affected (0.00 sec)
Records: 10  Deleted: 0  Skipped: 0  Warnings: 0

mysql> show warnings\G
Empty set (0.00 sec)

mysql> select * from bug_repeat;
+------+----------+
| name | price    |
+------+----------+
| name |     NULL |
| name | 0.000000 |
| NULL |     NULL |
| name | 6.000000 |
| name | 2.000000 |
| name |     NULL |
| name | 0.000000 |
| name | 0.000000 |
| name |     NULL |
| name | 0.000000 |
+------+----------+
10 rows in set (0.00 sec)

mysql> show global variables like 'version%';
+-------------------------+------------------------------+
| Variable_name           | Value                        |
+-------------------------+------------------------------+
| version                 | 5.6.21-log                   |
| version_comment         | MySQL Community Server (GPL) |
| version_compile_machine | x86_64                       |
| version_compile_os      | Linux                        |
+-------------------------+------------------------------+
4 rows in set (0.00 sec)

mysql>

Today's date ???

mysql> select now();
+---------------------+
| now()               |
+---------------------+
| 2015-06-25 18:48:10 |
+---------------------+
1 row in set (0.01 sec)

mysql>

It's been a year and one week since that bug report was submitted and nothing has changed.

My answer to MySQL is inserting "" as 0 in decimal fields. How to stop that? still stands as of today.

You need to do this test against MySQL 5.6.23 and see if something has changed.
