MariaDB – Deleting Rows Where Values Are Less Than Aggregate Value

clustered-primary-keymariadb

I'm attempting to remove duplicates from a table.

CREATE TABLE media_views (
  `companyid` INT (11) NOT NULL,
  `channel` INT(11) NOT NULL,
  `date` DATE NOT NULL,
  `views` INT(11) NOT NULL DEFAULT 0
  PRIMARY KEY (`companyid`,`channel`,`date`,`views`)
) ENGINE=InnoDB;

INSERT INTO media_views (companyid, channel, `date`, views)
     VALUES (1, 0,  '2015-10-01', 15),
            (1, 0,  '2015-10-01', 25),
            (1, 0,  '2015-10-01', 541),
            (1, 5,  '2015-10-01', 2),
            (1, 8,  '2015-10-01', 53),
            (1, 91, '2015-10-01', 8571),
            (3, 12, '2015-10-01', 11),
            (1, 0,  '2015-10-02', 1),
            (1, 0,  '2015-10-02', 13),
            (1, 5,  '2015-10-02', 17),
            (1, 91, '2015-10-03', 37),
            (3, 0,  '2015-10-03', 73);

Now the problem is that I have "duplicate" errors, I have records that match the dates but views are different. Only the maximum values of the views are actually useful. So I can solve this problem by doing the following.

SELECT companyid, channel, date, MAX(views) 
FROM media_views 
GROUP BY companyid, channel, date 
HAVING COUNT(1) >= 1;

This clearly provides me with the following:

+-----------+---------+------------+------------+
| companyid | channel | date       | MAX(views) |
+-----------+---------+------------+------------+
| 1         | 0       | 2015-10-01 | 541        |
| 1         | 0       | 2015-10-02 | 13         |
| 1         | 5       | 2015-10-01 | 2          |
| 1         | 5       | 2015-10-02 | 17         |
| 1         | 8       | 2015-10-01 | 53         |
| 1         | 91      | 2015-10-01 | 8571       |
| 1         | 91      | 2015-10-03 | 37         |
| 3         | 0       | 2015-10-03 | 73         |
| 3         | 12      | 2015-10-01 | 11         |
+-----------+---------+------------+------------+
9 rows in set (0.11 sec)

Which is great but how do I go back and delete the rows that are duplicates and less than the max(views) value?

Best Answer

You can select the rows to keep using a left join - http://sqlfiddle.com/#!9/3a677/3

select *
from media_views m1
left join media_views m2
  on (m1.companyid = m2.companyid
      and m1.channel = m2.channel
      and m1.date = m2.date
      and m1.views < m2.views)
where m2.companyid is null;

And you can delete the other ones using similar a inner join - http://sqlfiddle.com/#!9/d5d36/2

delete m1 from media_views m1
join media_views m2
  on (m1.companyid = m2.companyid
      and m1.channel = m2.channel
      and m1.date = m2.date
      and m1.views < m2.views);

And do not forget to modify the primary key after deleting them so it is not possible anymore.