Mysql – Delete the N oldest entries grouped by multiple columns

deletegroup byMySQL

I´d like to delete the oldest entries of my database, but in each category shall remain at least N words of each char.

Structure of TABLE words

id | word | char | category

For example I have the following entries:

id | word | char | category
1     abc     a       1
2     abc     a       1
3     abc     a       1
4     bcs     b       1
5     bcs     b       1
6     bcs     b       1
7     csd     c       2
8     csd     c       2
9     asd     a       2
10    asc     a       2

For N=2 the following entries should be removed as their id´s are the lowest:

1     abc     a       1
4     bcs     b       1

Can somebody help? I tried various things (GROUP BY, HAVING, INNER AND OUTER JOINS) but it does not work.

Best Answer

If I understand you requirements clearly, you need to delete rows that are grouped on the basis of char and category (leaving minimum latest N rows)

Updated: Temporary table workaround

This can be achieved by

Option 1:

create temporary table test_table_rows_temp SELECT * from test;

DELETE t1
FROM 
    test  as t1
WHERE
    t1.id < ( 
            SELECT 
                min( t2.id ) 
            FROM 
                test_table_rows_temp as t2 
            WHERE t2.category =  t1.category 
                    AND t2.`char` =  t1.`char` 
                    AND t2.id >  t1.ID 
            HAVING count( t2.id ) >= 2 
        ) 

I have created a temporary table to store the results of the test table, this helps to ease out the query

Option 2:

There's a workaround to avoid manually creating a temporary table by using a nested sub-query instead

DELETE t1
FROM 
    test  as t1
WHERE
    t1.id < ( 
            SELECT 
                min( id ) 
            FROM (
                SELECT 
                    id, 
                    category, 
                    `char`
                FROM test
            ) AS t2 
            WHERE t2.category =  t1.category 
                    AND t2.`char` =  t1.`char` 
                    AND t2.id >  t1.ID 
            HAVING count( t2.id ) >= 2 
        );

The nested sub-query internally too creates a temporary table :)

The 2 rows as mentioned above (ids 1, 4) are deleted

The query simply gets minimum - N + 1 id in the group of category, word (ordered by the latest id) and then checks if the current row id is less than the one retrieved (within the same group(s))

SQLFiddle

Hope this helps