MySQL – Get the nth Distinct Column Value Ordered by Another Column

distinct, group by, MySQL, order-by

I want to get the row with the nth distinct value in a column, as ordered by another column.

More concretely, I want the 1000th distinct email_address as ordered by created_at, and the email_address column does not have a unique constraint.

I believe I have done this correctly with the following query:

SELECT `email_address`
    FROM footable
    GROUP BY `email_address`
    ORDER BY `created_at` ASC
    LIMIT 1
    OFFSET 1000

But I am not totally confident that is giving me the right value.

Is the above query correct for this purpose? If not, how should I solve this?

Best Answer

You need two changes:

  • a major one: Using a non-aggregated column that is not in the GROUP BY list will yield unpredictable results. It's a pity that older default settings in MySQL allow this type of query, and good that this was corrected in version 5.7, where ONLY_FULL_GROUP_BY is enabled by default.
    The problem with your query is that an email address can appear in many rows, with many different created_at values, and the question is: which of these different values will MySQL use for the ORDER BY? The answer is any one of them. It's usually the first one it finds, but "first" here can mean different things, depending on the execution plan and the available indexes. So the results are not deterministic and may change from one execution to another.
    You can use MIN(created_at) in the ORDER BY instead, which orders each email address by its earliest created_at, and the issue is solved.

  • a minor one: OFFSET x means "skip x rows", so to get the 1000th row you need to skip 999. Replace OFFSET 1000 with OFFSET 999:

    SELECT email_address
        FROM footable
        GROUP BY email_address
        ORDER BY MIN(created_at) ASC
        LIMIT 1
        OFFSET 999 ;
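
To check both changes on a small scale, here is a minimal sketch that runs the same query shape against a made-up five-row table. It rests on several assumptions: SQLite (through Python's sqlite3 module) stands in for MySQL, the sample rows and the helper name nth_distinct_email are hypothetical, and the extra email_address tie-breaker in the ORDER BY is an addition of this sketch rather than part of the query above. It only matters if two addresses share the same earliest created_at.

```python
import sqlite3

# Assumption: an in-memory SQLite database stands in for MySQL,
# and the rows below are invented sample data.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE footable (email_address TEXT, created_at TEXT)")
conn.executemany(
    "INSERT INTO footable VALUES (?, ?)",
    [
        ("a@example.com", "2020-01-01"),
        ("b@example.com", "2020-01-02"),
        ("a@example.com", "2020-01-05"),  # repeat of a@ with a later timestamp
        ("c@example.com", "2020-01-03"),
        ("b@example.com", "2020-01-04"),  # repeat of b@ with a later timestamp
    ],
)


def nth_distinct_email(conn, n):
    """Return the n-th (1-based) distinct email_address, ordered by its
    earliest created_at, or None if there are fewer than n distinct values.
    Hypothetical helper, named for this sketch only."""
    row = conn.execute(
        """
        SELECT email_address
        FROM footable
        GROUP BY email_address
        -- tie-breaker on email_address is an assumption of this sketch
        ORDER BY MIN(created_at) ASC, email_address ASC
        LIMIT 1
        OFFSET ?
        """,
        (n - 1,),  # skip n - 1 rows to land on the n-th one
    ).fetchone()
    return row[0] if row else None


print(nth_distinct_email(conn, 1))  # a@example.com
print(nth_distinct_email(conn, 3))  # c@example.com
print(nth_distinct_email(conn, 4))  # None (only three distinct addresses)
```

The repeated rows for a@ and b@ do not affect the result, because each address is ranked by its earliest created_at only, and passing n - 1 as the offset is what makes n = 1000 correspond to OFFSET 999.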