MySQL aggregate anomaly

aggregate, MySQL

I was checking selectivity of some columns for an index.
Where is this "ignore what I give you" behaviour documented?

This gives 4,851,908, 4,841,060, and 1,000,052

SELECT
     COUNT(*), 
     COUNT(DISTINCT Col1), COUNT(DISTINCT Col2)
FROM Sometable;
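
As a rough sketch of how these counts translate into selectivity (distinct values divided by total rows), the ratios can be computed in the same statement. The aliases are made up for illustration; with the figures above this works out to roughly 0.998 for Col1 and 0.206 for Col2.

```sql
-- Selectivity = distinct values / total rows; closer to 1 means more selective.
-- col1_selectivity and col2_selectivity are illustrative alias names.
SELECT
     COUNT(DISTINCT Col1) / COUNT(*) AS col1_selectivity,
     COUNT(DISTINCT Col2) / COUNT(*) AS col2_selectivity
FROM Sometable;
```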

This gives 4,843,634 unique pairs, as per the MySQL multi-column COUNT(DISTINCT ...) extension:

SELECT COUNT(DISTINCT Col1, Col2) FROM Sometable
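
For comparison, here is a sketch of how the same pair count could be written without the multi-column extension. It assumes, as the MySQL manual describes for COUNT(DISTINCT ...), that combinations containing a NULL are not counted. The derived-table alias pairs is arbitrary.

```sql
-- Count distinct (Col1, Col2) combinations via a derived table.
-- The NULL filters mirror COUNT(DISTINCT ...) skipping NULL values.
SELECT COUNT(*)
FROM (
     SELECT DISTINCT Col1, Col2
     FROM Sometable
     WHERE Col1 IS NOT NULL
       AND Col2 IS NOT NULL
) AS pairs;
```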

The following are wrong: the individual COUNT(DISTINCT colx) all give the 4,843,634 unique pair count regardless of any filler column or expression order.

I expected COUNT(DISTINCT Col1) = 4,841,060, and COUNT(DISTINCT Col2) = 1,000,052.

SELECT COUNT(DISTINCT Col1), COUNT(DISTINCT Col2) FROM Sometable

SELECT COUNT(DISTINCT Col2), COUNT(DISTINCT Col1) FROM Sometable

SELECT COUNT(DISTINCT Col1), 1 AS Filler, COUNT(DISTINCT Col2) FROM Sometable

But this gives the correct values again when another aggregate is added (as with COUNT(*) above):

SELECT COUNT(DISTINCT Col1), MAX(col1) AS Filler, COUNT(DISTINCT Col2) FROM Sometable

Questions, in case it wasn't clear:

Why does COUNT(DISTINCT Col1), COUNT(DISTINCT Col2) behave like COUNT(DISTINCT Col1, Col2)?
Why is another aggregate required to make it work?

Best Answer

It looks like you are hitting this regression bug:

"select count(distinct N1), count(distinct N2) from test.AA" works incorrectly
...
"This bug happens when a unique index exists"

One of the suggested workarounds is to use sql_buffer_result.
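
A minimal sketch of that workaround applied to the query from the question, either as a per-statement modifier or as a session setting. Whether it avoids the bug on a given MySQL version is an assumption; it is worth checking the result against the known-good counts from the query that includes COUNT(*).

```sql
-- Per statement: ask the server to buffer the result in a temporary table.
SELECT SQL_BUFFER_RESULT
     COUNT(DISTINCT Col1), COUNT(DISTINCT Col2)
FROM Sometable;

-- Or switch it on for the current session only, then back off.
SET SESSION sql_buffer_result = 1;
SELECT COUNT(DISTINCT Col1), COUNT(DISTINCT Col2) FROM Sometable;
SET SESSION sql_buffer_result = 0;
```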

Related Solutions

MySQL varchar(255) limitation & Anomaly

MySQL is not cutting it at 233. The problem is likely in your save method which cuts it to 233 before the data even reaches MySQL.

Also, don't forget that the 233 limit is not a character limit: since some characters might need more than 1 byte to be stored, you might see fewer than 233 characters.

Also, please make sure that the data in the MySQL server is really stored as latin1. This can be checked with:

show variables like 'char%';
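
The server variables only show the defaults, so it may also help to look at the column itself. The sketch below uses hypothetical names (mydb, mytable, mycol) that would need to be replaced with the real ones. It reads the column's character set from information_schema and compares character length with byte length for the longest stored values.

```sql
-- Character set and size limits declared for the column.
-- mydb, mytable and mycol are placeholder names.
SELECT CHARACTER_SET_NAME, CHARACTER_MAXIMUM_LENGTH, CHARACTER_OCTET_LENGTH
FROM information_schema.COLUMNS
WHERE TABLE_SCHEMA = 'mydb'
  AND TABLE_NAME = 'mytable'
  AND COLUMN_NAME = 'mycol';

-- CHAR_LENGTH counts characters, LENGTH counts bytes;
-- a difference means multi-byte characters are stored.
SELECT CHAR_LENGTH(mycol) AS chars, LENGTH(mycol) AS bytes
FROM mytable
ORDER BY bytes DESC
LIMIT 5;
```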

Mysql – Do I need a locking read

I have a suggestion that could simplify your code and your SQL.

You are doing round trips at this point by checking each individual someID, coming back to the client, seeing if the check failed. If the check failed, it is OK to perform the INSERT. That's another round trip. You will require some kind of locking for reads.

If you have to INSERT 10000 rows:

In the best case, you run 10000 SELECTs and find that every someID exists
In the worst case, you run 10000 SELECTs, each someID does not exist, and do 10000 INSERTs.

Why not just perform all this with a single SQL statement?

function doStuff(int someID) {
    getvalue = `SELECT value FROM statetable`
    INSERT IGNORE INTO targettable VALUES (someID,getvalue);
}

This works only if someID is unique in targettable
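
If the lookup and the insert are to be collapsed into literally one statement, INSERT IGNORE can take its values from a SELECT. This sketch assumes that targettable has columns named someID and value (the original does not give the column names), that statetable returns a single row, and it uses 42 as a stand-in for someID.

```sql
-- One round trip: read the state value and insert in the same statement.
-- A duplicate someID is silently skipped, provided someID has a
-- PRIMARY KEY or UNIQUE index in targettable.
INSERT IGNORE INTO targettable (someID, `value`)
SELECT 42, `value`
FROM statetable;
```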

In your code, try submitting a list of someID values, iterate that list, and submit INSERT IGNORE commands for all someID values. That you can do in a START TRANSACTION/COMMIT block.

function doStuff(int someIDList) {
    int idcount = 0;
    getvalue = `SELECT value FROM statetable`
    for someID in someIDList
    do
        if idcount = 0 then
            START TRANSACTION
        endif;
        idcount++;
        INSERT IGNORE INTO targettable VALUES (someID,getvalue);
    end for;
    if idcount > 0 then
        COMMIT
    endif;
}
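
On the SQL side, the loop above could also be sent as a single multi-row statement inside the transaction. This is only a sketch under the same assumptions as before: the column names someID and value are guesses, the ids 1, 2 and 3 are placeholders for the submitted list, and statetable is assumed to return one row.

```sql
START TRANSACTION;

-- Build the id list inline and pair every id with the state value.
-- Existing ids are skipped because of IGNORE plus the unique key on someID.
INSERT IGNORE INTO targettable (someID, `value`)
SELECT ids.someID, s.`value`
FROM (
     SELECT 1 AS someID
     UNION ALL SELECT 2
     UNION ALL SELECT 3
) AS ids
CROSS JOIN statetable AS s;

COMMIT;
```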
