Oracle distinct doing sort

oracleperformance

I'm getting poor performance from DISTINCT. The explain plan indicates that it is doing SORT (GROUP BY) which doesn't sound right. I would expect some kind of HASH aggregation to produce much better result. Is there a hint to tell oracle to use HASH for DISTINCT rather than sort?
I've used /*+ USE_HASH_AGGREGATION */ in similar situations, but it is not working for DISTINCT.

So this is my original query:

SELECT
count(distinct userid) n, col
FROM users
GROUP BY col;

users has 30M rows, each userid is there 12 times. This query takes 70 seconds.

Now we rewrite it as

SELECT
count(userid) n, col
FROM
(SELECT distinct userid, col FROM users)
GROUP BY col

And it takes 40 seconds.
Now add the hint to do hash instead of sort:

SELECT
count(userid) n, col
FROM
(SELECT /*+ USE_HASH_AGGREGATION */ distinct userid, col FROM users)
GROUP BY col

and it takes 10 seconds.

If somebody can explain to me why this is happening or how I can beat the first simple query into working as good as the 3rd one, that would be fantastic.
The reason I care about query simplicity is because these queries are actually generated.

Plans:
1) Slow:

----------------------------------------------------------------------------------------------------------------------------------------------------
| Id  | Operation      | Name          | Starts | E-Rows | A-Rows |   A-Time   | Buffers | Reads  |  OMem |  1Mem | Used-Mem | Used-Tmp|
--------------------------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT   |               |      1 |        |      5 |00:01:12.01 |     283K|    292K|       |       |      |     |
|   1 |  SORT GROUP BY     |               |      1 |      5 |      5 |00:01:12.01 |     283K|    292K|   194M|   448K|  172M (0)|   73728 |
|   2 |   TABLE ACCESS FULL| USERS |      1 |     29M|     29M|00:00:08.17 |     283K|    283K|       |       |      |     |

2) Fast

--------------------------------------------------------------------------------------------------------------------------------------------
| Id  | Operation        | Name          | Starts | E-Rows | A-Rows |   A-Time   | Buffers | Reads  |  OMem |  1Mem | Used-Mem |
--------------------------------------------------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT     |               |      1 |        |      5 |00:00:13.09 |     283K|    283K|   |   |      |
|   1 |  SORT GROUP BY       |               |      1 |      5 |      5 |00:00:13.09 |     283K|    283K|  3072 |  3072 | 2048  (0)|
|   2 |   VIEW               |               |      1 |   8647K|   2445K|00:00:13.16 |     283K|    283K|   |   |      |
|   3 |    HASH UNIQUE       |               |      1 |   8647K|   2445K|00:00:12.57 |     283K|    283K|   113M|    10M|  216M (0)|
|   4 |     TABLE ACCESS FULL| USERS         |      1 |     29M|     29M|00:00:07.68 |     283K|    283K|   |   |      |
--------------------------------------------------------------------------------------------------------------------------------------------

Best Answer

Can you add indexes? I would first try adding an index on (col, userid).

10 seconds looks too much for a 30M rows table.