Is COUNT(DISTINCT PRIMARY_KEY) faster than COUNT(*)

countoracle

I haven't got a particular problem here, I'm just interested. One of my associates uses COUNT(DISTINCT(PRIMARY_KEY)) every time he needs to select all the data from a table (with 100m + rows) is it any quicker to do it this way rather than just count(*)? If it's the primary key surely the database knows that it's just pulling a count for the whole database and why?

Best Answer

It is slower.

SQL> select count(*) from t1;

  COUNT(*)
----------
   4023040

SQL> alter table t1 add primary key (object_id);

Table altered.

SQL> select count(*) from t1;

  COUNT(*)
----------
   4023040

-----------------------------------------------------------------------------------------------
| Id  | Operation             | Name        | Starts | E-Rows | A-Rows |   A-Time   | Buffers |
-----------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT      |             |      1 |        |      1 |00:00:00.54 |    8852 |
|   1 |  SORT AGGREGATE       |             |      1 |      1 |      1 |00:00:00.54 |    8852 |
|   2 |   INDEX FAST FULL SCAN| SYS_C004079 |      1 |   3766K|   4023K|00:00:00.35 |    8852 |
-----------------------------------------------------------------------------------------------

Plain count(*) finished in 0.54 seconds.

SQL> select count(distinct object_id) from t1;

COUNT(DISTINCT(OBJECT_ID))
--------------------------
                   4023040

--------------------------------------------------------------------------------------------------------------------------------------------------------
| Id  | Operation               | Name        | Starts | E-Rows | A-Rows |   A-Time   | Buffers | Reads  | Writes |  OMem |  1Mem | Used-Mem | Used-Tmp|
--------------------------------------------------------------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT        |             |      1 |        |      1 |00:00:05.22 |    8852 |   6944 |   6944 |       |       |          |         |
|   1 |  SORT AGGREGATE         |             |      1 |      1 |      1 |00:00:05.22 |    8852 |   6944 |   6944 |       |       |          |         |
|   2 |   VIEW                  | VW_DAG_0    |      1 |   3766K|   4023K|00:00:04.96 |    8852 |   6944 |   6944 |       |       |          |         |
|   3 |    HASH GROUP BY        |             |      1 |   3766K|   4023K|00:00:04.47 |    8852 |   6944 |   6944 |   178M|    21M|   32M (1)|   57344 |
|   4 |     INDEX FAST FULL SCAN| SYS_C004079 |      1 |   3766K|   4023K|00:00:00.41 |    8852 |      0 |      0 |       |       |          |         |
--------------------------------------------------------------------------------------------------------------------------------------------------------

5.22 seconds, and it used 32 MB memory, and even had to use temp, because it has to GROUP BY because of DISTINCT.

Related Solutions

Oracle UPDATE followed by SELECT in LinqPad

Thank you all for your comments. You deserve the credit for this answer. To summarize:

The Oracle driver for LinqPad that I'm using (dotConnect Direct Mode based on OCI 8) wants the statement to be formatted like this (yes, wihout any semicolons):

UPDATE schedule_amounts 
SET ANNUAL_RATE = 44000
WHERE SCHEDULE_ID = 'LCSD'
    and SCHEDULE_NO = 2014
    and SCHEDULE_LEVEL = 100
    and SCHEDULE_STEP = 17
GO
SELECT * FROM schedule_amounts

Should I group by and then join, or join and then group by

The optimizer always tries to reduce the amount of data as quickly as possible. If not, your statistics might not be good.

Your plan 1 shows that less rows are processed this is good. The optimizer was able to reduce the amount of data more quickly. The numbers might not be exactly true but it gives you an idea based on the optimizer statistics.

Plan 1:

---------------------------------------------------------------------------------------------------------------
| Id  | Operation                              | Name | Rows  | Bytes | Cost (%CPU)| Time     | Pstart| Pstop |
---------------------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT                       |      |     1 |    79 |     7  (15)| 00:00:01 |       |       |
|   1 |  HASH GROUP BY                         |      |     1 |    79 |     7  (15)| 00:00:01 |       |       |
|   2 |   NESTED LOOPS                         |      |       |       |            |          |       |       |
|   3 |    NESTED LOOPS                        |      |     1 |    79 |     6   (0)| 00:00:01 |       |       |
|   4 |     NESTED LOOPS                       |      |     1 |    63 |     5   (0)| 00:00:01 |       |       |
|   5 |      TABLE ACCESS BY GLOBAL INDEX ROWID| t1   |     1 |    22 |     3   (0)| 00:00:01 | ROWID | ROWID |
|*  6 |       INDEX RANGE SCAN                 | t1pk |     1 |       |     2   (0)| 00:00:01 |       |       |
|   7 |      PARTITION RANGE ITERATOR          |      |     1 |    41 |     2   (0)| 00:00:01 |   KEY |   KEY |
|*  8 |       TABLE ACCESS BY LOCAL INDEX ROWID| t3   |     1 |    41 |     2   (0)| 00:00:01 |   KEY |   KEY |
|*  9 |        INDEX RANGE SCAN                | t3fk |     1 |       |     1   (0)| 00:00:01 |   KEY |   KEY |
|* 10 |     INDEX UNIQUE SCAN                  | t2pk |     1 |       |     0   (0)| 00:00:01 |       |       |
|* 11 |    TABLE ACCESS BY GLOBAL INDEX ROWID  | t2   |     1 |    16 |     1   (0)| 00:00:01 | ROWID | ROWID |
---------------------------------------------------------------------------------------------------------------

It shows that Oracle did a HASH GROUP BY while plan 2 did a SORT GROUP BY.

-----------------------------------------------------------------------------------------------------------------
| Id  | Operation                                | Name | Rows  | Bytes | Cost (%CPU)| Time     | Pstart| Pstop |
-----------------------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT                         |      |     1 |   311 |     7  (15)| 00:00:01 |       |       |
|   1 |  NESTED LOOPS                            |      |       |       |            |          |       |       |
|   2 |   NESTED LOOPS                           |      |     1 |   311 |     7  (15)| 00:00:01 |       |       |
|   3 |    NESTED LOOPS                          |      |     1 |   253 |     6  (17)| 00:00:01 |       |       |
|   4 |     TABLE ACCESS BY GLOBAL INDEX ROWID   | t1   |     1 |   164 |     3   (0)| 00:00:01 | ROWID | ROWID |
|*  5 |      INDEX RANGE SCAN                    | t1pk |     1 |       |     2   (0)| 00:00:01 |       |       |
|   6 |     VIEW PUSHED PREDICATE                |      |     1 |    89 |     3  (34)| 00:00:01 |       |       |
|   7 |      SORT GROUP BY                       |      |     1 |    41 |     3  (34)| 00:00:01 |       |       |
|*  8 |       FILTER                             |      |       |       |            |          |       |       |
|   9 |        PARTITION RANGE SINGLE            |      |     1 |    41 |     2   (0)| 00:00:01 |   KEY |   KEY |
|* 10 |         TABLE ACCESS BY LOCAL INDEX ROWID| t3   |     1 |    41 |     2   (0)| 00:00:01 |   KEY |   KEY |
|* 11 |          INDEX RANGE SCAN                | t3fk |     1 |       |     1   (0)| 00:00:01 |   KEY |   KEY |
|* 12 |    INDEX UNIQUE SCAN                     | t2pk |     1 |       |     0   (0)| 00:00:01 |       |       |
|* 13 |   TABLE ACCESS BY GLOBAL INDEX ROWID     | t2   |     1 |    58 |     1   (0)| 00:00:01 | ROWID | ROWID |
-----------------------------------------------------------------------------------------------------------------

Conclusion: In most cases I would prefer a HASH GROUP BY. So in plan 1 Oracle does a better the GROUP BY much later with much less data. 2 reasons why plan 1 is better.

I assume that this is not the only SQL statement which will be using this join constellation. You will have to decide for each SQL which way is the best.

If you have a SQL similar to the following, the with clause can be much better:

with t3g as ( -- as in 't3 grouped'
    select id2, count(1) t3count
    from t3
    where id = :1 -- new line !!!!!!!!
    group by id2
)
select t1.id t1id, t2id, t3count
from t1 join t2  on (t1.p = t2.p  and t1.id = t2.id1)
        join t3g on (t2.p = t3g.p and t2.id = id2)

Best Answer

Related Solutions

Oracle UPDATE followed by SELECT in LinqPad

Should I group by and then join, or join and then group by

Related Question