Select most occurrences of a varchar2 in Oracle

group byoracle

We need to group data in an Oracle database and must get a ProductNumber and a Description. The Description needs to be the one with the most occurrences inside the table. Both are a varchar2. See below for the raw data:

ProductNumber Description
ABC Product ABC (with discount)
ABC Product ABC
ABC Product ABC
DEF Product DEF

Expected result:

ProductNumber Description
ABC Product ABC
DEF Product DEF

We've tried this but don't find a way to include the Description with most occurences.

select distinct cnt1.ProductNumber
from (select COUNT(*) as total, ProductNumber
      from Inventory
      group by ProductNumber) cnt1,
     (select MAX(total) as maxtotal, ProductNumber
      from (select COUNT(*) as total, ProductNumber 
            from Inventory 
            group by ProductNumber)
      group by ProductNumber) cnt2
where cnt1.total = cnt2.maxtotal;

Here you could find a fiddle.

Do you know how we could do it?

Best Answer

Here is one way to do it using window functions:

select ProductNumber, Description
from (
  select ProductNumber, Description
       , row_number() over (partition by productnumber order by cnt1, cnt2 desc) as rn
  from (
    select ProductNumber, Description
         , count(1) over (partition by ProductNumber) as cnt1
         , count(1) over (partition by ProductNumber, Description) as cnt2
    from inventory
  ) t
) u 
where rn = 1;

EDIT: Added plans for a slightly bigger set of data, Fiddle

Verace query

-----------------------------------------------------------------------------------------------------------------------------
| Id  | Operation                  | Name      | Starts | E-Rows | A-Rows |   A-Time   | Buffers |  OMem |  1Mem | Used-Mem |
-----------------------------------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT           |           |      1 |        |    676 |00:00:00.01 |     122 |       |       |          |
|   1 |  SORT UNIQUE               |           |      1 |  17576 |    676 |00:00:00.01 |     122 | 46080 | 46080 |40960  (0)|
|*  2 |   HASH JOIN SEMI           |           |      1 |  17576 |  17576 |00:00:00.01 |     122 |  2178K|  2050K| 2666K (0)|
|   3 |    TABLE ACCESS FULL       | INVENTORY |      1 |  17576 |  17576 |00:00:00.01 |      61 |       |       |          |
|*  4 |    VIEW                    |           |      1 |  17576 |    676 |00:00:00.01 |      61 |       |       |          |
|*  5 |     WINDOW SORT PUSHED RANK|           |      1 |  17576 |    676 |00:00:00.01 |      61 | 48128 | 48128 |43008  (0)|
|   6 |      HASH GROUP BY         |           |      1 |  17576 |    676 |00:00:00.01 |      61 |  1214K|  1214K| 1854K (0)|
|   7 |       TABLE ACCESS FULL    | INVENTORY |      1 |  17576 |  17576 |00:00:00.01 |      61 |       |       |          |
-----------------------------------------------------------------------------------------------------------------------------
 
Predicate Information (identified by operation id):
---------------------------------------------------
 
   2 - access("T1"."DESCRIPTION"="I2"."DESCRIPTION")
   4 - filter("T1"."RN"=1)
   5 - filter(ROW_NUMBER() OVER ( PARTITION BY "I1"."PRODUCTNUMBER" ORDER BY COUNT(*) DESC )<=1)

H. Pauwelyn's modified query

------------------------------------------------------------------------------------------------------------------------
| Id  | Operation             | Name      | Starts | E-Rows | A-Rows |   A-Time   | Buffers |  OMem |  1Mem | Used-Mem |
------------------------------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT      |           |      1 |        |    676 |00:00:00.01 |     122 |       |       |          |
|   1 |  HASH UNIQUE          |           |      1 |    176 |    676 |00:00:00.01 |     122 |  2170K|  2170K| 1427K (0)|
|*  2 |   HASH JOIN RIGHT SEMI|           |      1 |    176 |    676 |00:00:00.01 |     122 |  2546K|  2546K|  654K (0)|
|   3 |    VIEW               |           |      1 |  17576 |    676 |00:00:00.01 |      61 |       |       |          |
|   4 |     HASH GROUP BY     |           |      1 |  17576 |    676 |00:00:00.01 |      61 |  1520K|  1520K| 1882K (0)|
|   5 |      TABLE ACCESS FULL| INVENTORY |      1 |  17576 |  17576 |00:00:00.01 |      61 |       |       |          |
|   6 |    VIEW               |           |      1 |  17576 |    676 |00:00:00.01 |      61 |       |       |          |
|   7 |     HASH GROUP BY     |           |      1 |  17576 |    676 |00:00:00.01 |      61 |  1520K|  1520K| 1882K (0)|
|   8 |      TABLE ACCESS FULL| INVENTORY |      1 |  17576 |  17576 |00:00:00.01 |      61 |       |       |          |
------------------------------------------------------------------------------------------------------------------------
 
Predicate Information (identified by operation id):
---------------------------------------------------
 
   2 - access("CNT1"."TOTAL"="CNT2"."MAXTOTAL")

Lennart's query

----------------------------------------------------------------------------------------------------------------------
| Id  | Operation                | Name | Starts | E-Rows | A-Rows |   A-Time   | Buffers |  OMem |  1Mem | Used-Mem |
----------------------------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT         |      |      1 |        |    676 |00:00:00.03 |     208 |       |       |          |
|*  1 |  VIEW                    |      |      1 |  17576 |    676 |00:00:00.03 |     208 |       |       |          |
|*  2 |   WINDOW SORT PUSHED RANK|      |      1 |  17576 |    676 |00:00:00.03 |     208 |  1116K|   557K|  991K (0)|
|   3 |    VIEW                  |      |      1 |  17576 |  17576 |00:00:00.02 |     208 |       |       |          |
|   4 |     WINDOW BUFFER        |      |      1 |  17576 |  17576 |00:00:00.02 |     208 |  1116K|   557K|  991K (0)|
|   5 |      INDEX FULL SCAN     | IX1  |      1 |  17576 |  17576 |00:00:00.01 |     208 |       |       |          |
----------------------------------------------------------------------------------------------------------------------
 
Predicate Information (identified by operation id):
---------------------------------------------------
 
   1 - filter("RN"=1)
   2 - filter(ROW_NUMBER() OVER ( PARTITION BY "PRODUCTNUMBER" ORDER BY "CNT1",INTERNAL_FUNCTION("CNT2") DESC 
              )<=1)