SQL Server and PostgreSQL – Creating a More Intelligent NTILE

postgresqlranksql serverwindow functions

When using the ntile() window function, the major issue is that it arbitrarily groups into roughly equal parts regardless of the actual value.

For example with the following query:

select
    id,title,price,
    row_number() over(order by price) as row_number,
    rank() over(order by price) as rank,
    count(*) over(order by price) as count,
    dense_rank() over(order by price) as dense_rank,
    ntile(10) over(order by price) as decile
from paintings
order by price;

I will get 10 groups of about the same size, with the strong likelihood that paintings with the same price will end up in different bins.

For Example:

┌────┬────────────────────────────────────────────┬───────┬────────────┬──────┬───────┬────────────┬────────┐
│ id │ title                                      │ price │ row_number │ rank │ count │ dense_rank │ decile │
╞════╪════════════════════════════════════════════╪═══════╪════════════╪══════╪═══════╪════════════╪════════╡
│ 11 │ Eyes in the Heat                           │ 10    │ 1          │ 1    │ 1     │ 1          │ 1      │
│ 19 │ Deux fillettes, fond jaune et rouge        │ 11    │ 2          │ 2    │ 2     │ 2          │ 1      │
│ 17 │ Flowers in a Pitcher                       │ 12    │ 3          │ 3    │ 6     │ 3          │ 1      │
│ 5  │ Composition with Red, Yellow and Blue      │ 12    │ 4          │ 3    │ 6     │ 3          │ 2      │
│ 18 │ La lecon de musique (The Music Lesson)     │ 12    │ 5          │ 3    │ 6     │ 3          │ 2      │
│ 9  │ The Adoration of the Magi                  │ 12    │ 6          │ 3    │ 6     │ 3          │ 2      │
│ 29 │ Self-Portrait                              │ 14    │ 7          │ 7    │ 10    │ 4          │ 3      │
│ 25 │ Symphony in White, No. 1: The White Girl   │ 14    │ 8          │ 7    │ 10    │ 4          │ 3      │
│ 30 │ The Anatomy Lecture of Dr. Nicolaes Tulp   │ 14    │ 9          │ 7    │ 10    │ 4          │ 3      │
│ 20 │ Les repasseuses (Women Ironing)            │ 14    │ 10         │ 7    │ 10    │ 4          │ 4      │
│ 1  │ The Birth of Venus                         │ 15    │ 11         │ 11   │ 14    │ 5          │ 4      │
│ 12 │ Femme se promenant dans une foret exotique │ 15    │ 12         │ 11   │ 14    │ 5          │ 4      │
│ 24 │ Portrait of the Painter’s Mother           │ 15    │ 13         │ 11   │ 14    │ 5          │ 5      │
│ 28 │ Jeunes filles au piano                     │ 15    │ 14         │ 11   │ 14    │ 5          │ 5      │
│ 7  │ Portrait de l artiste (Self-portrait)      │ 16    │ 15         │ 15   │ 17    │ 6          │ 5      │
│ 3  │ The Last Supper                            │ 16    │ 16         │ 15   │ 17    │ 6          │ 6      │
│ 13 │ Combat of a Tiger and a Buffalo            │ 16    │ 17         │ 15   │ 17    │ 6          │ 6      │
│ 4  │ The Creation of Man                        │ 17    │ 18         │ 18   │ 19    │ 7          │ 6      │
│ 22 │ Le Chemin de Fer                           │ 17    │ 19         │ 18   │ 19    │ 7          │ 7      │
│ 6  │ Femmes de Tahiti [Sur la plage]            │ 18    │ 20         │ 20   │ 24    │ 8          │ 7      │
│ 21 │ Le Bar aux Folies-Berg                     │ 18    │ 21         │ 20   │ 24    │ 8          │ 7      │
│ 26 │ Lady at the Piano                          │ 18    │ 22         │ 20   │ 24    │ 8          │ 8      │
│ 15 │ Remembrance of a Garden                    │ 18    │ 23         │ 20   │ 24    │ 8          │ 8      │
│ 16 │ 1914                                       │ 18    │ 24         │ 20   │ 24    │ 8          │ 8      │
│ 14 │ Ancient Sound, Abstract on Black           │ 19    │ 25         │ 25   │ 28    │ 9          │ 9      │
│ 8  │ The Large Turf                             │ 19    │ 26         │ 25   │ 28    │ 9          │ 9      │
│ 23 │ On the Beach                               │ 19    │ 27         │ 25   │ 28    │ 9          │ 9      │
│ 2  │ Portrait of Mona Lisa                      │ 19    │ 28         │ 25   │ 28    │ 9          │ 10     │
│ 27 │ On the Terrace                             │ 20    │ 29         │ 29   │ 30    │ 10         │ 10     │
│ 10 │ The She-Wolf                               │ 20    │ 30         │ 29   │ 30    │ 10         │ 10     │
└────┴────────────────────────────────────────────┴───────┴────────────┴──────┴───────┴────────────┴────────┘

Note that there are four items with price 12, but two of them are in the have decile 1, and two of them decile 2. I would like to keep these items together, and I’m not fussed about which decile.

I have included other window functions to make the comparison.

It seems that ntile() uses the row_number() only and bases the cuttoffs on that. It would be more fair if it used either the rank() or count(*) function as items with the same price would end up in the same bin.

This is the behaviour for both PostgreSQL and SQL Server, and, presumably, for the rest.

The question is, is there a way to achieve this?

Best Answer

You could use rank() and do integer division with number of rows for each bin.

declare @T table(id int, title varchar(100), price int);

insert into @T(id, title, price) values
(19, 'Deux fillettes, fond jaune et rouge        ', 11),
(17, 'Flowers in a Pitcher                       ', 12),
(5 , 'Composition with Red, Yellow and Blue      ', 12),
(18, 'La lecon de musique (The Music Lesson)     ', 12),
(9 , 'The Adoration of the Magi                  ', 12),
(29, 'Self-Portrait                              ', 14),
(25, 'Symphony in White, No. 1: The White Girl   ', 14),
(30, 'The Anatomy Lecture of Dr. Nicolaes Tulp   ', 14),
(20, 'Les repasseuses (Women Ironing)            ', 14),
(1 , 'The Birth of Venus                         ', 15),
(12, 'Femme se promenant dans une foret exotique ', 15),
(24, 'Portrait of the Painter’s Mother           ', 15),
(28, 'Jeunes filles au piano                     ', 15),
(7 , 'Portrait de l artiste (Self-portrait)      ', 16),
(3 , 'The Last Supper                            ', 16),
(13, 'Combat of a Tiger and a Buffalo            ', 16),
(4 , 'The Creation of Man                        ', 17),
(22, 'Le Chemin de Fer                           ', 17),
(6 , 'Femmes de Tahiti [Sur la plage]            ', 18),
(21, 'Le Bar aux Folies-Berg                     ', 18),
(26, 'Lady at the Piano                          ', 18),
(15, 'Remembrance of a Garden                    ', 18),
(16, '1914                                       ', 18),
(14, 'Ancient Sound, Abstract on Black           ', 19),
(8 , 'The Large Turf                             ', 19),
(23, 'On the Beach                               ', 19),
(2 , 'Portrait of Mona Lisa                      ', 19),
(27, 'On the Terrace                             ', 20),
(10, 'The She-Wolf                               ', 20);

declare @BinCount int = 10;
declare @BinSize int;
select @BinSize = 1 + count(*) / @BinCount from @T;

select T.id,
       T.title,
       T.price,
       1 + rank() over(order by T.price) / @BinSize as decile
from @T as T;

Result:

id  title                                       price  decile
--- ------------------------------------------- ------ --------------------
19  Deux fillettes, fond jaune et rouge         11     1
17  Flowers in a Pitcher                        12     1
5   Composition with Red, Yellow and Blue       12     1
18  La lecon de musique (The Music Lesson)      12     1
9   The Adoration of the Magi                   12     1
29  Self-Portrait                               14     3
25  Symphony in White, No. 1: The White Girl    14     3
30  The Anatomy Lecture of Dr. Nicolaes Tulp    14     3
20  Les repasseuses (Women Ironing)             14     3
1   The Birth of Venus                          15     4
12  Femme se promenant dans une foret exotique  15     4
24  Portrait of the Painter’s Mother            15     4
28  Jeunes filles au piano                      15     4
7   Portrait de l artiste (Self-portrait)       16     5
3   The Last Supper                             16     5
13  Combat of a Tiger and a Buffalo             16     5
4   The Creation of Man                         17     6
22  Le Chemin de Fer                            17     6
6   Femmes de Tahiti [Sur la plage]             18     7
21  Le Bar aux Folies-Berg                      18     7
26  Lady at the Piano                           18     7
15  Remembrance of a Garden                     18     7
16  1914                                        18     7
14  Ancient Sound, Abstract on Black            19     9
8   The Large Turf                              19     9
23  On the Beach                                19     9
2   Portrait of Mona Lisa                       19     9
27  On the Terrace                              20     10
10  The She-Wolf                                20     10

and I’m not fussed about which decile

Note that bin 2 and 8 with your sample data ended up empty.