I feel like this is a simple thing to do, but I've been playing around with trying to achieve the desired result and the only way I'm coming up is with 2 nested subqueries, which doesn't seem great from a performance perspective.
I have a table called billing
which is currently 60,000 rows and growing at an increasing rate, with maybe 100 or 200 new entries per day.
Here is a query I wrote that returns exactly what I want:
SELECT *
FROM
(
SELECT COUNT(*) AS calls, tmp.*
FROM
(
SELECT b.id AS bid, u.id AS uid, b.rt_time,
b.clli, u.city, u.state
FROM nrt.billing AS b
LEFT JOIN users AS u ON b.switchowner = u.id
ORDER BY b.id DESC
LIMIT 18446744073709551615
) AS tmp
GROUP BY clli
ORDER BY bid DESC
LIMIT 18446744073709551615
) AS tmp2
GROUP BY uid
ORDER BY bid DESC
(The LIMIT weirdness is a MariaDB hack to make it behave like MySQL.)
I initially had just:
SELECT COUNT(*) AS calls, tmp.*
FROM
(
SELECT b.id AS bid, u.id AS uid, b.rt_time, b.clli, u.city, u.state
FROM nrt.billing AS b
LEFT JOIN users AS u ON b.switchowner = u.id
ORDER BY b.id DESC
LIMIT 18446744073709551615
) AS tmp
GROUP BY clli
ORDER BY bid DESC`
… and that worked mostly all right. What I am trying to do is get one record per clli
column, the newest once for each (hence with the highest ID). This gives me the newest record for every clli
value there is. I do like the single subquery way of doing it as opposed to inner joining on a table with max values.
A slight problem is that the clli
values sometimes (but rarely) change. In that case, I don't want the clli
values that have been superseded. There is "effectively" a foreign key to the users
table here, so I can associate clli
values with user IDs. Of course, GROUP BY clli, id
will give me every unique clli
/id
pairing, giving me more rows, the opposite of what I want. So, I nested the subqueries and did another GROUP BY
to get only the latest clli
per user ID.
Is there a way to optimize out the second nested subquery surrounding the first one? It seems like there should also be a way to combine the GROUP BY
s in the other direction.
At 60,000 rows, and the billing
table being much larger than users
, this doesn't perform badly, but it just seems like a bad query to me. Can the second subquery be optimized out, or is this as good as I'll get with this approach?
Here is table structure:
describe billing;
+---------------+------------------+------+-----+-------------------+----------------+
| Field | Type | Null | Key | Default | Extra |
+---------------+------------------+------+-----+-------------------+----------------+
| id | int(11) | NO | PRI | NULL | auto_increment |
| ani | varchar(32) | NO | MUL | NULL | |
| called_number | int(7) | NO | | NULL | |
| chargedfrom | varchar(32) | NO | | NULL | |
| switchowner | int(11) unsigned | NO | | NULL | |
| clli | varchar(24) | NO | | NULL | |
| rt_time | datetime | NO | | CURRENT_TIMESTAMP | |
+---------------+------------------+------+-----+-------------------+----------------+
describe users;
+-----------------+-------------+------+-----+-------------------+----------------+
| Field | Type | Null | Key | Default | Extra |
+-----------------+-------------+------+-----+-------------------+----------------+
| id | int(11) | NO | PRI | NULL | auto_increment | |
| city | varchar(32) | NO | | NULL | |
| state | varchar(2) | NO | | NULL |
+-----------------+-------------+------+-----+-------------------+----------------+
I've omitted rows that are not relevant / used in these queries.
It isn't officially part of the structure, but billing.switchowner is used as a foreign key with users.id.
Here is a DB fiddle: https://www.db-fiddle.com/f/7y19T29CeykBSPFb7oJq9M/0
If that doesn't work, this will set up the fiddle:
CREATE TABLE `users` (
`id` int(11) NOT NULL AUTO_INCREMENT,
`city` varchar(32) NOT NULL,
`state` varchar(2) NOT NULL,
PRIMARY KEY (`id`)
) ENGINE=InnoDB AUTO_INCREMENT=1 DEFAULT CHARSET=utf8mb4;
CREATE TABLE `billing` (
`id` int(11) NOT NULL AUTO_INCREMENT,
`switchowner` int(11) unsigned NOT NULL,
`clli` varchar(24) NOT NULL,
`rt_time` datetime NOT NULL DEFAULT CURRENT_TIMESTAMP,
PRIMARY KEY (`id`)
) ENGINE=InnoDB AUTO_INCREMENT=1 DEFAULT CHARSET=utf8mb4;
INSERT INTO `users` (city, state) VALUES ("New York", "NY");
INSERT INTO `users` (city, state) VALUES ("Wilmington", "DE");
INSERT INTO `users` (city, state) VALUES ("San Francisco", "CA");
INSERT INTO `users` (city, state) VALUES ("Phoenix", "AZ");
INSERT INTO `users` (city, state) VALUES ("Houston", "TX");
INSERT INTO `billing` (switchowner, clli) VALUES ("1", "NEWYRK-01");
INSERT INTO `billing` (switchowner, clli) VALUES ("1", "NEWYRK-01");
INSERT INTO `billing` (switchowner, clli) VALUES ("1", "NEWYRK01");
INSERT INTO `billing` (switchowner, clli) VALUES ("1", "NEWYRK01");
INSERT INTO `billing` (switchowner, clli) VALUES ("1", "NEWYRK01");
INSERT INTO `billing` (switchowner, clli) VALUES ("4", "PHNXAZ01");
INSERT INTO `billing` (switchowner, clli) VALUES ("4", "PHNXAZ01");
INSERT INTO `billing` (switchowner, clli) VALUES ("1", "NEWYRK01");
INSERT INTO `billing` (switchowner, clli) VALUES ("1", "NEWYRK01");
INSERT INTO `billing` (switchowner, clli) VALUES ("5", "HTX01");
INSERT INTO `billing` (switchowner, clli) VALUES ("1", "NEWYRK01");
INSERT INTO `billing` (switchowner, clli) VALUES ("2", "WILMDE01");
INSERT INTO `billing` (switchowner, clli) VALUES ("1", "NEWYRK01");
INSERT INTO `billing` (switchowner, clli) VALUES ("3", "SANFRAN01");
INSERT INTO `billing` (switchowner, clli) VALUES ("1", "NEWYRK01");
INSERT INTO `billing` (switchowner, clli) VALUES ("2", "WILMDE01");
INSERT INTO `billing` (switchowner, clli) VALUES ("5", "HTX01");
INSERT INTO `billing` (switchowner, clli) VALUES ("1", "NEWYRK01");
INSERT INTO `billing` (switchowner, clli) VALUES ("3", "SANFRAN01");
INSERT INTO `billing` (switchowner, clli) VALUES ("4", "PHNXAZ01");
INSERT INTO `billing` (switchowner, clli) VALUES ("3", "SANFRAN01");
INSERT INTO `billing` (switchowner, clli) VALUES ("1", "NEWYRK01");
INSERT INTO `billing` (switchowner, clli) VALUES ("3", "SANFRAN01");
SELECT * FROM (SELECT COUNT(*) AS calls, tmp.* FROM (
SELECT b.id AS bid, u.id AS uid, b.rt_time, b.clli, u.city, u.state FROM billing
AS b LEFT JOIN users AS u ON b.switchowner = u.id
ORDER BY b.id DESC LIMIT 18446744073709551615)
AS tmp GROUP BY clli ORDER BY bid DESC LIMIT 18446744073709551615)
AS tmp2 GROUP BY uid ORDER BY bid DESC;```
Best Answer
Your query is not valid SQL, it is only allowed if ONLY_FULL_GROUP_BY is missing from @@sql_mode. I moved your example to Fiddle1 and tried. Note that I set
and as a consequence
'tmp.bid' isn't in GROUP BY
is the error message you get from your query.But, let's enable partial GROUP BY and see where that leads us. Your innermost query:
returns 23 rows. So far so good. If we add another layer of nesting:
This seems to work, but there is no rational explanation to why it should (AFAIK). Unless you find it clearly documented that it is guaranteed to work, I would not trust the result. Will it work when you upgrade to the next version?
Starting with 10.2 you can use window functions. They are a much better alternative if you can upgrade to that version:
Fiddle for MariaDB 10.3