I am trying to filter all sessionId
s that occur once out of an existing result set.
This query is being used in a web application and runs on a big dataset (~ 35 million rows), so I want to prevent having subqueries here.
I tried this, which provides a filtered result, except that I now only get one row for each sessionId (and I want every request
and response
):
CREATE TABLE `api_log` (
`id` varchar(50) NOT NULL,
`clientId` varchar(100) DEFAULT NULL,
`inserted` int(11) DEFAULT NULL,
`sessionId` mediumtext DEFAULT NULL,
`stage` varchar(120) DEFAULT NULL,
`request` longtext CHARACTER SET utf8mb4 DEFAULT NULL,
`response` longtext CHARACTER SET utf8mb4 DEFAULT NULL,
PRIMARY KEY (`id`)
);
INSERT INTO api_log
VALUES
("1", "abc", 1621008484, "session1", "production", '{"key":"value"}', '{"key":"value"}'),
("2", "abc", 1621008494, "session2", "production", '{"key":"value"}', '{"key":"value"}'),
("3", "abc", 1621008584, "session1", "production", '{"key":"value"}', '{"key":"value"}'),
("4", "abc", 1621008684, "session2", "production", '{"key":"value"}', '{"key":"value"}'),
("5", "abc", 1621008784, "session3", "production", '{"key":"value"}', '{"key":"value"}'),
("6", "abc", 1621008884, "session4", "production", '{"key":"value"}', '{"key":"value"}'),
("7", "abc", 1621008984, "session5", "production", '{"key":"value"}', '{"key":"value"}'),
("8", "abc", 1621009084, "session6", "production", '{"key":"value"}', '{"key":"value"}'),
("9", "abc", 1621009184, "session7", "production", '{"key":"value"}', '{"key":"value"}'),
("10", "abc", 1621009284, "session8", "production", '{"key":"value"}', '{"key":"value"}');
SELECT
`clientId`,
`sessionId`,
`inserted`,
`stage`,
`request`,
`response`
FROM
`api_log`
WHERE
(stage = 'production')
AND (clientId = 'abc')
AND (
`inserted` BETWEEN 1621008482 AND 1621009285
)
GROUP BY
`clientId`,
`stage`,
`sessionId`
HAVING
COUNT(sessionId) > 1
Is there any trick to get all rows where a sessionId
occurs more than once?
In this case, I get two rows, one for session1
and one for session2
, but I am missing two more because both mentioned sessionId
s have an additional row that should match.
SQL fiddle: http://sqlfiddle.com/#!9/f87bb54/3
Best Answer
If I understand your requirement correctly, this is the kind of problem where window aggregation shines. Use the window version of the
COUNT(*)
function on the filtered dataset to obtain the counts alongside the other columns. Then filter on the count results to get only the rows you want. Your output can include any or all of the columns your table has:You can play with this solution at dbfiddle.uk: