I have a table of jobs that has columns for unique id, app, env, the status of the job and the timestamp for it, along with other data not relevant to the question. Like so,
CREATE TABLE the_table (
id int AUTO_INCREMENT,
app varchar(255),
env varchar(255),
time_end datetime(3),
status varchar(255),
PRIMARY KEY (id)
);
Each job belongs to an app in en environment, env, and the values for these are limited, currently around ten for each of app and env. Some example data (with changed values):
INSERT INTO the_table(app,env,time_end,status)
VALUES
( 'app1', 'env1', '2017-11-14 14:45:07.037', 'finished' ),
( 'app1', 'env2', '2017-08-01 08:30:03.951', 'finished' ),
( 'app1', 'env1', '2017-10-05 21:36:22.716', 'finished' ),
( 'app2', 'env2', '2017-05-18 11:26:58.122', 'finished' ),
( 'app1', 'env1', '2017-09-26 00:35:37.914', 'finished' ),
( 'app1', 'env3', '2017-11-05 20:27:00.762', 'finished' ),
( 'app1', 'env3', '2017-06-04 16:55:19.089', 'finished' ),
( 'app3', 'env3', '2017-10-05 04:36:00.820', 'finished' ),
( 'app3', 'env3', '2017-09-22 04:11:51.602', 'finished' ),
( 'app2', 'env1', '2017-10-09 15:36:38.642', 'finished' ),
( 'app1', 'env2', '2017-10-04 16:07:48.272', 'finished' ),
( 'app1', 'env1', '2017-09-16 15:35:22.239', 'finished' ),
( 'app1', 'env1', '2017-10-12 19:53:29.434', 'finished' ),
( 'app1', 'env1', '2017-08-23 07:07:56.983', 'finished' );
I want to get, for each combination of app and env, and status=finished
, the latest record. My first solution would be something like:
xs = Query("SELECT DISTINCT app FROM the_table")
ys = Query("SELECT DISTINCT env FROM the_table")
for app in apps do{
for env in envs do{
r = Query("SELECT * FROM the_table WHERE status='finished' AND the_table.env=env AND the_table.env=env ORDER BY time_end desc LIMIT 1")
}
}
This could potentially be a lot of queries though and I would like to find another solution more idiomatic to SQL. This is using MySQL 5.7, so I can't use limit
in subqueries.
Best Answer
If you wanted only the last
time_end
value per(app, env)
, you could just use GROUP BY:However, you want in each case the entire row where
time_end
is last. That is a classic "greatest N per group" problem. One way to solve it is in two steps: first get the last values themselves, then join them back to the original table and get the corresponding complete rows. The query above, therefore, is the first step. Use it as a derived table and join it back tothe_table
to get the final output:Other database products support more advanced SQL syntax – specifically, window functions – that allows you to solve this category of problems without using a join, thus offering better performance. MySQL does not support window functions (yet?) but there is a way to work around that limitation using variables.
Specifically, you would use variables to rank rows within each
(app, env)
group based on thetime_end
value and then filter the rows on the ranking value:The query works like this:
The rows are sorted in such a way that identical
(app, env)
rows are arranged in groups, the lasttime_end
row being the first row in each group.Each row is then assigned a ranking value with the help of variables. Variables
@app
and@env
hold the previous row's values ofapp
andenv
and take part in calculating the@rank
value: if the current row'sapp
andenv
match the previous row's values, the@rank
value is increased by 1, otherwise it is set to 1.Thus, the first row in each group always gets the value of 1, because the previous row belongs to a different group or, in the case of the very first group, is simply non-existent (
@app
and@env
in that case each hold the initial value, an empty string).The results are used as a derived table and are filtered on the ranking value: only the rows with the ranking value of 1 are returned (they hold the last
time_end
value in each group).My only issue with this method is that it is very MySQL-specific, which, in my opinion, is worth keeping in mind. No other RDBMS supports this syntax (unless that product is a MySQL fork, like MariaDB, for instance).
Both methods shown can be found and played with in this live demo at Rextester.