I have a table in a MySQL database that has information about passenger flights. I want to do some analysis of flight delays. I would like to add a calculated attribute to each row that indicates how many flights arriving at the current row's departure airport, by the same carrier, were delayed in the past 24 hours.
I haven't had any luck writing this query. I think that I can express my intent with:
-- Find a count of all rows
count(*) where
-- Of the same carrier
a.FlightCarrier = thisRow.FlightCarrier
-- that arrived at our departure airport
a.DestinationAirport = thisRow.OriginAirport
-- scheduled to arrive before ScheduledDepartureTimeUTC
a.ScheduledArrivalTimeUTC < thisRow.ScheduledDepartureTimeUTC
-- scheduled to arrive after (ScheduledDepartureTimeUTC - 24h)
a.ScheduledArrivalTimeUTC > (thisRow.ScheduledDepartureTimeUTC - 24h)
-- had a delayed arrival
a.DelayedArrival = 1
I attempted to write this as a SQL statement:
alter table flight add column DelayedCarrierArrivals24H int as
(select count(*) from flight a where
a.FlightCarrier = FlightCarrier
and a.DestinationAirport = OriginAirport
and a.ScheduledArrivalTimeUTC < ScheduledDepartureTimeUTC
and a.ScheduledArrivalTimeUTC > (ScheduledDepartureTimeUTC - (60 * 60 * 24))
and a.DelayedArrival = 1) STORED;
However, MySQL balks at this with "You have an error in your SQL syntax… near 'select count(*) from flight a where a.FlightCarrier = FlightCarrier"
What's the 'right' way to create this column in MySQL? I had originally thought to use windowing functions but couldn't divine the right way to express my intent with those features, either.
Best Answer
The problem
https://dev.mysql.com/doc/refman/5.7/en/create-table-generated-columns.html says:
Your expression includes a subquery. Note that the error message pointed directly to the subquery: "... near 'select ..."
Generated columns are rather limited (for good reasons).
Suggestion 1
May I suggest that you do the computation when you are fetching data from the table. Consider writing a
VIEW
or Stored routine to hide it from the caller.Suggestion 2
Perhaps this comes closer to what you are looking for?
It won't be efficient due to the correlated subquery, but we can work on that. First, verify that it does what you need.
Suggestion 3
Better would be to have a column for that info, then do the computation separately as you are storing the
b
information.