Query 1, tested in SQLFiddle-1
SET @ward_id_to_check = 1 ;
SELECT
st.patient_id,
st.bed_id AS starting_bed_id, -- the first bed a patient uses; can be omitted
st.admitted,
MIN(en.discharged) AS discharged
FROM
( SELECT patient_id, bed_id, admitted, discharged
FROM t
WHERE t.ward_id = @ward_id_to_check
AND NOT EXISTS
( SELECT *
FROM t AS prev
WHERE prev.ward_id = @ward_id_to_check
AND prev.patient_id = t.patient_id
AND prev.discharged = t.admitted
)
) AS st
JOIN
( SELECT patient_id, admitted, discharged
FROM t
WHERE t.ward_id = @ward_id_to_check
AND NOT EXISTS
( SELECT *
FROM t AS next
WHERE next.ward_id = @ward_id_to_check
AND next.patient_id = t.patient_id
AND next.admitted = t.discharged
)
) AS en
ON st.patient_id = en.patient_id
AND st.admitted <= en.admitted
GROUP BY
st.patient_id,
st.admitted ;
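Query 1 can be sanity-checked on a tiny data set. Here is a sketch using SQLite via Python (the MySQL user variable is replaced by a named parameter, and the table plus sample rows are invented purely for illustration):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""CREATE TABLE t (
    patient_id INTEGER,
    ward_id    INTEGER,
    bed_id     INTEGER,
    admitted   TEXT,
    discharged TEXT)""")

# Patient 1 changes bed but stays in ward 1 without a gap;
# patient 2 has a single stay.
conn.executemany("INSERT INTO t VALUES (?, ?, ?, ?, ?)", [
    (1, 1, 1, '2015-01-01', '2015-01-05'),
    (1, 1, 2, '2015-01-05', '2015-01-10'),
    (2, 1, 3, '2015-01-03', '2015-01-06'),
])

rows = conn.execute("""
SELECT st.patient_id, st.bed_id AS starting_bed_id,
       st.admitted, MIN(en.discharged) AS discharged
FROM ( SELECT patient_id, bed_id, admitted, discharged
       FROM t
       WHERE t.ward_id = :w
         AND NOT EXISTS
             ( SELECT * FROM t AS prev
               WHERE prev.ward_id = :w
                 AND prev.patient_id = t.patient_id
                 AND prev.discharged = t.admitted )
     ) AS st
JOIN ( SELECT patient_id, admitted, discharged
       FROM t
       WHERE t.ward_id = :w
         AND NOT EXISTS
             ( SELECT * FROM t AS nxt
               WHERE nxt.ward_id = :w
                 AND nxt.patient_id = t.patient_id
                 AND nxt.admitted = t.discharged )
     ) AS en
  ON st.patient_id = en.patient_id
 AND st.admitted <= en.admitted
GROUP BY st.patient_id, st.admitted
ORDER BY st.patient_id
""", {"w": 1}).fetchall()

for r in rows:
    print(r)
```

The two bed rows of patient 1 collapse into one stay from 2015-01-01 to 2015-01-10, while patient 2's single stay passes through unchanged.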
Query 2, which is the same as Query 1 but without the derived tables. With proper indexes this will probably get a better execution plan. Test in SQLFiddle-2:
SET @ward_id_to_check = 1 ;
SELECT
st.patient_id,
st.bed_id AS starting_bed_id,
st.admitted,
MIN(en.discharged) AS discharged
FROM
t AS st -- starting period
JOIN
t AS en -- ending period
ON en.ward_id = @ward_id_to_check
AND st.patient_id = en.patient_id
AND NOT EXISTS
( SELECT *
FROM t AS next
WHERE next.ward_id = @ward_id_to_check
AND next.patient_id = en.patient_id
AND next.admitted = en.discharged
)
AND st.admitted <= en.admitted
WHERE
st.ward_id = @ward_id_to_check
AND NOT EXISTS
( SELECT *
FROM t AS prev
WHERE prev.ward_id = @ward_id_to_check
AND prev.patient_id = st.patient_id
AND prev.discharged = st.admitted
)
GROUP BY
st.patient_id,
st.admitted ;
Both queries assume that there is a unique constraint on (patient_id, admitted). If the server runs with strict ANSI settings (ONLY_FULL_GROUP_BY), bed_id should be added to the GROUP BY list.
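A sketch of that unique constraint, plus one possible index for the prev/next NOT EXISTS probes (the index choice is my assumption, not something the queries require; measure against your own workload). Shown in SQLite via Python so the duplicate check can be demonstrated:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""CREATE TABLE t (
    patient_id INTEGER NOT NULL,
    ward_id    INTEGER NOT NULL,
    bed_id     INTEGER NOT NULL,
    admitted   TEXT NOT NULL,
    discharged TEXT,
    UNIQUE (patient_id, admitted))""")

# Candidate covering index for the correlated subqueries (an assumption).
conn.execute("CREATE INDEX ix_t_ward ON t (ward_id, patient_id, discharged, admitted)")

conn.execute("INSERT INTO t VALUES (1, 1, 1, '2015-01-01', '2015-01-05')")
try:
    # Same patient, same admission timestamp: rejected by the constraint.
    conn.execute("INSERT INTO t VALUES (1, 2, 9, '2015-01-01', '2015-01-07')")
    duplicate_rejected = False
except sqlite3.IntegrityError:
    duplicate_rejected = True
print(duplicate_rejected)
```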
You could expand your activities table so that instead of a single timestamp column, it includes two timestamps, ts_from and ts_to, that indicate the time period for which the row's data is applicable. As long as no data changes, you do not insert any new rows into your table. When data does change, you insert the new data into a new row and update the old row's ts_to column.
Then you can create a calendar table that contains all timestamps of interest to you (in your case that table would contain a row for every day and every hour in the period of time that is of interest to you). Finally, you get the result by cross-joining the two tables and keeping only those records whose date (from the calendar table) falls between ts_from and ts_to.
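The "close the old row, insert the new one" step can be sketched like this (SQLite via Python; the table and column names follow the time_data example, and the helper function is hypothetical):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE time_data (d_from date, d_to date, val1 int, val2 int)")
conn.execute("INSERT INTO time_data VALUES ('2015-01-01', NULL, 10, 15)")

def record_change(conn, day, val1, val2):
    """Close the currently open row and open a new one starting at `day`.
    (Hypothetical helper: the open row is the one whose d_to is NULL.)"""
    conn.execute("UPDATE time_data SET d_to = ? WHERE d_to IS NULL", (day,))
    conn.execute("INSERT INTO time_data VALUES (?, NULL, ?, ?)", (day, val1, val2))

record_change(conn, '2015-01-15', 7, 18)
rows = conn.execute("SELECT * FROM time_data ORDER BY d_from").fetchall()
print(rows)
```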
For example (my example uses only dates and not datetimes, but you can change that easily):
create table calendar (
d date
);
create table time_data (
d_from date,
d_to date,
val1 int(10),
val2 int(10)
);
insert into time_data values('2015-01-01', '2015-01-15', 10, 15);
insert into time_data values('2015-01-15', '2015-02-21', 7, 18);
insert into time_data values('2015-02-21', '2015-03-04', 1, 55);
insert into time_data values('2015-03-04', '2015-04-21', 3, 22);
insert into time_data values('2015-04-21', null, 9, 99);
I filled the calendar table with all dates from 2015-01-01 to now and did the simple query
select d, val1, val2
from calendar, time_data
where d >= d_from and d < ifnull(d_to, curdate())
order by d
The result set contains an entry for every date in the range 2015-01-01 to 2015-05-01, together with the val1 and val2 values from the time_data row whose period covers that date.
For a nice procedure to fill the calendar table, see this article.
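The linked article is not reproduced here, but filling a calendar table can also be done with a short script. A sketch in Python with SQLite, using the same date range as the example above:

```python
import sqlite3
from datetime import date, timedelta

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE calendar (d date)")

# One row per day from 2015-01-01 through 2015-05-01, inclusive.
d, stop = date(2015, 1, 1), date(2015, 5, 1)
days = []
while d <= stop:
    days.append((d.isoformat(),))
    d += timedelta(days=1)
conn.executemany("INSERT INTO calendar VALUES (?)", days)

n = conn.execute("SELECT COUNT(*) FROM calendar").fetchone()[0]
print(n)  # 121 days: 31 + 28 + 31 + 30 + 1
```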
However, since a cross join is an expensive operation, and you have large amounts of data, thoroughly test this (or any other) solution before applying it to your production system.
Best Answer
It depends on exactly what you're trying to do with the data. If it's only being used for graphing, and you don't need high resolution the further back it goes (e.g., you're not trying to plot a graph with full temporal resolution for a day from one year ago), you might actually want to look at RRDTool rather than a relational database.
If you're ever going to need to do analysis of the values (how often does a place get above 80°F?), you'll want to store discrete values, not an XML structure; but you could also use flat-file structures that are meant for dealing with this type of data (e.g., CDF, NetCDF, maybe even HDF).
Update:
I'd store each measurement as a separate record, as that makes it easier to adjust the granularity when graphing. For instance, to extract the high/low/mean for each hour:
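The original query did not survive here; the following is a hedged reconstruction (SQLite via Python, assuming a readings table with a ts timestamp and a temp value, one row per measurement — all of these names are my invention):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE readings (ts TEXT, temp REAL)")
conn.executemany("INSERT INTO readings VALUES (?, ?)", [
    ('2015-05-01 10:05:00', 71.2),
    ('2015-05-01 10:25:00', 74.8),
    ('2015-05-01 10:45:00', 73.0),
    ('2015-05-01 11:10:00', 75.5),
])

# Truncate each timestamp to its hour, then aggregate within the hour.
rows = conn.execute("""
SELECT strftime('%Y-%m-%d %H:00', ts) AS hour,
       MAX(temp) AS high,
       MIN(temp) AS low,
       AVG(temp) AS mean
FROM readings
GROUP BY hour
ORDER BY hour
""").fetchall()
for r in rows:
    print(r)
```

In MySQL the same grouping could be expressed with DATE_FORMAT(ts, '%Y-%m-%d %H:00') instead of strftime.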
Also, it allows you to change the cadence for the measurements without needing to change the table structure.