I have a table in PostgreSQL with a timestamp and a value.
I would like to interpolate the missing values under "lat".
The value under "lat" are tidalheights above a datum.
For the purpose it is ok to interpolate the missing value linear between the two known values.
What is the best method to do so in PostgreSQL?
Edit 20200825
I have solved this problem in a different way using the QGIS fieldcalculator. Problem with this method: it takes a long time and it the process runs client-side and I would like to run it directly on the server.
In steps, my workflow was:
- The interval between the recorded "lat" values is 10 minutes. I calculated the increment per minute between two recorded values and stored this in an extra column called "tidal_step" at the recorded "lat"-value. (i stored the timestamp also as an "epoch" in a column)
In QGIS:
tidal_step =
-- the lat value @ the epoch, 10 minutes or 600000 miliseconds from the current epoch:
(attribute(get_feature('werkset','epoch',("epoch"+'600000')),'lat') -
-- the lat value @ the current
attribute(get_feature('werkset','epoch',"epoch"),'lat'))
/10
for the first two values from the example image that results in:
(4.95 – 5.07) /10 = -0.012
- I determined the amount of minutes of the "lat"-value to be interpolated, past the last recorded instance where a "lat" value was recorded and stored this in a column: "min_past_rec"
In QGIS:
left(
right("timestamp",8) --this takes the timestamp and goes 8 charakters from the right
,1) -- this takes the string from the previous right( and goes 1 character left
for the first value in the example:
2019-01-01 00:15:15 returns: '5'
This is 5 minutes past the last recorded value.
- I interpolated the missing values by adding the ("min_past_rec" * "tidal_step") to the last recorded "lat" value and stored this in in column called "lat_interpolated"
In QGIS
CASE
WHEN "lat" = NULL
THEN
-- minutes pas the last recorded instance:
("min_past_rec" *
-- the "tidal_step" at the last recorded "lat"-value:
(attribute(get_feature('werkset','epoch',
("epoch" - --the epoch of the "lat" value to be interpolated minus:
left(right("timestamp",8),1) * 600000 -- = the amount of minutes after the last recorded instance.
+ left(right("timestamp",6),2) * 1000) -- and the amount of seconds after the last recorded instance.
),'tidal_step')) +
-- the last recorded "lat"-value
(attribute(get_feature('werkset','epoch',("epoch" - left(right("timestamp",8),1) * 600000 + left(right("timestamp",6),2) * 1000)),'lat'))
With data from the example:
2019-01-01 00:17:33:
"lat_interpolated" = "min_past_rec" * "tidal_step" + "lat" =
7*-0.012 + 4.95 = 4.866
- delete obsolete columns from database
Which statements/script should I use in PostgreSQL to perform the same task?
Best Answer
I have a (partial) solution - what I did was the following (see the fiddle available here):
The algorithm I used for interpolating was
if there's a sequence of 1
NULL
, take the average of the value above and the value below.A sequence of 2
NULL
s, the top assigned value is the average of the two records above it and the bottom assigned one is the average of the two records below.In order to do this, I did the following:
Create a table:
Populate it with some sample data:
Note that records 3 & 4 and 7 are
NULL
.And then I ran my first query:
Result (spaces mean
NULL
- it's much clearer on the fiddle):Note the use of the
LAG()
andLEAD()
Window functions (documentation
). I've used them on the same table, but sorted differently.This and using the
OFFSET
option means that from my original singlelat
column, I now have 6 extra columns of "generated" data which are very useful for assigning values to the missingNULL
values. The last (partial) piece of the puzzle is shown below (full SQL query is at the bottom of this post and also in the fiddle).Final result:
So, you can see that the calculated value for record 7 is the average of records 6 & 8 and record 3 is the average of records 1 & 2 and the assigned value for record 4 is the average of 5 & 6. This was enabled by the use of the
OFFSET
option for theLAG()
andLEAD()
functions. If you get sequences of 3NULL
s, then you'll have to use anOFFSET
of 3 and so on.I'm not really happy with this solution - it involves hard-coding for the number of
NULL
s and thoseCASE
statements will become even more complex and horrible. Ideally some sort ofRECURSIVE CTE
solution is required, but I HTH!=============================== Full Query ========================