Postgresql – Best way to model a database structure to store multidimensional variables

database-designgdalpostgispostgresqlraster

I am working with some NetCDF data which has multidimensional variables.

  • As far as I'm aware, the fields are not mandatory.
  • Each variable will have the same fields/attributes.
  • Each variable represents a measurement of the atmosphere, in time & space. Level represents a pressure level (like a layer of an onion — a thick layer).

For example, a variable called SO2 (Sulphur Dioxide) may be 4-D: Time, Level (pressure), latitude, longitude.

I have about 14 variables that I need to store.

Data retrieval tendencies

Now, I'm going to need to query this data and ask questions like:

  • Give me all the rows for variable SO2 that fall between latitude X and longitude Y at time T and level L.
  • What is the average temperature for time T at level L?
  • etc…

Current considerations

I could let each row in my DB be an observation and have the columns: type, lat, long, time, level, value.

Or, perhaps create a table for each kind of variable and have the columns: lat, long, time, level, value.

I'm not sure which will be more appropriate, or whether there is a better option, so I'd appreciate some advice.

Best Answer

So you have some basic data.

SELECT timestamp::timestamp, level, lat, long
FROM ( VALUES
  (now(), 7, 1, 9)
)
  AS t(timestamp, level, lat, long);
         timestamp          | level | lat | long 
----------------------------+-------+-----+------
 2017-02-08 15:38:54.903155 |     7 |   1 |    9

What you want to do is create a point with lat, and long...

SELECT timestamp::timestamp, level, ST_AsText(point::geography)
FROM ( VALUES
  (now(), 7, 1, 9)
)
  AS t(timestamp, level, lat, long)
CROSS JOIN LATERAL ST_SetSRID( ST_MakePoint(long,lat), 4326 )
  AS point;
         timestamp          | level | st_astext  
----------------------------+-------+------------
 2017-02-08 15:38:00.892956 |     7 | POINT(9 1)
(1 row)

Now, we just store that into a table..

CREATE TEMP TABLE foo
AS
SELECT timestamp::timestamp, level, point::geography
FROM ( VALUES
  (now(), 7, 1, 9)
)
  AS t(timestamp, level, lat, long)
CROSS JOIN LATERAL ST_SetSRID( ST_MakePoint(long,lat), 4326 )
  AS point;

Give me all the rows for variable SO2 that fall between latitude X and longitude Y at time T and level L.

This is not how GIS works, because between changes on a sphere. So instead, just use distance. So instead what we do

SELECT *
FROM foo
WHERE ST_DWithin(
  (ST_MakePoint(long,lat),4326)::geography,
  point,
  1000 -- distance in meters
)
FROM foo;

What is the average temperature for time T at level L?

For this we would have had to store the temperature, assuming we did -- which we didn't. It'd look like this.

SELECT avg(temp)
FROM foo
WHERE level = l
  AND time BETWEEN start AND finish;

As a side note ST_DWithin will use an index.. (as will level and timestamp if you create the indexes). However, if you're new to GIS you create GIS indexes with GIST and not btree.

CREATE INDEX ON foo USING gist ( point );

For a similar question see this answer.

UPDATE

This file is a complex proprietary raster,

gdalinfo file.nc 
Driver: netCDF/Network Common Data Format
Files: file.nc
Size is 512, 512
Coordinate System is `'
Metadata:
  NC_GLOBAL#Conventions=CF-1.6
  NC_GLOBAL#history=2017-02-08 12:30:09 GMT by grib_to_netcdf-2.0.2: grib_to_netcdf /data/data01/scratch/_mars-atls17-95e2cf679cd58ee9b4db4dd119a05a8d-mEMhIr.grib -o /data/data01/scratch/_grib2netcdf-atls00-95e2cf679cd58ee9b4db4dd119a05a8d-lrYLIt.nc -utime
Subdatasets:
  SUBDATASET_1_NAME=NETCDF:"file.nc":aermr01
  SUBDATASET_1_DESC=[45x2x451x900] aermr01 (16-bit integer)
  SUBDATASET_2_NAME=NETCDF:"file.nc":aermr02
  SUBDATASET_2_DESC=[45x2x451x900] aermr02 (16-bit integer)
  SUBDATASET_3_NAME=NETCDF:"file.nc":aermr03
  SUBDATASET_3_DESC=[45x2x451x900] aermr03 (16-bit integer)
  SUBDATASET_4_NAME=NETCDF:"file.nc":aermr04
  SUBDATASET_4_DESC=[45x2x451x900] aermr04 (16-bit integer)
  SUBDATASET_5_NAME=NETCDF:"file.nc":aermr05
  SUBDATASET_5_DESC=[45x2x451x900] aermr05 (16-bit integer)
  SUBDATASET_6_NAME=NETCDF:"file.nc":aermr06
  SUBDATASET_6_DESC=[45x2x451x900] aermr06 (16-bit integer)
  SUBDATASET_7_NAME=NETCDF:"file.nc":aermr11
  SUBDATASET_7_DESC=[45x2x451x900] aermr11 (16-bit integer)
  SUBDATASET_8_NAME=NETCDF:"file.nc":so2
  SUBDATASET_8_DESC=[45x2x451x900] so2 (16-bit integer)
  SUBDATASET_9_NAME=NETCDF:"file.nc":z
  SUBDATASET_9_DESC=[45x2x451x900] geopotential (16-bit integer)
  SUBDATASET_10_NAME=NETCDF:"file.nc":t
  SUBDATASET_10_DESC=[45x2x451x900] air_temperature (16-bit integer)
  SUBDATASET_11_NAME=NETCDF:"file.nc":q
  SUBDATASET_11_DESC=[45x2x451x900] specific_humidity (16-bit integer)
  SUBDATASET_12_NAME=NETCDF:"file.nc":r
  SUBDATASET_12_DESC=[45x2x451x900] relative_humidity (16-bit integer)
Corner Coordinates:
Upper Left  (    0.0,    0.0)
Lower Left  (    0.0,  512.0)
Upper Right (  512.0,    0.0)
Lower Right (  512.0,  512.0)
Center      (  256.0,  256.0)

You can import it by generating PAM files.

gdal_translate -sds file.nc file

Them remove the .aux.xml files with rm *.aux.xml; And, run this to import them all into the database

raster2pgsql file_* mySchema.mytable | psql -d database

This gets some kind of raster into the database. Now you have to start learning the PostGIS GDAL raster functions to see what and how you can work with it.. Using ST_MetaData we can see you have a 12 rasters with 90 bands (with ST_BandMetaData we can see they're 16bit unsigned).

SELECT rid, (metadata).* FROM foo.foo1 CROSS JOIN LATERAL st_metadata(rast) AS metadata;
 rid |     upperleftx     | upperlefty | width | height |      scalex       | scaley | skewx | skewy | srid | numbands 
-----+--------------------+------------+-------+--------+-------------------+--------+-------+-------+------+----------
   1 | -0.200000003394614 |       90.2 |   900 |    451 | 0.400000006789228 |   -0.4 |     0 |     0 |    0 |       90
   2 | -0.200000003394614 |       90.2 |   900 |    451 | 0.400000006789228 |   -0.4 |     0 |     0 |    0 |       90
   3 | -0.200000003394614 |       90.2 |   900 |    451 | 0.400000006789228 |   -0.4 |     0 |     0 |    0 |       90
   4 | -0.200000003394614 |       90.2 |   900 |    451 | 0.400000006789228 |   -0.4 |     0 |     0 |    0 |       90
   5 | -0.200000003394614 |       90.2 |   900 |    451 | 0.400000006789228 |   -0.4 |     0 |     0 |    0 |       90
   6 | -0.200000003394614 |       90.2 |   900 |    451 | 0.400000006789228 |   -0.4 |     0 |     0 |    0 |       90
   7 | -0.200000003394614 |       90.2 |   900 |    451 | 0.400000006789228 |   -0.4 |     0 |     0 |    0 |       90
   8 | -0.200000003394614 |       90.2 |   900 |    451 | 0.400000006789228 |   -0.4 |     0 |     0 |    0 |       90
   9 | -0.200000003394614 |       90.2 |   900 |    451 | 0.400000006789228 |   -0.4 |     0 |     0 |    0 |       90
  10 | -0.200000003394614 |       90.2 |   900 |    451 | 0.400000006789228 |   -0.4 |     0 |     0 |    0 |       90
  11 | -0.200000003394614 |       90.2 |   900 |    451 | 0.400000006789228 |   -0.4 |     0 |     0 |    0 |       90
  12 | -0.200000003394614 |       90.2 |   900 |    451 | 0.400000006789228 |   -0.4 |     0 |     0 |    0 |       90