Postgresql – Running a task without waiting for the results in Postgres 11 on RDS

amazon-rdspostgresql

I've got a question about spawning stand-alone threads on Postgres. Here's what I'd ideally like:

Have a postgres-side solution to automatically launch tasks on a schedule in their own threads.

Something like the (imaginary) example below:

   CALL_IN_NEW_THREAD do_db_maintenance();

I understand that this doesn't exist, but I don't know what is possible or preferable. I'm in design phase, so my question isn't about SQL syntax, etc. I'm trying to figure out if there is a way to "detach" a query from the original process so that my connection can close down without waiting for the task to finish. Or, even better, if there is a way to auto-start routine tasks on Postgres itself without a client kicking it off or responding to a NOTIFY.

First up, the setup:

  • We've got a number of routine tasks to run periodically. Imagine refreshing materialized views, integrating posted changes, DB admin chores, etc.

  • The results of a query/task are written back to the database, so there's no need for the client to collect them.

  • Imagine that we've got a task_definition table that makes it simple to figure out which tasks are pending and should be run.

  • Our deployment environment is Postgres 11.x on RDS, so pg_cron is not available. In fact, the command-line isn't available.

  • We'll be on Postgres 11.x until RDS certifies PG 12. So, I'm looking for a PG-11 based solution for the foreseeable future, likely the better part of the next year.

What I am hoping to do goes like this:

  • Have an external program log into Postgres, scan the task_definition table, find the jobs that should be started, and start them.

  • "Start them" in distinct threads.

  • And then log out.

So, the client just kicks off the work and gets out. Is this a possibility in Postgres without a shared library or anything else that's impossible on RDS?

If it's not possible, I can always have my external program log in, find pending tasks, and launch them in distinct threads from the client. (Be kind and assume that I've handled the locking correctly and am not double-launching any of the tasks.) It seems a bit silly to have the client connection hanging around for no real purpose, but it's not the end of the world. On the upside, it might make it easier to notice if I screw something up and have a task that takes too long.

I've looked around at various solutions for scheduled tasks in Postgres, it seems to be a common requirement. A few notes on solutions I've seen:

Triggers
Sure, that makes sense…they work as a kind of signal. But then you've tied up the process that's running the trigger until the task is done. (There doesn't seem to be a way to detach a procedure call.) So, this doesn't feel like a great help. I guess I could have a utility table where the trigger serves no other purpose than to run a method, but I don't see what that offers over just running the method in a query directly.

pg_cron
Not available on RDS.

PgAgent
Not available on RDS.

cron
Not available on RDS. But you could set up an EC2 instance, etc. to run cron.

Your language here
Use your language of choice to connect, run, wait, repeat. I'm guessing that this is what I'll end up having to do.

NOTIFY/LISTEN
Cool! But it still requires "Your language here" to do the listening, so I don't think it changes anything for me.

dblink
This sounds promising, but I'm not sure it's what I'm after. dblink_send_query supports one async query per connection, and it sounds like the connection has to be sustained. It also doesn't appear from the docs that you can dblink back to the same database. We don't have a second DB running now, but will at some point…so dblink is something we would consider.

Best Answer

You seem to be beating your head against the limitations of RDS. Is what RDS offers you worth the limitations, over just renting the hardware and running your own database on it?

Triggers and NOTIFY/LISTEN are for event-driven things, not clock-driven things. They are for solving a different problem than the one you have.

My prefered approach is to combine 'cron' and 'Your language here'. Write the job in your language, launch it with cron. For example if I am writing a web-app, I actually write the maintenance logic into the web-app, and then have cron use wget to hit a special URI to invoke it. Yes, that would require having a non-RDS machine that lets you run cron and your language of choice on it (unless you give up on RDS), but if you have a web-app, you already have an app-server which could be used for that (unless that machine is also AWS-managed with no CLI access).

You could also use pgAgent in the same way. You would need a separate machine to run the pgAgent software. You could also run a tiny database on that machine to hold the jobs--pgAgent can read the jobs from one database but execute them against another one. I think (but I have not tested this) that you can even put the jobs into a schema of your main database. While you cannot use 'CREATE EXTENSION pgagent', you can run the script manually to create all of the objects. (And if you do want to use pgAgent, you could bug AWS to whitelist it--since it does not require a superuser there is no reason not to do so other than the effort involved)