How to use files from HTTP as prerequisites in GNU make

gnu-make http remote timestamps

I want to use files from the World Wide Web as prerequisites in my makefiles:

local.dat: http://example.org/example.gz
    curl -s $< | gzip -d | transmogrify >$@

I only want to "transmogrify" if the remote file is newer than the local file, just like make normally operates.

I do not want to keep a cached copy of example.gz – the files are large, and I don't need the raw data. Preferably I would want to avoid downloading the file at all. The goal is to process a few of these in parallel using the -j make flag.

What is a clean way to solve this? I can think of a few ways to go:

  • Keep an empty dummy file stashed away, updated every time the target is recreated
  • Some plugin using GNU make's new plugin system (which I know nothing about)
  • A make-agnostic way that mounts HTTP servers in the local filesystem

Before digging further, I would like some advice, preferably specific examples!

Best Answer

Try something like this in your Makefile:

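.PHONY: local.dat

local.dat:
    [ -e example.gz ] || touch -d @0 example.gz
    curl -z example.gz -s http://example.org/example.gz -o example.gz
    [ -e $@ ] || touch -d 'yesterday 00:00' $@
    if [     "$$(stat --printf '%Y' example.gz)" \
         -gt "$$(stat --printf '%Y' $@)"         ] ; then \
      zcat example.gz | transmogrify >$@ ; \
    fi
    truncate -s 0 example.gz
    touch -r $@ example.gz

(Note: this is a Makefile, so the indents must be tabs, not spaces. It is also important that there are no spaces after the \ on the continuation lines. Alternatively, get rid of the backslash escapes and make it one long, almost unreadable line. The stat calls are written as $$(...) so that the shell runs them when the if line executes; $(shell ...) would be expanded by make before any line of the recipe has run, i.e. before curl and touch have had a chance to update the files.)

This GNU make recipe first checks that a file called example.gz exists (because we're going to be using it with -z in curl), and creates it with touch if it doesn't. The touch creates it with a timestamp of @0 (the Unix epoch, 1 January 1970), so that on the first run any remote file counts as newer. A more recent timestamp such as '00:00' (midnight at the start of the current day) would make curl skip a remote file that was last modified before today.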

Then it uses curl's -z (--time-cond) option to only download example.gz if it has been modified since the last time it was downloaded. -z can be given an actual date expression, or a filename. If given a filename, it will use the modification time of the file as the time condition.
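To make the time condition a little more concrete, here is a minimal sketch of checking whether the remote file changed without fetching its body. It assumes the server sends Last-Modified and honours If-Modified-Since (not every server does); the function name remote_changed is made up for illustration.

```shell
# remote_changed URL STAMP: succeed if URL was modified after STAMP's mtime.
# Hypothetical helper. It sends a HEAD request (-I) with If-Modified-Since
# taken from STAMP (-z); a server that honours the condition replies 304 when
# nothing changed and 200 otherwise. -w '%{http_code}' prints that status.
remote_changed() {
    status=$(curl -s -I -z "$2" -o /dev/null -w '%{http_code}' "$1")
    [ "$status" = 200 ]
}

# Example call (needs network access, so it is left commented out):
# remote_changed http://example.org/example.gz example.gz && echo "newer upstream"
```

Any status other than 200 (304, 404, or a failed connection) is treated as "not changed" here, which is a deliberately conservative choice for a sketch.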

After that, if local.dat doesn't exist, it creates it with touch, using a timestamp guaranteed to be older than that of example.gz. This is necessary because local.dat has to exist for the next command to use stat to get its mtime timestamp.

Then, if example.gz has a timestamp newer than local.dat, it pipes example.gz into transmogrify and redirects the output to local.dat.
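The comparison itself can be tried outside of make. The following is a small sketch that assumes GNU stat and GNU touch (the same tools the recipe relies on); the helper name is_newer and the scratch file names are only for illustration.

```shell
# is_newer A B: succeed if file A has a strictly later mtime than file B.
# Hypothetical helper; %Y is the modification time in seconds since the epoch.
is_newer() {
    [ "$(stat --printf '%Y' "$1")" -gt "$(stat --printf '%Y' "$2")" ]
}

# Two scratch files whose mtimes are set explicitly, one day apart.
tmpdir=$(mktemp -d)
touch -d '2020-01-02 00:00' "$tmpdir/remote.stamp"
touch -d '2020-01-01 00:00' "$tmpdir/local.dat"

if is_newer "$tmpdir/remote.stamp" "$tmpdir/local.dat"; then
    echo "rebuild needed"      # this branch is taken
else
    echo "up to date"
fi
```

Note that the test is strict: two files with the same timestamp count as up to date, which is what makes the final touch -r step work as a "nothing to do" marker.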

Finally, it does the bookkeeping & cleanup stuff:

  • it truncates example.gz (because you only need to keep a timestamp, and not the whole file)
  • touches example.gz so that it has the same timestamp as local.dat
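Those two bookkeeping steps can be checked in isolation. This sketch assumes GNU coreutils (truncate, touch, stat) and uses scratch files in a temporary directory rather than the real download.

```shell
tmpdir=$(mktemp -d)
stamp="$tmpdir/example.gz"
target="$tmpdir/local.dat"

# Stand-ins for the downloaded file and the processed result.
printf 'pretend this is a large download\n' > "$stamp"
touch -d '2020-01-01 12:00' "$target"

truncate -s 0 "$stamp"         # drop the data, keep the file as a stamp
touch -r "$target" "$stamp"    # copy the target's mtime onto the stamp

# Show name, size and mtime (seconds since the epoch) of both files.
stat --printf '%n: %s bytes, mtime %Y\n' "$stamp" "$target"
```

After this, the stamp is empty and carries exactly the same mtime as the target, so the next curl -z run asks the server for anything newer than the last successful build.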

The .PHONY target ensures that the local.dat target is always executed, even if the file of that name already exists.

Thanks to @Toby Speight for pointing out in the comments that my original version wouldn't work, and why.

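Alternatively, if you want to pipe the file directly into transmogrify without downloading it to the filesystem first, the timestamps can no longer be compared locally, because nothing is downloaded to compare against. One way around that is to ask the server with a conditional HEAD request and only run the pipeline when it answers 200 rather than 304 (Not Modified):

.PHONY: local.dat

local.dat:
    [ -e example.gz ] && [ -e $@ ] || touch -d @0 example.gz
    if [ "$$(curl -z example.gz -s -I -o /dev/null -w '%{http_code}' \
             http://example.org/example.gz)" = 200 ] ; then \
      curl -s http://example.org/example.gz | gzip -d | transmogrify >$@ ; \
    fi
    touch -r $@ example.gz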

NOTE: this is mostly untested so may require some minor changes to get the syntax exactly right. The important thing here is the method, not a copy-paste cargo-cult solution.

I have been using variations of this method (i.e. touch-ing a timestamp file) with make for decades. It works, and usually allows me to avoid having to write my own dependency resolution code in sh (although I've had to do something similar with stat --printf %Y here).

Everyone knows make is a great tool for compiling software...IMO it's also a very much under-rated tool for system admin and scripting tasks.
