Dockerfile, Docker image and reproducible environment

desktop-environment docker reproducible-build

The usual documentation and notes on Docker recommend version-controlling and sharing the Dockerfile, which should let anyone build an identical image. This sounds great, but Dockerfiles typically contain commands like these:

RUN apt-get update
pip install..

These could install different packages, versions, or patches depending on when the build runs, which makes debugging difficult.

On the other hand, sharing Docker images does not give you benefits like version control or seeing exactly what differs between two images.

  • Which of these (Dockerfile vs. image) is supposed to be the reference for development and deployment?
  • Should the Dockerfile instead spell out the exact versions to install? Even then, the base image might differ depending on when you build it.

Best Answer

I would prefer sharing the Dockerfile. You need to specify a version in the FROM statement of your Dockerfile, since, for example, different Ubuntu versions have different packages available.
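A version tag can still be re-pushed with newer patches, which is the concern raised in the question. One possible way to go further, offered here as an assumption rather than part of the original answer, is to pin the base image by its content digest. The sketch below prints a FROM line for an image that has already been pulled locally; the helper name pinned_from_line is hypothetical.

```shell
#!/bin/sh
# Print a FROM line pinned to the digest of a locally pulled image.
# pinned_from_line is a hypothetical helper name, not a Docker command.
pinned_from_line() {
    image="$1"
    # RepoDigests holds entries of the form "name@sha256:<hex>".
    # The lookup fails for images that were only built locally and
    # therefore have no registry digest.
    digest_ref=$(docker inspect --format '{{index .RepoDigests 0}}' "$image") || return 1
    printf 'FROM %s\n' "$digest_ref"
}

# Example (requires Docker and a pulled image):
#   pinned_from_line ubuntu:16.04
```

The printed line can replace the tag-based FROM statement. The trade-off is that a digest is less readable than a tag, so a comment naming the original tag next to it is probably worth keeping.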

For system or -dev dependencies, you might actually want to let the version float freely so that you always install the latest one.

Debian/Ubuntu packages

For any program installed with apt-get, for example curl, you can get the version number with

apt-cache policy curl | grep -oP 'Installed: \K\S+'

and then edit your Dockerfile to read something like

RUN apt-get update && apt-get install -y curl=7.47.0-1ubuntu2.2
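Doing this by hand for many packages gets tedious. As a rough sketch, assuming the packages are already installed in a reference container, the default tab-separated output of dpkg-query -W can be turned into one pinned RUN line. The helper name apt_pins_to_run is hypothetical.

```shell
#!/bin/sh
# Turn "name<TAB>version" lines (the default output of `dpkg-query -W`)
# into a single pinned RUN instruction for a Dockerfile.
# apt_pins_to_run is a hypothetical helper name.
apt_pins_to_run() {
    awk -F '\t' '
        # Skip malformed lines and packages with no installed version.
        NF == 2 && $2 != "" { pins = pins " " $1 "=" $2 }
        END {
            if (pins != "")
                print "RUN apt-get update && apt-get install -y" pins
        }'
}

# Example (inside a Debian/Ubuntu container):
#   dpkg-query -W curl ca-certificates | apt_pins_to_run
```

Keep in mind that a pinned version may later disappear from the distribution's package archive, in which case the build fails loudly instead of silently installing something else.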

Python

Python package versions are easily handled with pip. Extract the version numbers of all installed packages and store them in a requirements file like this:

pip freeze > requirements.txt

Then in your Dockerfile run

COPY requirements.txt .
RUN pip install -r requirements.txt
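A requirements file only helps reproducibility if every entry is actually pinned. The following check is a minimal sketch, not part of pip, that lists entries lacking an exact == version; the helper name unpinned_requirements is hypothetical.

```shell
#!/bin/sh
# List requirement lines that are not pinned with "==".
# unpinned_requirements is a hypothetical helper name.
unpinned_requirements() {
    # Drop blank lines and comments, then keep lines lacking "==".
    # Exit status is 0 when at least one unpinned line was found.
    grep -Ev '^[[:space:]]*(#|$)' "$1" | grep -v '=='
}

# Example: fail a build script when something is unpinned.
#   if unpinned_requirements requirements.txt; then
#       echo "unpinned requirements found" >&2
#       exit 1
#   fi
```

Entries that pip freeze writes without ==, such as editable installs, are flagged as well, which seems reasonable since they are not tied to an exact release.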