APT-GET – How APT-GET Really Works

Okay, I understand how I may use apt-get {install|upgrade|remove} mypackages to install, upgrade, or remove binaries as well as their configuration data files and dependencies (actually, remove will only remove the binaries unless additional flags are provided).

I am not looking for how it is used, since the man page describes that, but for what it is doing at a high level. My end goal is to install and manage some custom software (built by a makefile) on multiple remote machines, and I need to learn more about the process. If the answer depends on which distribution is used, please tailor it to Debian.

In addition to generally how it works, I have the following specific questions:

  1. How does the client that is accessing the apt repository keep track of the files?
  2. Must the repository be hosted on the same operating system (e.g., can an apt repository be hosted on Red Hat)?
  3. How are the locations to install files specified? Is this specified by the .deb file?
  4. How does a remote machine access the repository? Is it just FTP(S) or HTTP(S)?
  5. Is the machine that is hosting the repository running special software (like GitLab for a git repository), or is it just some structured file system?

Best Answer

You need to take a look at https://wiki.debian.org/Packaging. The packaging tutorial there will help you a lot, as will parts of the new maintainer's guide.

As to your questions, in order:

  1. The repository contains "list" files, e.g., http://http.us.debian.org/debian/dists/stretch/main/binary-amd64/Packages.xz. apt-get update downloads these list files and stores them in /var/lib/apt/lists. The list files list all the packages, including a bunch of metadata and a relative URL at which to find the .deb. (They're human-readable plain-text files, so you can just look at them.)

  2. The OS doesn't matter. You could host it on Windows, if you wanted. (Well, you'd maybe have trouble with file names Windows doesn't like.) See also #4 and #5.

  3. Yes, it's inside the deb file. A deb file is actually an archive (using ar). Inside are some tar files; one of them is (essentially) extracted to /.

  4. It's just HTTP (or HTTPS, or FTP, or... apt-get supports a lot of protocols). Nothing special, though. Note that there are Release files, signed with gpg, which guarantee integrity even without HTTPS. Debian mirrors mostly use HTTP, not HTTPS. (A few support HTTPS as well, for confidentiality.)

  5. It's just a structured filesystem.
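To make answer #3 a bit more concrete, here is a minimal sketch of how the outer ar layer of a .deb could be listed in Python. It assumes the common ar layout (an 8-byte magic string, then a 60-byte header per member, with member data padded to an even length). The helper names list_ar_members and make_ar are made up for this example, and the archive it builds is a stand-in with placeholder bodies, not a real package.

```python
AR_MAGIC = b"!<arch>\n"  # global header at the start of every ar archive


def list_ar_members(data):
    """Return a list of (name, size) for each member of an in-memory ar archive."""
    if not data.startswith(AR_MAGIC):
        raise ValueError("not an ar archive")
    members = []
    pos = len(AR_MAGIC)
    while pos + 60 <= len(data):
        header = data[pos:pos + 60]
        if header[58:60] != b"`\n":  # every member header ends with this marker
            raise ValueError("bad member header")
        # Name is the first 16 bytes, space padded; some ar variants append "/".
        name = header[0:16].decode("ascii").rstrip().rstrip("/")
        # Size is a decimal string in bytes 48..57.
        size = int(header[48:58].decode("ascii").strip())
        members.append((name, size))
        pos += 60 + size + (size % 2)  # member data is padded to an even offset
    return members


def make_ar(files):
    """Build a tiny ar archive in memory (hypothetical helper, for the demo only)."""
    out = bytearray(AR_MAGIC)
    for name, body in files:
        # Fields: name(16) mtime(12) uid(6) gid(6) mode(8) size(10) end marker(2)
        header = f"{name:<16}{0:<12}{0:<6}{0:<6}{'100644':<8}{len(body):<10}`\n"
        out += header.encode("ascii") + body
        if len(body) % 2:
            out += b"\n"
    return bytes(out)


if __name__ == "__main__":
    # Placeholder bodies: a real .deb holds actual tarballs here.
    fake_deb = make_ar([
        ("debian-binary", b"2.0\n"),
        ("control.tar.xz", b"fake"),
        ("data.tar.xz", b"fake!"),
    ])
    for name, size in list_ar_members(fake_deb):
        print(name, size)
```

In a real .deb, the data tarball is the one that is (essentially) unpacked onto /, while the control tarball carries the package metadata and maintainer scripts.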

A quick, high-level overview of how apt-get interacts with a package source:

  1. You configure which sources to look at in your sources.list file. Consider a line like:

    deb http://http.us.debian.org/debian/ stretch main
    

    deb says this is a source for getting .deb (binary) files; then come the URL prefix, the suite/release ("stretch"), and the component ("main").

  2. apt-get has a list of architectures, which it gets from dpkg. Let's say dpkg --print-architecture is amd64. apt-get can now build the URLs it's actually going to download from by combining the URL prefix, the word "dists", the suite, the component, and the architecture. Then it tacks on a few fixed filenames, like "Packages.xz". That gives the URL above (in #1). There are a few more files with defined names/paths, like the Release file http://http.us.debian.org/debian/dists/stretch/Release and its signature (same, with .gpg appended). These are all (possibly compressed) plain-text files. The Release file contains checksums for the other files apt-get is going to download, like Packages.xz.

  3. The Packages.xz file lists all the packages in that suite/component/architecture combination. For each package it also gives the path where the .deb is located, relative to the URL prefix; for example pool/main/0/0ad/0ad_0.0.21-2_amd64.deb.

  4. When you ask apt-get to download a package, it combines that location with the base URL, so that package is at http://http.us.debian.org/debian/pool/main/0/0ad/0ad_0.0.21-2_amd64.deb.

  5. The other interesting directory is source instead of binary-amd64. That's used for your deb-src entries; it contains info about source packages (and is otherwise fairly similar).

  6. There are some other things (all of them optional, I believe) that can be part of the repository (i.e., available via HTTP): diffs between different versions of the Packages.xz file, translations of package descriptions, a complete list of every installable file and which package it belongs to (Contents-amd64.gz, used by, e.g., apt-file, but not by apt-get), etc. These likely aren't relevant to you, but you can see them all by browsing around http://http.us.debian.org/debian/dists/stretch/; most of them are plain-text files.
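Steps 1 through 4 can be sketched in a few lines of Python. This is only an illustration of the URL arithmetic and the stanza format, not what apt-get actually runs: it assumes a plain one-line sources.list entry with no bracketed options, and the function names (parse_source_line, index_urls, parse_packages, deb_url) are invented here. In the sample text, only the 0ad name, version, and Filename come from the example above; the descriptions and the second stanza are made up.

```python
def parse_source_line(line):
    """Split a simple one-line sources.list entry into its parts."""
    fields = line.split()
    kind, base_url, suite = fields[0], fields[1], fields[2]
    return kind, base_url.rstrip("/"), suite, fields[3:]


def index_urls(line, arch):
    """Build the Release URL and the Packages.xz URL(s) for one 'deb' line."""
    kind, base, suite, components = parse_source_line(line)
    if kind != "deb":
        raise ValueError("this sketch only handles binary 'deb' entries")
    release = f"{base}/dists/{suite}/Release"
    packages = [
        f"{base}/dists/{suite}/{comp}/binary-{arch}/Packages.xz"
        for comp in components
    ]
    return release, packages


def parse_packages(text):
    """Parse the 'Field: value' stanzas of an already decompressed Packages file."""
    stanzas = []
    for block in text.strip().split("\n\n"):
        fields, key = {}, None
        for raw in block.splitlines():
            if raw[:1] in (" ", "\t") and key:
                fields[key] += "\n" + raw.strip()  # continuation of previous field
            elif ":" in raw:
                key, _, value = raw.partition(":")
                fields[key] = value.strip()
        if fields:
            stanzas.append(fields)
    return stanzas


def deb_url(base_url, stanza):
    """Combine the URL prefix with the stanza's relative Filename."""
    return base_url.rstrip("/") + "/" + stanza["Filename"]


# Made-up sample; only the 0ad name, version, and Filename come from the text above.
SAMPLE = """\
Package: 0ad
Version: 0.0.21-2
Architecture: amd64
Filename: pool/main/0/0ad/0ad_0.0.21-2_amd64.deb
Description: hypothetical short description
 hypothetical continuation line

Package: example-tool
Version: 1.0-1
Architecture: amd64
Filename: pool/main/e/example-tool/example-tool_1.0-1_amd64.deb
"""

if __name__ == "__main__":
    line = "deb http://http.us.debian.org/debian/ stretch main"
    release, packages = index_urls(line, "amd64")
    print(release)
    print(packages[0])
    by_name = {s["Package"]: s for s in parse_packages(SAMPLE)}
    print(deb_url("http://http.us.debian.org/debian/", by_name["0ad"]))
```

The point is that the client needs nothing but string concatenation and a plain-text parser to get from a sources.list line to a downloadable .deb, which is why any static file server can host a repository.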

All these files are plain text and can, in theory, be created by hand. In practice, everyone uses one of the repository generation tools. Here we use mini-dinstall (I caution that this choice was made a long time ago, so it may be outdated). The output of those tools is ordinary files or, at worst, symlinks. You can rsync them over to whatever web server you want.
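To show what "created by hand" could look like, here is a rough Python sketch that writes one Packages stanza and one checksum line of the kind a Release file carries. It is deliberately incomplete: real stanzas have more fields (dependencies, maintainer, description, and so on), a real Release file has header fields and a signature, and the function names and the mytool package are hypothetical. Treat it as an illustration of the file format, not a replacement for a repository generation tool.

```python
import hashlib


def packages_stanza(name, version, arch, pool_path, deb_bytes):
    """Build a minimal Packages stanza for one .deb (a subset of the real fields)."""
    lines = [
        f"Package: {name}",
        f"Version: {version}",
        f"Architecture: {arch}",
        f"Filename: {pool_path}",  # path relative to the repository's URL prefix
        f"Size: {len(deb_bytes)}",
        f"SHA256: {hashlib.sha256(deb_bytes).hexdigest()}",
    ]
    return "\n".join(lines) + "\n"


def release_checksum_line(index_path, index_bytes):
    """Build one entry for a Release file's SHA256 section: hash, size, path."""
    digest = hashlib.sha256(index_bytes).hexdigest()
    return f" {digest} {len(index_bytes)} {index_path}"


if __name__ == "__main__":
    # Placeholder bytes standing in for a real .deb built from your makefile output.
    deb_bytes = b"hello"
    stanza = packages_stanza(
        "mytool", "1.0-1", "amd64",
        "pool/main/m/mytool/mytool_1.0-1_amd64.deb", deb_bytes,
    )
    print(stanza)
    print(release_checksum_line("main/binary-amd64/Packages", stanza.encode("utf-8")))
```

The checksum chain is what lets the signed Release file vouch for the Packages file, and the Packages file in turn vouch for each .deb.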
