I would assume most distros accept individual private donations (they may also accept free hosting). However, that is probably not the bulk of their financing in most cases.
Note that some of the major distros may have some paid staff, and possibly also office space, the cost of which likely exceeds that of hosting the repos [1]. This should not be taken to mean that they are not primarily volunteer based (except for the commercial variants, they are), just that they do have operating budgets.
Fedora is owned by Red Hat, and the latter is a publicly traded, billion-dollar annual business. I would presume they do quite a bit to help support the former.
According to Wikipedia, CentOS is now also owned by Red Hat, and earlier this year Red Hat announced their ongoing sponsorship of CentOS development.
Ubuntu is owned by Canonical, which I do not think is on a par with Red Hat, but they probably still have revenues in the tens of millions USD per year. Last time I downloaded an image, Ubuntu was pretty aggressive about encouraging you to make a small donation at the same time. I think $5 a year would cover the costs of repo hosting associated with the average installation.
The Debian project has been around for nearly 20 years and surely has a substantial core of users willing to help support it. They also have a list of "partners" here which provide them with resources. I would think Canonical helps out significantly, since Ubuntu is reliant upon Debian, but judging from this link provided in Kiwi's answer, they are still having to beg publicly for $250K to cover meeting costs, which is pretty disappointing.
Arch is likely much poorer than the other distros mentioned here, but they may still collect enough money from various sources to support some development staff and hosting. They do not appear to solicit donations prominently on their site, so I would guess this funding comes mostly from industry (and possibly government) grants.
1. To get some idea of how much this hosting would actually cost, consider that GNU/Linux systems probably account for 1-2% of desktop systems worldwide and at least 40% of web servers. Assume this amounts to ~25 million systems. If a large (theoretical) distro accounted for 10% of those, and each user accounted for 4 MB a day averaged out over time, this would amount to 10 TB/day, or roughly 300 TB/month. I would think if you know the right people, you could perhaps get 3000 TB/month for <$5000 US.
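The footnote's estimate can be reproduced with a few lines of arithmetic. This is only a sketch: the system count, distro share, and per-user traffic are the guesses from the footnote, while the 30-day month and decimal units (1 TB = 1,000,000 MB) are my own assumptions.

```python
# Back-of-the-envelope repo traffic estimate, using the footnote's guesses.
TOTAL_SYSTEMS = 25_000_000    # guessed number of GNU/Linux systems worldwide
DISTRO_SHARE_PERCENT = 10     # share held by one large (theoretical) distro
MB_PER_SYSTEM_PER_DAY = 4     # average repo traffic per system, per day
DAYS_PER_MONTH = 30           # assumption: a 30-day month
MB_PER_TB = 1_000_000         # assumption: decimal units


def repo_traffic_tb(total_systems, share_percent, mb_per_day, days):
    """Return (TB per day, TB per month) for one distro's repositories."""
    distro_systems = total_systems * share_percent // 100
    tb_per_day = distro_systems * mb_per_day / MB_PER_TB
    return tb_per_day, tb_per_day * days


daily_tb, monthly_tb = repo_traffic_tb(
    TOTAL_SYSTEMS, DISTRO_SHARE_PERCENT, MB_PER_SYSTEM_PER_DAY, DAYS_PER_MONTH
)
print(f"{daily_tb:.0f} TB/day, {monthly_tb:.0f} TB/month")
```

Changing any one of the guesses scales the result linearly, so being off by a factor of two in the system count doubles or halves the bandwidth bill.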
Packages
I publish a signed package repository, and here's how it works.
The system administrator configures repositories by adding a configuration file such as /usr/local/etc/pkg/repos/JdeBP.conf. As part of doing so, xe tells the package manager the public key that is to be used to check the signatures on the repository. Xe obtains this key from me in some suitably trusted fashion, and saves it in a file somewhere like /usr/local/etc/pkg/keys/JdeBP.pub. Xe then names that as the public key file in /usr/local/etc/pkg/repos/JdeBP.conf.
I sign the package repository with the private key that only I have using the command pkg repo . /elsewhere/package_signing_key. This creates signed information about the repository and the packages in three files, meta.txz, digests.txz, and packagesite.txz. Each of those archives has two files, one being a signature file for the other. The digests and packagesite archives contain hashes of each of the package archive files. The meta archive just contains the names of the other two and some versioning information for the pkg-ng tool.
So this is very much like Secure APT. There are some differences:
- Instead of Release giving the checksums for Packages, and Packages then giving the hashes for the actual package archives, pkg-ng has just one level. packagesite.yaml gives the hashes of the actual package archives directly.
- Instead of things being split into separately downloadable Release and Release.gpg files, and then further Packages and Sources files, the packagesite.yaml file (covering the entire repository) and its signature are downloaded as a unit in one fetch operation (and one HTTP/FTP transaction) as the packagesite.txz file.
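The difference in depth between the two hash chains can be sketched as follows. This is a toy model, not either tool's implementation: the index format ("filename sha256" per line), the function names, and the use of SHA-256 are all assumptions made for illustration, and the signature check on the top-level file is taken as already done.

```python
import hashlib


def sha256_hex(data: bytes) -> str:
    return hashlib.sha256(data).hexdigest()


def parse_index(blob: bytes) -> dict:
    """Parse a toy index of 'filename sha256' lines (not the real file formats)."""
    entries = {}
    for line in blob.decode().splitlines():
        filename, digest = line.split()
        entries[filename] = digest
    return entries


def verify_two_level(release: bytes, packages: bytes, filename: str, archive: bytes) -> bool:
    """Secure APT style: Release vouches for Packages, Packages for the archive."""
    # Level 1: the (already signature-checked) Release lists the hash of Packages.
    if parse_index(release).get("Packages") != sha256_hex(packages):
        return False
    # Level 2: Packages lists the hash of the actual package archive.
    return parse_index(packages).get(filename) == sha256_hex(archive)


def verify_one_level(packagesite: bytes, filename: str, archive: bytes) -> bool:
    """pkg-ng style: the (already signature-checked) index lists archive hashes directly."""
    return parse_index(packagesite).get(filename) == sha256_hex(archive)


# Demonstration with made-up contents.
archive = b"pretend package contents"
listing = f"redo-1.3.txz {sha256_hex(archive)}\n".encode()
release = f"Packages {sha256_hex(listing)}\n".encode()

two_level_ok = verify_two_level(release, listing, "redo-1.3.txz", archive)
one_level_ok = verify_one_level(listing, "redo-1.3.txz", archive)
tampered_ok = verify_one_level(listing, "redo-1.3.txz", archive + b"!")
print(two_level_ok, one_level_ok, tampered_ok)
```

In both cases only the top of the chain carries a signature; everything below it is trusted because its hash is listed in something that has already been verified.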
But the idea is much the same. A system administrator trusts that the packagesite.yaml file came from me because its accompanying signature can be checked against the locally stored, trusted, copy of my public key. A system administrator trusts that the redo-1.3.txz file came from me because its hash matches the hashes from the (now) trusted packagesite.yaml.
Ports
Ports are a very different kettle of fish. Debian's Secure APT treats source packages as just more packages. FreeBSD/TrueOS ports are not source packages in the Debian sense, but are rather automated ways of obtaining and building source packages that are published by someone else. A port is in essence a makefile with some instructions on where to fetch source from. It has a list of the hash(es) of whatever is to be fetched.
The port itself comes from the FreeBSD or TrueOS repository, using either Subversion (if FreeBSD) or git (if TrueOS or FreeNAS). The standard ideas for trusting Subversion or git thus apply. On TrueOS, for example, the "remote" URL used when fetching the ports (themselves) with git is an HTTPS URL for a GitHub repository that iXsystems vouches (in the TrueOS Handbook) is one that it owns.
So a system administrator trusts a port because xe has obtained it using a Subversion or git fetch that xe trusts. Xe trusts the source archive fetched by the port because it matches the hash listed in the (now) trusted port.
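As a rough illustration of that last step, here is a sketch that checks a fetched source archive against a port's hash list. It assumes the list looks like a ports distinfo file, with "SHA256 (name) = hex" and "SIZE (name) = bytes" lines; the function names and the example archive name are hypothetical, and this is not how the ports framework itself is written.

```python
import hashlib
import re

# Matches lines such as "SHA256 (foo.tar.gz) = abc123" and "SIZE (foo.tar.gz) = 42".
DISTINFO_LINE = re.compile(r"^(SHA256|SIZE) \((.+)\) = (\S+)$")


def parse_distinfo(text: str) -> dict:
    """Return {filename: {"SHA256": hex, "SIZE": int}} from distinfo-style text."""
    entries = {}
    for line in text.splitlines():
        match = DISTINFO_LINE.match(line.strip())
        if match is None:
            continue  # e.g. a TIMESTAMP line, which names no file
        kind, filename, value = match.groups()
        entries.setdefault(filename, {})[kind] = int(value) if kind == "SIZE" else value
    return entries


def distfile_matches(entries: dict, filename: str, data: bytes) -> bool:
    """True only if the fetched bytes agree with the trusted port's listing."""
    expected = entries.get(filename)
    if expected is None or "SHA256" not in expected:
        return False  # the port does not vouch for this file
    if expected.get("SIZE", len(data)) != len(data):
        return False
    return hashlib.sha256(data).hexdigest() == expected["SHA256"]


# Demonstration with a made-up source archive.
source = b"pretend upstream source tarball"
distinfo = (
    "TIMESTAMP = 1500000000\n"
    f"SHA256 (example-1.0.tar.gz) = {hashlib.sha256(source).hexdigest()}\n"
    f"SIZE (example-1.0.tar.gz) = {len(source)}\n"
)
listing = parse_distinfo(distinfo)
genuine_ok = distfile_matches(listing, "example-1.0.tar.gz", source)
altered_ok = distfile_matches(listing, "example-1.0.tar.gz", source + b"x")
print(genuine_ok, altered_ok)
```

The upstream site that serves the archive never has to be trusted; it only has to deliver bytes that match what the trusted port says they should be.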
Notes
Debian's Release.gz and Packages.gz are pretty much in effect just ways to compress the HTTP transport. I've glossed over some other things that aren't to do with security, such as differences in how one is expected to handle multiple operating system releases.
Debian has moved towards how FreeBSD works over the years, and doesn't work like that wiki page says any more. Nowadays one has the hashes and the signature all in one, more like a FreeBSD repository, in an InRelease file. This prevents a "tearing" problem that occurs when one downloads Release and then Release.gpg and the repository owner has updated the repository in between the two downloads, causing a signature mismatch.
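The tearing problem can be simulated with a toy repository. Everything here is a stand-in: the class and method names are invented, and the "signature" is a keyed hash from the standard library rather than a real public-key signature, used only so that the sketch runs without extra dependencies.

```python
import hashlib
import hmac

DEMO_KEY = b"demo-signing-key"  # stand-in only; real repositories use public-key signatures


def sign(data: bytes) -> bytes:
    return hmac.new(DEMO_KEY, data, hashlib.sha256).digest()


def verify(data: bytes, signature: bytes) -> bool:
    return hmac.compare_digest(sign(data), signature)


class ToyRepository:
    """A repository whose owner may republish at any moment."""

    def __init__(self) -> None:
        self.publish(b"index v1")

    def publish(self, index: bytes) -> None:
        self.release = index
        self.release_sig = sign(index)

    def fetch_release(self) -> bytes:
        return self.release

    def fetch_release_sig(self) -> bytes:
        return self.release_sig

    def fetch_combined(self) -> tuple:
        # One transaction: data and signature always belong together.
        return self.release, self.release_sig


# Two separate downloads, with an update landing in between them.
repo = ToyRepository()
old_release = repo.fetch_release()
repo.publish(b"index v2")
new_signature = repo.fetch_release_sig()
torn_ok = verify(old_release, new_signature)

# One combined download; a later update cannot split the pair.
repo = ToyRepository()
data, signature = repo.fetch_combined()
repo.publish(b"index v2")
combined_ok = verify(data, signature)
print(torn_ok, combined_ok)
```

Nothing malicious happens in the torn case; the client simply ends up holding an index from one publication and a signature from the next.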
(Debian only did things this way originally because it grew these things in stages over the years, each built upon the preceding mechanisms without changing them: first the Packages system, then the Release mechanism on top of that, then the Release.gpg mechanism on top of that.)
Also: FreeBSD has another, different, way of doing this which involves "fingerprints" and a signed digests file (in the digests.txz archive).
I've also glossed over the security considerations for the signing key, as that isn't really relevant to an answer that is discussing how this is like/unlike Secure APT. The requirements of private key security are general to the whole notion of signing things with public/private keys, and are independent of repository structures.
Best Answer
This isn't a direct answer to your question, but there are several things you can do to mitigate this risk. The simplest one is to check your downloaded packages against the checksums from a different mirror than you downloaded from.
When my package manager (poldek) downloads a package, I have it set to keep a copy of the downloaded rpm in a cache folder. It automatically checks the checksum of the download against the package repository and warns/aborts on a mismatch, but if you were worried about man-in-the-middle attacks against your distro repository it would be easy to write a secondary script that browsed through all your downloaded packages and verified them against checksums you download from a different mirror. You can even run your first install as a dry-run so that packages get downloaded but not installed, then run your verification script, then do the actual install.
This doesn't stop a compromised package from getting into the distro's repository, but most distros have other ways of mitigating that, and even signed packages would not guarantee this was never a problem. What it does do is stifle the targeted man-in-the-middle attack vector. By using a separate source and downloading on a separate channel, you kill the ease with which a compromised package could be dropped into a tapped line.
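A secondary script of the kind described might look roughly like this. It is a sketch under assumptions: it does not call poldek, it takes the second mirror's checksums as an already-downloaded mapping of file name to SHA-256 digest (how that list is fetched and which hash the distro publishes are left open), and the function and file names are made up.

```python
import hashlib
import tempfile
from pathlib import Path


def sha256_of_file(path: Path) -> str:
    """Hash a file in chunks so large packages need not fit in memory."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(65536), b""):
            digest.update(chunk)
    return digest.hexdigest()


def find_mismatches(cache_dir: Path, mirror_checksums: dict) -> list:
    """Return cached package names that the second mirror does not confirm."""
    suspicious = []
    for package in sorted(cache_dir.glob("*.rpm")):
        # A package missing from the second mirror's list also counts as unconfirmed.
        if mirror_checksums.get(package.name) != sha256_of_file(package):
            suspicious.append(package.name)
    return suspicious


# Demonstration with a throwaway cache directory and made-up packages.
with tempfile.TemporaryDirectory() as tmp:
    cache = Path(tmp)
    (cache / "good-1.0.rpm").write_bytes(b"genuine package")
    (cache / "bad-1.0.rpm").write_bytes(b"tampered package")
    second_mirror = {
        "good-1.0.rpm": hashlib.sha256(b"genuine package").hexdigest(),
        "bad-1.0.rpm": hashlib.sha256(b"original package").hexdigest(),
    }
    mismatches = find_mismatches(cache, second_mirror)
    print(mismatches)
```

Run between the dry-run download and the real install, an empty result means both mirrors agree on every cached package; anything else is a reason to stop before installing.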