Debian – How to find Debian packages that would free up the most space

apt debian disk-usage package-management

Say I have a Debian machine where I would like to free up space on '/' by removing useless packages. To find good candidates to review, I would like to focus my attention on the largest packages first.

It seems that the standard solution to do this is to list all installed packages by their installed size. However, this solution has a lot of drawbacks, because it ignores dependencies and ignores whether a package was automatically or manually installed:

  • If a package is large but many manually installed packages depend on it, then maybe it is not a good candidate to consider for removal (e.g., removing libicu52 is a bad idea even though it's large)
  • If a package depends on another package, then removing the second one also removes the first, so the space used by both is freed (e.g., removing libwine also removes wine)
  • If a package A depends on another package B and a third package C was only automatically installed as a dependency to B, then removing A will remove B and C will be auto-removed, which should be accounted for (e.g., removing wesnoth-1.10-data removes wesnoth-1.10, which means that wesnoth-1.10-music would be removed).

It seems that the right tool for this job should only propose manually installed packages for removal, and should sort them by the space that would be reclaimed by removing them and then running autoremove (which removes automatically installed packages that are no longer needed).

Of course, you could simulate this with a variant of this solution, but that is both slow and ugly. Hence my question: is there a standard tool that looks at the dependency graph of packages and computes this information? (I am considering writing a script for this, but I'd like to make sure one doesn't already exist.)
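To make the ranking described in the question concrete, here is a minimal Python sketch on a toy dependency graph. It is an illustration under stated assumptions, not an existing tool: the package names and sizes are made up, only hard dependencies are modelled (no Recommends, alternatives, virtual or essential packages), and no package is assumed to be autoremovable before the simulated removal.

```python
from typing import Dict, List, Set, Tuple

Graph = Dict[str, Set[str]]


def closure(roots: Set[str], edges: Graph) -> Set[str]:
    """Return roots plus everything reachable from them along edges."""
    seen: Set[str] = set()
    stack = list(roots)
    while stack:
        pkg = stack.pop()
        if pkg not in seen:
            seen.add(pkg)
            stack.extend(edges.get(pkg, ()))
    return seen


def reverse(deps: Graph) -> Graph:
    """Turn 'pkg depends on X' edges into 'X is needed by pkg' edges."""
    rdeps: Graph = {}
    for pkg, targets in deps.items():
        for target in targets:
            rdeps.setdefault(target, set()).add(pkg)
    return rdeps


def reclaimable(pkg: str, deps: Graph, sizes: Dict[str, int],
                manual: Set[str]) -> int:
    """Space freed by removing pkg and then running an autoremove.

    Step 1: pkg and everything that (recursively) depends on it goes away.
    Step 2: whatever the remaining manual packages still need is kept;
            every other installed package counts as freed.
    """
    removed = closure({pkg}, reverse(deps))
    kept = closure(manual - removed, deps)
    return sum(size for name, size in sizes.items() if name not in kept)


def rank_candidates(deps: Graph, sizes: Dict[str, int],
                    manual: Set[str]) -> List[Tuple[str, int]]:
    """Manually installed packages, largest reclaimable space first."""
    gains = [(pkg, reclaimable(pkg, deps, sizes, manual)) for pkg in manual]
    return sorted(gains, key=lambda item: (-item[1], item[0]))


# Hypothetical data: names and sizes (in KiB) are invented for the example.
SIZES = {"game": 100, "game-data": 500, "game-music": 300,
         "editor": 50, "libshared": 400, "viewer": 20}
DEPS = {"game": {"game-data", "game-music", "libshared"},
        "editor": {"libshared"},
        "viewer": set()}
MANUAL = {"game", "editor", "viewer"}

print(rank_candidates(DEPS, SIZES, MANUAL))
# prints [('game', 900), ('editor', 50), ('viewer', 20)]
```

In this toy data, libshared is the largest package after game-data, yet removing it is never proposed because it is not manually installed, and it is not counted for game because editor still needs it.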

Best Answer

I don't know of a one-stop command line solution, although all the tools exist (apt-cache depends --installed, apt-cache rdepends --installed --recurse, apt-mark showmanual, dpigs, etc.). It would be possible to hack together a command line script that could attempt to find large packages with few manually installed reverse dependencies. Here's the proof of concept I used as a starting point:

dpigs | awk 'NR == 1 {print $2}' | xargs apt-cache rdepends --installed --important --recurse | awk '!/:/ {print $1}' | sort -u

On the other hand, if you want to do complex analysis of the graph in multiple directions (e.g., what set of manually installed packages has the largest on-disk overlapping set of recursive dependencies), it can quickly get out of hand. At that point, you'll probably need to look at something more customizable (awk or python?).
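If you go the Python route, the first step is getting the package data into a form you can analyse. The following is a rough sketch of one way to do that, not a finished tool. The helper names (parse_depends, parse_dpkg_query, load_system) are hypothetical, and the parsing is deliberately simplified: only the first alternative of "a | b" is kept, version constraints and architecture qualifiers are dropped, and Pre-Depends and Recommends are ignored.

```python
import re
import subprocess
from typing import Dict, Set, Tuple

# Tab-separated fields requested from dpkg-query, one package per line.
DPKG_FORMAT = "${db:Status-Abbrev}\t${Package}\t${Installed-Size}\t${Depends}\n"


def parse_depends(field: str) -> Set[str]:
    """Reduce a Depends field to bare package names (simplified)."""
    names: Set[str] = set()
    for clause in field.split(","):
        first = clause.split("|")[0]               # keep first alternative only
        name = re.sub(r"\(.*?\)", "", first)       # drop "(>= 1.2)" constraints
        name = name.strip().split(":")[0]          # drop ":any" style qualifiers
        if name:
            names.add(name)
    return names


def parse_dpkg_query(text: str) -> Tuple[Dict[str, int], Dict[str, Set[str]]]:
    """Return (sizes in KiB, dependency sets) for fully installed packages."""
    sizes: Dict[str, int] = {}
    deps: Dict[str, Set[str]] = {}
    for line in text.splitlines():
        parts = line.split("\t")
        # Skip malformed lines and packages not in the "ii" (installed) state.
        if len(parts) != 4 or not parts[0].startswith("ii"):
            continue
        _, name, size, depends = parts
        # The same name can appear once per architecture; merge those entries.
        sizes[name] = sizes.get(name, 0) + int(size or 0)
        deps.setdefault(name, set()).update(parse_depends(depends))
    return sizes, deps


def load_system() -> Tuple[Dict[str, int], Dict[str, Set[str]], Set[str]]:
    """Query the local machine; needs dpkg-query and apt-mark on PATH."""
    listing = subprocess.run(
        ["dpkg-query", "-W", "--showformat=" + DPKG_FORMAT],
        check=True, capture_output=True, text=True,
    ).stdout
    marked = subprocess.run(
        ["apt-mark", "showmanual"],
        check=True, capture_output=True, text=True,
    ).stdout
    manual = {line.split(":")[0] for line in marked.split()}
    sizes, deps = parse_dpkg_query(listing)
    return sizes, deps, manual
```

On a Debian system, `sizes, deps, manual = load_system()` should give you the installed sizes, a forward dependency map, and the set of manually installed packages, which is the raw material for any graph analysis. The parsing functions do not touch the system, so they can be tried on sample text first.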

Full Disclosure: I have contributed to the project below. If that kind of thing matters to you, please take it into account. If I were aware of a similar project that was already in the Debian repositories, I would probably post that instead.

While I prefer to do everything from the command line, you might find pacgraph (also on GitHub) a useful alternative. It was originally written by Kyle Keen for Arch Linux, but it's now compatible with deb- and rpm-based systems as well. I used to have some sample output from an Ubuntu system, but I can't find it, so here's an example from his web site:

[Image: sample pacgraph output]

Shiny!

It's been a while since I've used it, but I believe there are also flags to highlight a particular package, with different colors for its recursive dependencies and reverse dependencies.
