Find Files Not Installed by Package Manager in Gentoo


I'd like to get a list of all files on my Gentoo Linux system that were not installed by the package manager (Portage), because I want to keep my system as clean as possible by removing useless files that are lying around.

Here is what I've tried so far. First, I generate the list of all files that belong to some package tracked by Portage:

equery files "*" | sort | uniq > portage.txt
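A possible alternative to equery is to read Portage's installed-package database directly. This is a minimal sketch that assumes the usual layout, where each installed package has a CONTENTS file under /var/db/pkg/<category>/<package>/ with lines such as `obj <path> <md5> <mtime>`, `dir <path>` and `sym <path> -> <target> <mtime>`. Keeping only the obj (regular file) entries makes the list directly comparable with the output of `find -type f`. The function name vdb_regular_files is hypothetical.

```shell
# Hypothetical helper: print the regular files ("obj" entries) recorded in
# Portage's package database. The database root defaults to /var/db/pkg.
vdb_regular_files() {
    vdb=${1:-/var/db/pkg}
    # An obj line looks like: obj <path> <md5> <mtime>
    # The path itself may contain spaces, so instead of printing $2 we strip
    # the "obj " prefix and the last two space-separated fields.
    awk '
        $1 == "obj" {
            sub(/^obj /, "")
            sub(/ [^ ]+ [^ ]+$/, "")
            print
        }' "$vdb"/*/*/CONTENTS
}
```

It would be used as `vdb_regular_files | sort | uniq > portage.txt` in place of the equery command.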

Then I generate the list of all files on my system, except those that I don't care about:

find / \( -path /dev -o -path /proc -o -path /sys -o -path /media \
          -o -path /mnt -o -path /usr/portage -o -path /var/db/pkg \
          -o -path /var/www/localhost/htdocs -o -path /lib64/modules \
          -o -path /usr/src -o -path /var/cache -o -path /home \
          -o -path /root -o -path /run -o -path /var/run -o -path /var/tmp \
          -o -path /var/log -o -path /tmp -o -path /etc/config-archive \
          -o -path /usr/local/portage -o -path /boot \) -prune \
          -o -type f -print | sort | uniq > all.txt

Finally, I get the list of all files that are not tracked by Portage:

comm -13 portage.txt all.txt > extra.txt

Some statistics:

wc -l portage.txt all.txt extra.txt
  127724 portage.txt
   78371 all.txt
    8438 extra.txt

As you can see, I still get more than eight thousand extra files. I'd like to reduce that number so that I can focus on the files that really need to be deleted.
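To see which directories account for most of the extra files, here is a small sketch that counts the entries of extra.txt per path prefix, most frequent first. The function name summarize_by_dir and its depth parameter are hypothetical.

```shell
# Hypothetical helper: count the paths in FILE grouped by their leading
# components. cut counts the empty field before the leading "/", so the
# default depth of 4 keeps three components, e.g. /usr/lib64/gcc.
summarize_by_dir() {
    depth=${2:-4}
    cut -d/ -f1-"$depth" "$1" | sort | uniq -c | sort -rn
}
```

For example, `summarize_by_dir extra.txt | head -n 20` shows the twenty busiest prefixes.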

I noticed that extra.txt contains thousands of files from a small number of directories, such as /usr/lib64/gcc, /usr/lib64/python2.7 and /usr/lib64/python3.2. For example, /usr/lib64/gcc/x86_64-pc-linux-gnu/4.6.3/crtbegin.o is not in portage.txt because Portage lists it as /usr/lib/gcc/x86_64-pc-linux-gnu/4.6.3/crtbegin.o instead, and on my system /usr/lib is a symlink to /usr/lib64. So it seems I need to handle symlinks properly to get better results, perhaps by adding to portage.txt the paths that the symlinked entries resolve to. I don't really know how to do that.
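One way to approach this, sketched under the assumption that `readlink -f` is available (as in GNU coreutils) and that the input is the sorted, absolute-path portage.txt: rewrite every path with its directory part resolved through symlinks, so that /usr/lib/gcc/... is compared as /usr/lib64/gcc/.... Only the directory is resolved and the file name is kept, so a symlink that Portage itself installed is not replaced by its target. The function name resolve_dirs is hypothetical.

```shell
# Hypothetical helper: read absolute paths on stdin and print them with the
# directory part canonicalized, e.g. /usr/lib/gcc/... -> /usr/lib64/gcc/...
# when /usr/lib is a symlink to lib64. readlink runs only when the directory
# changes, so sorted input shares one lookup per directory.
resolve_dirs() {
    prev_dir= prev_real=
    while IFS= read -r path; do
        dir=${path%/*}
        [ -n "$dir" ] || dir=/          # a file directly under /
        if [ "$dir" != "$prev_dir" ]; then
            prev_dir=$dir
            # Fall back to the original directory if it cannot be resolved.
            prev_real=$(readlink -f -- "$dir") || prev_real=$dir
        fi
        printf '%s/%s\n' "${prev_real%/}" "${path##*/}"
    done
}
```

Usage would be `resolve_dirs < portage.txt | sort | uniq > portage-resolved.txt`, followed by `comm -13 portage-resolved.txt all.txt > extra.txt`.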

Also, why is portage.txt bigger than all.txt? Shouldn't it be the opposite, since the files tracked by Portage are a subset of all files on my system?
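A quick way to investigate the size difference, assuming that equery's file list includes the directory and symlink entries Portage records and not only regular files, is to count the CONTENTS entries by type. Large dir and sym counts would account for much of the gap, because `find -type f` reports neither. Entries are counted before deduplication, so a directory shared by several packages is counted once per package. The function name vdb_entry_types is hypothetical.

```shell
# Hypothetical helper: count the entries in Portage's package database by
# type ("obj" = regular file, "dir" = directory, "sym" = symlink, ...).
# The database root defaults to /var/db/pkg.
vdb_entry_types() {
    vdb=${1:-/var/db/pkg}
    awk '{ print $1 }' "$vdb"/*/*/CONTENTS | sort | uniq -c
}
```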

Finally, am I missing any other locations that should also be excluded in the find command?

Best Answer

What you are looking for might be qfile. It is part of the app-portage/portage-utils package and provides the -o (--orphans) option. You can use something like

find /usr/bin | xargs -I{} qfile -o {}

to get a list of orphaned files in /usr/bin.

Remark: Sadly, qfile in the current stable version of portage-utils does not support reading from stdin, and the approach given in the qfile man page, qfile -o $(find /usr/bin), fails when find returns many files, because the expanded command line exceeds the system's argument-length limit. Therefore we have to work around it a little, using xargs.
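A variant of the xargs workaround that avoids starting one qfile process per file, sketched under two assumptions: GNU findutils (for -print0, -0 and -r), and qfile accepting several paths in one call, as the man page example implies. Without -I{}, xargs packs as many paths into each qfile invocation as the argument-length limit allows. The function name list_orphans is hypothetical.

```shell
# Hypothetical helper: print the entries under DIR (default /usr/bin) that
# qfile -o reports as orphans, i.e. not owned by any installed package.
# Directories are skipped; regular files and symlinks are checked.
# -print0 / -0 keep names with spaces or newlines intact, and -r skips the
# qfile call entirely when find produces no output.
list_orphans() {
    find "${1:-/usr/bin}" ! -type d -print0 | xargs -0 -r qfile -o
}
```

For example, `list_orphans /usr/lib64` checks a different tree.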

By the way, this is not something I came up with myself; I found it on Gossamer Threads, in a comment by yvasilev.
