Linux – `du` on WSL giving a directory size greater than the machine's entire disk

Tags: bash, du, memory, windows, windows-subsystem-for-linux

I ran into my question while trying to find out which files on my computer are taking up the most space. Here's the information on the total disk space, found from Windows Subsystem for Linux (WSL) / bash:

bballdave025@WORK:~$ df -h /mnt/c
Filesystem      Size  Used Avail Use% Mounted on
C:              239G  231G  7.8G  97% /mnt/c

Note that my question is NOT about how to clear the space.

I started by checking the Program Files directory.

bballdave025@WORK:~$ du -sh /mnt/c/Program\ Files/
du: cannot read directory '/mnt/c/Program Files/Microsoft Policy Platform/authorityDb': Permission denied
du: cannot read directory '/mnt/c/Program Files/Microsoft SQL Server/130/Shared/ErrorDumps': Permission denied
du: cannot read directory '/mnt/c/Program Files/WindowsApps': Permission denied
2.5T    /mnt/c/Program Files/

The Main Issue

My WSL bash du is telling me that, on my machine (which has a 239 GB drive), my Program Files directory is taking up 2.5 TB of the 239 GB of available disk space. It's like I'm holding two pints of water in my mouth without swallowing. (That's just to show the ratio of sizes; my problem doesn't involve water.)

By the way, I don't have admin rights, so there's no sudo to solve any issues. I will leave the Permission denied errors (which show up without a real sudo) out of the rest of this post. Also note that I'm on a work computer, so there are things I can't access.

Main question: Is there a relatively simple way to check disk usage in my situation, that is, on a Windows C: drive using the Windows Subsystem for Linux?

Secondary question: What the heck is going on here? Why am I getting a report that my Program Files directory is taking up ten times more space than even exists on my machine?

By the way… Windows tells me that Program Files has a size of 4.83 GB, a fact I found by using File Explorer, right-clicking the Program Files folder, and selecting 'Properties'.


My Attempts at a Solution

My first thought was that there might be some symlinks or drive-mapping stuff for company coding software or an antivirus program or something, so I checked out the man page for du. I found the following two flags, which I thought might help.

-P, --no-dereference
              don't follow any symbolic links (this is the default)
-x, --one-file-system
              skip directories on different file systems

However, du -shP /mnt/c/Program\ Files/ , du -shx /mnt/c/Program\ Files/ , and even du -shPx /mnt/c/Program\ Files/ all gave me 2.5T. For that matter, so did du -shL, the option that should follow symlinks. The other maybe-related options I tried, du -shD and du -shH, gave the same result: 2.5T for every one of them.
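
One more du flag I haven't shown results for, but which should separate the two notions of "size": --apparent-size sums what each file claims to contain instead of the blocks the filesystem says it occupies. If the inflation is coming from bogus block counts on the /mnt/c mount, this should report something much closer to Windows's figure.

du -sh --apparent-size /mnt/c/Program\ Files/
# sums file contents (apparent sizes) rather than allocated blocks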

My next thought was that perhaps Windows shortcuts were messing things up, so I tried excluding them. (I don't know whether this command actually prevents following shortcuts, but I thought it worth a try.) No dice.

bballdave025@WORK:~$ du -sh --exclude=*.lnk /mnt/c/Program\ Files/
2.5T    /mnt/c/Program Files/

I could leave prejudices behind and try something from the <shudder> Windows Command Line </shudder>, or even dust off my old PowerShell skills. I guess I could even bite the bullet and go through the File Explorer GUI: enter a directory, click each folder, select 'Properties', find which subdirectory takes the most space, enter that subdirectory, and repeat … [sleeping] …

… However, I'm interested in why I'm getting this weird result. When I look at Program Files (x86), I get a result that's like stuffing a soccer ball (a non-American football) into my mouth. (Once again, I'm talking in terms of the ratio of sizes; the volume of my mouth is not related to my problem.)

bballdave025@WORK:~$ du -sh /mnt/c/Program\ Files\ \(x86\)/
11T     /mnt/c/Program Files (x86)/

(Windows / File Explorer reported a size of 22.8 GB … after I'd waited 30 seconds.)
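
For comparison without the GUI: since this WSL build can run Windows executables (that's how systeminfo.exe gets called under System Details, below), PowerShell can be asked for Windows's own total from inside bash. The one-liner below is just a sketch along those lines, not verified output from my machine; the recursive scan is slow and silently skips files it can't read.

powershell.exe -NoProfile -Command "(Get-ChildItem -LiteralPath 'C:\Program Files' -Recurse -File -ErrorAction SilentlyContinue | Measure-Object -Property Length -Sum).Sum / 1GB"
# prints the total size of readable files, in GB, as Windows counts it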

Sources and Attempts

From this Super User answer, I got the idea to check whether my situation was a case of

The files you removed are probably still opened by a process.

bballdave025@WORK:~$ lsof -a +L1 /mnt/c/Program\ Files/
bballdave025@WORK:~$

Since there was no output, I'm assuming that no files I removed are still opened by a process.

I also looked at this question and answer about different du results on Linux and Cygwin. However, the discrepancies in size described in that question were minuscule, so I don't believe that the issue is similar. While I'm sure that

There is then no surprise for the same set of files to use differring [sic] disk size when stored on different file systems.

I do think that it's a surprise for the same set of files to use any differing disk size when they're really stored in one place, even if there are different underlying ways to access them.

Next steps

I decided to create a folder on my C: drive, put in a small file, and check to see that the file size was as expected.

bballdave025@WORK:~$ mkdir -p /mnt/c/Users/bballdave025/little_guy
bballdave025@WORK:~$ echo "This should make a small file." > /mnt/c/Users/bballdave025/little_guy/small_file.txt
bballdave025@WORK:~$ du -sh /mnt/c/Users/bballdave025/little_guy/small_file.txt
17K     /mnt/c/Users/bballdave025/little_guy/small_file.txt
bballdave025@WORK:~$ du -shPx /mnt/c/Users/bballdave025/little_guy/
17K     /mnt/c/Users/bballdave025/little_guy/

17K does seem big for that little-bitty text file. If we have a byte per character, that string would give us 31 bytes. I don't know if that exercise (making a text file and checking du) will help to answer the question, but it's been part of my effort.
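
A way to see both numbers for that file at once, in case it helps whoever answers: GNU stat can print the apparent size next to the allocated block count, and I believe the block count is what du's default numbers are built from (34 blocks of 512 bytes would be exactly the 17K shown above).

stat -c '%s bytes apparent, %b blocks of %B bytes allocated' /mnt/c/Users/bballdave025/little_guy/small_file.txt
# %s is the byte count the file claims; %b x %B is what the
# filesystem says it has allocated, which is what du adds up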

I'm stuck. I really don't want to click through folders. I also want to know why I get this weird behavior. Any ideas?


System Details

bballdave025@WORK:~$ uname -a | head -n 1
Linux WORK 4.4.0-43-Microsoft #1-Microsoft Wed Dec 31 14:42:53 PST 2014 x86_64 x86_64 x86_64 GNU/Linux
bballdave025@WORK:~$ bash --version | head -n 1
GNU bash, version 4.3.46(1)-release (x86_64-pc-linux-gnu)
bballdave025@WORK:~$ systeminfo.exe | sed -n 's/^OS\ *//p'
Unable to translate current working directory. Using C:\Windows\System32
Name:                   Microsoft Windows 10 Enterprise
Version:                10.0.15063 N/A Build 15063
Manufacturer:           Microsoft Corporation
Configuration:          Member Workstation
Build Type:             Multiprocessor Free

Best Answer

Reproduction

I just tried the same command as you, du -sh /mnt/c/Program\ Files/, and mine matched what Windows reported.

It's possible it was a bug that has since been patched, or that there's something about your file system that mine doesn't have going on. You've already dug into linking/shortcuts, but maybe there's still something being overlooked there?

I double-checked against both Bash on Ubuntu on Windows ("WSL Legacy") and Ubuntu; both reported the same correct size for me.

I just saw the comments on the question about a reported bug; it looks like everything mentioned has been patched up.

Additional Steps to Try

You probably no longer have this issue, given that this was asked over a year ago. Here are some additional steps I would try to pinpoint where that large number is coming from.

Install NCDU

I would recommend trying ncdu. On Ubuntu (including WSL's Ubuntu flavor), you can install it with the following:

sudo apt install ncdu

This will crawl your system and visually show you where the space is going, which may help you pinpoint what, and where, is supposedly using the disk under that Program Files mount. I would be really interested to see whether it shows the same issue or not. I assume ncdu reads the same numbers du does, so I would expect it to display the same thing for you, unless it uses some flags behind the scenes to avoid this.

Display Usages Only for Program Files Directory

Using ncdu to crawl only a specific directory is straightforward. You can display usage for just the Program Files directory on Windows using the following command:

ncdu /mnt/c/Program\ Files
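
One thing worth knowing once the browser is open (I'm going from memory of ncdu's keybindings here, so double-check against its help screen): there's a toggle between allocated disk usage and apparent size, which would show directly whether the inflation lives in the block counts.

ncdu /mnt/c/Program\ Files
# once inside, press 'a' to flip between "disk usage" and
# "apparent size"; press '?' for the full list of keybindings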

Resolution

I would recommend that you use Windows to determine disk usage for the Windows operating system, especially given that the file system is undoubtedly NTFS.
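
If you'd rather get Windows's own number without clicking through File Explorer, the classic dir command can be driven from WSL through interop. This is a sketch rather than tested output, and the summary only counts files dir is allowed to see:

cmd.exe /c dir /s /a "C:\Program Files" | tail -n 3
# the closing "n File(s) x bytes" summary line is Windows's total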

If you want to determine the disk usage of just the WSL instance, I would recommend using ncdu and ignoring the /mnt directory, so that you only display usage for the Linux system and not the Windows mount.

Don't get me wrong, though; my interest is equally piqued about what's going on in your situation.

Check Linux Disk Space Ignoring Windows Mount

To check the Linux disk usage while ignoring the Windows mount, you can run:

ncdu --exclude /mnt
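
An alternative with the same intent, assuming your ncdu build has the flag (recent versions document it): tell ncdu not to cross filesystem boundaries at all, the same idea as du's -x.

ncdu -x /
# stays on the root filesystem, so mounts like /mnt/c are never entered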

Why Small Files Take Up More Space

If I recall correctly, even if you only throw a couple of characters into a text file, you're still occupying a whole allocation unit (cluster) on the drive. Double-checking, I was not able to reproduce the effect visibly on NTFS, but I was able to on FAT32. NTFS is what Windows uses, so it's possible the reporting through Linux reflects Linux's interpretation of the filesystem it's working with.

It used to be that some apps would make thousands of small files, and it was like death by a million paper cuts. Transferring thousands of small files would also take much longer than transferring a single large contiguous file.

Note that in a file's Properties dialog, Windows shows both its actual size and the size it occupies on the disk ('Size' vs. 'Size on disk').

I doubt this is the reason you're seeing such a large discrepancy in your disk reporting, but it could be interesting if you had millions of small files. Some caching/storage schemes do tend to branch out into many small files for quick binary-search access.

(Screenshot: a small file's 'Size' vs. 'Size on disk' in the Properties dialog, on a FAT32 disk.)
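
To put a toy number on that many-small-files scenario (every figure below is invented for illustration, not measured anywhere):

# hypothetical: a million ~100-byte files on a filesystem with
# 4 KiB clusters; each file wastes (cluster - size) bytes
files=1000000; cluster=4096; avg=100
echo "$(( files * (cluster - avg) / 1024 / 1024 )) MiB lost to cluster rounding"
# prints: 3810 MiB lost to cluster rounding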
