Find Command – Understanding -size Behavior


I have a question concerning the find command in Linux.

All the articles I've found online say that -size -10M, for example, returns files smaller than 10 MB. But when I tested this, -size -10M seemed to return only files less than or equal to 9 MB in size.

If I do

find . -type f -size -1M

the find command returns only empty files (the unit is irrelevant; it can be -1G, -1k, …).

find . -type f -size -2M

returns files <= 1M in size, etc.

The man page says:

Bear in mind that the size is rounded up to the next unit. Therefore -size -1M is not equivalent to -size -1048576c. The former only matches empty files, the latter matches files from 0 to 1,048,575 bytes.

Ok, so I guess -1M is rounded to 0M, -2M to -1M and so on… ?

But then

find . -type f -size 1M

returns files <= 1M (e.g. 100K and 512K files, but not empty files), while I would expect it to return only files that are exactly 1M in size.

find . -type f -size 2M

returns files > 1M and <= 2M, etc.

Is this all normal or am I doing something wrong and what's the exact behavior of the -size parameter?

Best Answer

The GNU find man page says the following (this behaviour appears to be specific to GNU find; other implementations may differ, see below):

The + and - prefixes signify greater than and less than, as usual; i.e., an exact size of n units does not match. Bear in mind that the size is rounded up to the next unit. Therefore -size -1M is not equivalent to -size -1048576c. The former only matches empty files, the latter matches files from 0 to 1,048,575 bytes.

Question:

Ok, so I guess -1M is rounded to 0M, -2M to -1M and so on... ?

No. It's not the limit in the -size condition that's rounded, but the file size itself.

Take a file of 1234 bytes and a -size -1M directive. The file size is rounded up to the nearest whole unit named in the directive, here MB: 1234 bytes -> 1 MB. That doesn't match the condition, since -size -1M demands less than 1 MB (after this rounding). So, indeed, -size -1x, for any unit x, returns only empty files.

Similarly, -size 1M would match the above file, since after rounding, it's exactly 1 MB in size. On the other hand, -size 1k would not, since it rounds to 2 kB.

Note that the - or + in front of the number in the condition is irrelevant for the rounding behaviour.
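To make the rule concrete, here is a small bash sketch that models the rounding described above: the file size is rounded up to whole units of the suffix, and only then compared with N as less than (-), greater than (+) or exactly equal. The helper names gnu_size_matches and unit_bytes are hypothetical, and this mimics the documented behaviour rather than reproducing GNU find's source.

```shell
#!/usr/bin/env bash
# Sketch of GNU find's documented -size rounding (not its real source).
# gnu_size_matches and unit_bytes are hypothetical helper names.

# unit_bytes SUFFIX: bytes per unit for the suffixes in the GNU man page.
unit_bytes() {
  case "$1" in
    c) echo 1 ;;
    w) echo 2 ;;
    b) echo 512 ;;
    k) echo 1024 ;;
    M) echo $((1024 * 1024)) ;;
    G) echo $((1024 * 1024 * 1024)) ;;
    *) echo "unknown unit: $1" >&2; return 1 ;;
  esac
}

# gnu_size_matches SIZE_IN_BYTES SPEC, where SPEC looks like -1M, 1k, +2M.
# Prints "yes" if a file of that size would match -size SPEC, else "no".
gnu_size_matches() {
  local size=$1 spec=$2 op="" n unit bytes rounded
  case "$spec" in
    [+-]*) op=${spec:0:1}; spec=${spec:1} ;;
  esac
  case "$spec" in
    *[0-9]) n=$spec; unit=b ;;          # no suffix: 512-byte blocks
    *) n=${spec%?}; unit=${spec: -1} ;;
  esac
  bytes=$(unit_bytes "$unit") || return 1
  # The FILE SIZE is rounded up to whole units; the limit n is not.
  rounded=$(( (size + bytes - 1) / bytes ))
  case "$op" in
    -) (( rounded < n )) ;;
    +) (( rounded > n )) ;;
    *) (( rounded == n )) ;;
  esac && echo yes || echo no
}

for spec in -1M 1M 1k -1048576c; do
  printf '%-10s %s\n' "$spec" "$(gnu_size_matches 1234 "$spec")"
done
```

For the 1234-byte file this prints no for -1M, yes for 1M, no for 1k and yes for -1048576c, which is the walk-through above. An empty file gives yes for -1M but no for 1M, which matches what the question observed.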

It may be useful to just always specify sizes in bytes, since that way there's no rounding to stumble on. -size -$((1024*1024))c will reliably find files that are strictly less than 1 MB (or 1 MiB, if you will) in size. If you want a range, you can use e.g. \( -size +$((512*1024-1))c -size -$((1024*1024+1))c \) for files within [512 kB, 1024 kB], inclusive. (The parentheses have to be escaped or quoted so that the shell passes them through to find.)
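A quick way to check this is to create a few files of known size in a scratch directory and run the byte-based conditions against them. This sketch assumes bash plus find, truncate and mktemp are available (GNU or Busybox should both do); the f_<size> file names are made up for the demo.

```shell
#!/usr/bin/env bash
# Sketch: byte-exact -size tests against scratch files of known size.
# Assumes find, truncate and mktemp exist; the f_<size> names are made up.
set -eu
dir=$(mktemp -d)
trap 'rm -rf "$dir"' EXIT

# Sizes on either side of 512 KiB (524288) and 1 MiB (1048576).
for size in 0 524287 524288 1048575 1048576 1048577; do
  truncate -s "$size" "$dir/f_$size"
done

# Byte units ("c") involve no rounding, so these limits are exact.
below_1mib=$(find "$dir" -type f -size -$((1024*1024))c \
  -exec basename {} \; | sort -t _ -k 2 -n)
in_range=$(find "$dir" -type f \
  \( -size +$((512*1024-1))c -size -$((1024*1024+1))c \) \
  -exec basename {} \; | sort -t _ -k 2 -n)

echo "strictly less than 1 MiB:"; echo "$below_1mib"
echo "within [512 KiB, 1024 KiB]:"; echo "$in_range"
```

The first list should be f_0, f_524287, f_524288 and f_1048575; the range should be f_524288, f_1048575 and f_1048576, with both ends included.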

Another question on this: Why does `find -size -1G` not find any files?


Gilles mentions in that linked question that POSIX only specifies -size N as meaning the size in 512-byte blocks (rounded as above: "the file size in bytes, divided by 512 and rounded up to the next integer"), and -size Nc as meaning the size in bytes, both with an optional plus or minus. Other units are left unspecified, and not all find implementations recognize other suffixes, or round the way GNU find does.
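As a worked example of the POSIX block rule, the sketch below rounds byte sizes up to 512-byte blocks and prints the byte range that -size N matches under that rule. The helper names posix_blocks and posix_block_range are hypothetical, and not every implementation actually rounds this way (Busybox, for one, apparently doesn't).

```shell
#!/usr/bin/env bash
# Sketch of the POSIX rule: size in bytes / 512, rounded up.
# posix_blocks and posix_block_range are hypothetical helper names.

# posix_blocks SIZE_IN_BYTES: number of 512-byte blocks, rounded up.
posix_blocks() {
  echo $(( ($1 + 511) / 512 ))
}

# posix_block_range N: inclusive byte range "LO HI" that -size N matches.
posix_block_range() {
  local n=$1
  if (( n == 0 )); then
    echo "0 0"                      # only empty files round to 0 blocks
  else
    echo "$(( (n - 1) * 512 + 1 )) $(( n * 512 ))"
  fi
}

for size in 0 1 512 513; do
  echo "$size bytes -> $(posix_blocks "$size") block(s)"
done
printf 'size 2 (blocks) matches bytes %s\n' "$(posix_block_range 2)"
```

So under this rule -size 1 covers 1 to 512 bytes and -size 2 covers 513 to 1024 bytes, the same "exact size means the whole rounded-up bucket" effect as in GNU find.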

I tested with Busybox and the *BSD find on my Mac, and it seems they treat conditions with unit suffixes in a way that feels more sensible: -size -1k matches files from 0 to 1023 bytes, the same as -size -1024c, and similarly -size -1M is the same as -size -1024k (Busybox only has c, b and k). Then again, Busybox doesn't seem to do the rounding even for sizes specified in blocks, contrary to what the POSIX text seems to say it should.
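The two readings of a "less than" condition can be written out as plain byte counts. This sketch derives the GNU figure from the man page rule (size rounded up to whole units, then compared) and the other from the no-rounding behaviour observed above for Busybox and the Mac's find; less_than_as_bytes is a hypothetical helper, and it only handles the k and M suffixes.

```shell
#!/usr/bin/env bash
# Sketch: byte-count equivalents of "find -size -N<unit>" under two readings.
# less_than_as_bytes is a hypothetical helper; only k and M are handled.
#   bsd: no rounding, size < N*unit (behaviour observed with Busybox/macOS)
#   gnu: size rounded up to whole units, then < N, i.e. size <= (N-1)*unit,
#        which is the same as -size -((N-1)*unit + 1)c
less_than_as_bytes() {
  local spec=${1#-} n unit
  n=${spec%?}
  case "${spec: -1}" in
    k) unit=1024 ;;
    M) unit=$((1024 * 1024)) ;;
    *) echo "unsupported unit in $1" >&2; return 1 ;;
  esac
  echo "bsd=-$(( n * unit ))c gnu=-$(( (n - 1) * unit + 1 ))c"
}

for spec in -1k -1M -10M; do
  printf '%s: %s\n' "$spec" "$(less_than_as_bytes "$spec")"
done
```

For -10M this gives bsd=-10485760c and gnu=-9437185c; the GNU figure is exactly the "less than or equal to 9 MB" the question ran into, and for -1k or -1M it collapses to -1c, i.e. empty files only.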

So, YMMV, and again, it's probably better to stick with sizes in bytes.


Note that there's a similar issue with the -atime, -mtime and -ctime conditions:

-atime n
File was last accessed n*24 hours ago. When find figures out how many 24-hour periods ago the file was last accessed, any fractional part is ignored, so to match -atime +1, a file has to have been accessed at least two days ago.

Similarly, it may be easier to just use -amin +$((24*60-1)) to find files that were last accessed at least a full 24 h ago (up to rounding to a minute, which you can't get rid of).
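The same kind of arithmetic applies to the time tests. This sketch models the -atime rule quoted above (whole 24-hour periods, fractional part dropped, then compared) next to -amin; the function names are hypothetical, and treating the -amin age as whole minutes is an assumption standing in for the minute-level rounding mentioned above.

```shell
#!/usr/bin/env bash
# Sketch comparing -atime +N and -amin +N on a file's age in seconds.
# atime_plus_matches / amin_plus_matches are hypothetical helper names.

# -atime +N: count whole 24-hour periods (fraction dropped), then > N.
atime_plus_matches() {
  local days=$(( $1 / 86400 ))
  (( days > $2 )) && echo yes || echo no
}

# -amin +N: ASSUMPTION, age counted in whole minutes, then > N.
amin_plus_matches() {
  local mins=$(( $1 / 60 ))
  (( mins > $2 )) && echo yes || echo no
}

hour=3600
for h in 24 47 48; do
  printf '%2dh ago: -atime +1 %s, -amin +%d %s\n' "$h" \
    "$(atime_plus_matches $((h * hour)) 1)" \
    $((24 * 60 - 1)) "$(amin_plus_matches $((h * hour)) $((24 * 60 - 1)))"
done
```

Under this model a file last accessed 47 hours ago is not matched by -atime +1 (47 h is only one whole day), but it is matched by -amin +1439, as is one accessed exactly 24 hours ago.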

See also: Why does find -mtime +1 only return files older than 2 days?


Is this all normal or am I doing something wrong and what's the exact behavior of the -size parameter?

It's "normal" as far as the behaviour of GNU find is concerned, but I wouldn't call it exactly sensible. You're not wrong to be confused; it's find that is confusing.