I have a question concerning the `find` command in Linux.
All the articles I've found online say that the test `-size -10M`, for example, returns files that are less than 10 MB in size. But when I tested this, `-size -10M` seemed to return files that are less than or equal to 9 MB in size.
If I do
find . -type f -size -1M
the `find` command returns only empty files (the unit is irrelevant; it can be `-1G`, `-1k`, …).
find . -type f -size -2M
returns files <= 1M in size, etc.
The man page says:
Bear in mind that the size is rounded up to the next unit. Therefore -size -1M is not equivalent to -size -1048576c. The former only matches empty files, the latter matches files from 0 to 1,048,575 bytes.
OK, so I guess `-1M` is rounded to `0M`, `-2M` to `-1M`, and so on?
But then
find . -type f -size 1M
returns files <= 1M (e.g. 100K and 512K files, but not empty files), while I would expect it to return only files that are exactly 1M in size.
find . -type f -size 2M
returns files > 1M and <= 2M, etc.
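A minimal reproduction sketch of these observations, assuming a POSIX shell, a `head` that supports `-c`, and a scratch directory with made-up file names (`empty`, `f100K`, `f512K`, `f1M`, `f1536K`, all hypothetical):

```sh
#!/bin/sh
# Reproduction sketch: the scratch directory and file names are made up.
# Assumes `head -c` (available in GNU coreutils, BSD and Busybox).
set -eu

dir=$(mktemp -d)
trap 'rm -rf "$dir"' EXIT
cd "$dir"

# Files of known sizes (1 KiB = 1024 bytes, 1 MiB = 1048576 bytes).
: > empty
head -c $((100 * 1024)) /dev/zero > f100K
head -c $((512 * 1024)) /dev/zero > f512K
head -c $((1024 * 1024)) /dev/zero > f1M
head -c $((1536 * 1024)) /dev/zero > f1536K

# Run each -size test from the question and print the sorted matches.
for sz in -1M -2M 1M 2M; do
    printf '%s: ' "$sz"
    find . -type f -size "$sz" | sed 's|^\./||' | LC_ALL=C sort | tr '\n' ' '
    echo
done
```

With GNU find this should list only `empty` for `-1M`, `empty f100K f1M f512K` for `-2M`, `f100K f1M f512K` for `1M`, and `f1536K` for `2M`, matching what is described above. Other `find` implementations may print something different.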
Is this all normal, or am I doing something wrong? And what's the exact behavior of the `-size` parameter?
Best Answer
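The GNU find man page describes this rounding in the passage quoted in the question. This behaviour appears to be specific to GNU find; other implementations may differ (see below).
As for the guess that `-1M` is rounded to `0M`, `-2M` to `-1M`, and so on: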
No. It's not the limit in the `-size` condition that's rounded, but the file size itself.
Take a file of 1234 bytes and a `-size -1M` directive. The file size is rounded up to the nearest unit mentioned in the directive, here MB: 1234 bytes -> 1 MB. That doesn't match the condition, since `-size -1M` demands less than 1 MB (after this rounding). So, indeed, `-size -1x` for any unit `x` returns only empty files.
Similarly, `-size 1M` would match the above file, since after rounding it's exactly 1 MB in size. On the other hand, `-size 1k` would not, since 1234 bytes rounds up to 2 kB.
Note that the `-` or `+` in front of the number in the condition is irrelevant to the rounding behaviour.
It may be useful to just always specify the sizes in bytes, since that way there's no rounding to stumble on.
`-size -$((1024*1024))c` will reliably find files that are strictly less than 1 MB (or 1 MiB, if you will) in size. If you want a range, you can use e.g. `\( -size +$((512*1024-1))c -size -$((1024*1024+1))c \)` for files within [512 kB, 1024 kB]. The parentheses must be escaped (or quoted) so that the shell passes them through to `find`.
Another question on this: Why does `find -size -1G` not find any files?
Gilles mentions in that linked question the fact that POSIX only specifies `-size N` as meaning the size in 512-byte blocks (rounded as above: "the file size in bytes, divided by 512 and rounded up to the next integer"), and `-size Nc` as meaning the size in bytes, both with the optional plus or minus. The others are left unspecified, and not all `find` implementations recognize other suffixes, or round like GNU find does.
I tested with Busybox and the *BSD find on my Mac, and it seems they treat conditions with size specifiers in a way that feels more sensible, i.e. `-size -1k` matches files from 0 to 1023 bytes, the same as `-size -1024c`, and similarly `-size -1M` == `-size -1024k` (Busybox only has `c`, `b` and `k`). Then again, Busybox doesn't seem to do the rounding even for sizes specified in blocks, contrary to what the POSIX text seems to say it should.
So, YMMV, and again, it's maybe better to stick with sizes in bytes.
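A sketch of the byte-based approach, which should behave the same in GNU, BSD and Busybox `find` since it only uses the POSIX `c` suffix. The temporary files, their names and the `list` helper are hypothetical, and `head -c` is assumed to be available.

```sh
#!/bin/sh
# Byte-based -size tests with the POSIX "c" suffix. Sketch only: the files,
# their names and the list helper are made up for the demonstration.
set -eu

dir=$(mktemp -d)
trap 'rm -rf "$dir"' EXIT

: > "$dir/empty"
head -c 1234 /dev/zero > "$dir/b1234"
head -c $((512 * 1024 - 1)) /dev/zero > "$dir/just_under_512K"
head -c $((512 * 1024)) /dev/zero > "$dir/exactly_512K"
head -c $((1024 * 1024)) /dev/zero > "$dir/exactly_1M"
head -c $((1024 * 1024 + 1)) /dev/zero > "$dir/just_over_1M"

# list TEST...: print the sorted basenames of regular files matching TEST.
list() {
    find "$dir" -type f "$@" | sed 's|.*/||' | LC_ALL=C sort | tr '\n' ' '
    echo
}

echo "strictly less than 1 MiB:"
list -size -$((1024 * 1024))c

# Escaped parentheses are passed to find as literal ( and ).
echo "within [512 KiB, 1024 KiB]:"
list \( -size +$((512 * 1024 - 1))c -size -$((1024 * 1024 + 1))c \)
```

The first test should list `b1234 empty exactly_512K just_under_512K`, and the range test `exactly_1M exactly_512K`, with no rounding involved in either.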
Note that there's a similar issue with the `-atime`, `-mtime` and `-ctime` conditions: GNU find ignores the fractional part when counting 24-hour periods, so `-atime +1` only matches files last accessed at least two days ago.
And similarly, it may be easier to just use `-amin +$((24*60-1))` to find files that were last accessed at least a full 24 h ago (up to rounding to a minute, which you can't get rid of).
See also: Why does find -mtime +1 only return files older than 2 days?
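The day-counting rule behind the `-atime` issue can be sketched with shell arithmetic as well. This assumes the GNU behaviour of dividing the age by 86400 seconds and discarding the remainder; `atime_plus` is a hypothetical helper that never calls `find`.

```sh
#!/bin/sh
# Sketch of GNU find's day counting for -atime/-mtime/-ctime: the age is
# divided by 86400 seconds and any fractional part is dropped before the
# comparison. atime_plus is a hypothetical helper; it never calls find.
set -eu

# atime_plus AGE_SECONDS N: succeeds if "-atime +N" would match a file
# last accessed AGE_SECONDS ago.
atime_plus() {
    ap_days=$(( $1 / 86400 ))    # integer division truncates
    [ "$ap_days" -gt "$2" ]
}

for hours in 30 47 48; do
    if atime_plus $((hours * 3600)) 1; then r=match; else r="no match"; fi
    echo "accessed ${hours}h ago vs -atime +1: $r"
done
```

Ages of 30 and 47 hours both truncate to one day and do not match `-atime +1`; only 48 hours does. The minute-based alternative `-amin +$((24*60-1))` expands to `-amin +1439`, i.e. more than 1439 minutes.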
It's "normal" as far as the behaviour of GNU find is concerned, but I wouldn't call it exactly sensible. You're not wrong to be confused; it's `find` that is confusing.