I have a couple of files in a directory:
$ ls | wc -l
9376
Can anybody explain why there is such a huge time difference between using ls * and ls?
$ time ls > /dev/null
real 0m0.118s
user 0m0.106s
sys 0m0.011s
and
$ time ls * > /dev/null
real 1m32.602s
user 0m0.233s
sys 0m0.438s
Okay, this is a drastic example, and the effect may be exaggerated because the directory is on a General Parallel File System (GPFS). But I can also see a significant slowdown on a local file system.
EDIT:
$ time ls -l > /dev/null
real 0m58.772s
user 0m0.113s
sys 0m0.452s
$ time ls -l * > /dev/null
real 1m19.538s
user 0m0.252s
sys 0m0.461s
And I should add that in my example there are no subdirectories:
$ diff <(ls) <(ls *)
$
Best Answer
When you run ls without arguments, it simply opens the directory, reads all its entries, sorts them and prints them out.

When you run ls *, the shell first expands *, which is effectively the same work the simple ls did, builds an argument vector containing every file in the current directory, and calls ls. ls then has to process that argument vector and, for each argument, call access(2)¹ on the file to check its existence. Then it prints the same output as the first (simple) ls. Both the shell's processing of the large argument vector and ls's will likely involve a lot of small memory allocations, which can take some time. However, since there was little sys and user time but a lot of real time, most of the time was probably spent waiting for disk rather than using CPU for memory allocation.

Each call to
access(2) will need to read the file's inode to get the permission information. That means many more disk reads and seeks than simply reading a directory. I do not know how expensive these operations are on your GPFS, but since the ls -l run you've shown has a run time similar to the wildcard case, the time needed to retrieve the inode information appears to dominate. If GPFS has a slightly higher latency than your local filesystem on each read operation, we would expect the effect to be more pronounced in these cases.

The difference between the wildcard case and
ls -l of 50% could be explained by the ordering of inodes on the disk. If the inodes were laid out successively in the same order as the filenames in the directory, and ls -l stat(2)ed the files in directory order before sorting, ls -l would possibly read most of the inodes in one sweep. With the wildcard, the shell will sort the filenames before passing them to ls, so ls will likely read the inodes in a different order, adding more disk head movement.

It should be noted that your
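time output may or may not include the time taken by the shell to expand the wildcard: an external /usr/bin/time only ever sees the already-expanded argument list, whereas a shell keyword such as bash's time measures the whole command, expansion included.

If you really want to see what's going on, use strace(1) and have a look at which system calls are being performed in each case.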
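As a rough sketch of the strace suggestion, something like the following compares the system call counts of the two cases. The scratch directory, the file names and the count of 200 are made up for illustration, and it assumes strace is installed and allowed to trace processes on your system.

```shell
#!/usr/bin/env bash
# Hypothetical scratch directory with 200 empty files, used only for this demo.
scratch=$(mktemp -d)
for i in $(seq 1 200); do : > "$scratch/file$i"; done
nfiles=$(ls "$scratch" | wc -l)
echo "files created: $nfiles"

if command -v strace >/dev/null 2>&1; then
    # strace -c prints a per-syscall count summary on stderr
    # instead of logging every single call.
    echo "--- plain ls ---"
    (cd "$scratch" && strace -c ls > /dev/null) 2>&1 | tail -n 15

    # Here the calling shell expands the glob first, so ls is started
    # with one argument per file.
    echo "--- ls * ---"
    (cd "$scratch" && strace -c ls * > /dev/null) 2>&1 | tail -n 15
else
    echo "strace not available; skipping the trace"
fi

rm -rf "$scratch"
```

The call counts in the second summary should grow with the number of files, while the first stays roughly constant. The exact system call names (stat, lstat, newfstatat, statx, ...) depend on your ls and libc versions, so read the summary rather than filtering for one name.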
¹ I don't know if access(2) is actually used, or something else such as stat(2). But both probably require an inode lookup (I'm not sure whether access(file, 0) would bypass an inode lookup).
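To get a feel for how the cost splits between the shell's wildcard expansion and ls's per-argument work, a minimal sketch along these lines may help. It assumes bash (whose time keyword can also time builtins), and the scratch directory and file names are again hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical scratch directory with 200 empty files, used only for this demo.
orig_dir=$PWD
scratch=$(mktemp -d)
for i in $(seq 1 200); do : > "$scratch/file$i"; done
cd "$scratch" || exit 1

# Count how many arguments the glob produces.
set -- *
nargs=$#
echo "glob expanded to $nargs arguments"

# Expansion only: ':' is a builtin that ignores its arguments, so nothing
# looks at the individual files after the shell has expanded the glob.
time : *

# Expansion plus ls processing every argument.
time ls * > /dev/null

# Plain ls: a directory read with no per-argument work.
time ls > /dev/null

cd "$orig_dir" && rm -rf "$scratch"
```

On a small local directory all three timings will be tiny. The point is only the method: if the first timing is small and the second is large on the real directory, the time is going into ls's per-file lookups rather than into the expansion.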