I have a folder containing a large number of symlinked files. Each file is on the order of 10-11 GB (fastq files, to be specific). They come from a variety of source folders, but I made sure there's only one level of symlinks.
I'm trying to gzip them by simply doing:
gzip *.fastq
That results in a bunch of errors:
too many levels of symbolic links
and thus fails.
However, when I do:
for i in `ls | egrep *.fastq$`; do gzip -c $i > $i.gz; done;
it does work. My question is simple: what is the difference between the two? AFAIK, the only difference is that the second approach starts a new gzip process for each file, whereas the first should do everything in one process. Can gzip only handle one symlinked file at a time? Doing the same in a test folder with normal files works both ways.
Best Answer
A quick check of the gzip source (specifically, gzip 1.6 as included in Ubuntu 14.04) shows that the observed behavior comes from the function open_and_stat, beginning at line 1037 of gzip.c:
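The snippet is not reproduced verbatim here; what follows is a condensed sketch of that logic, paraphrased from the gzip 1.6 source from memory rather than quoted exactly. The HAVE_LSTAT define (normally set by gzip's configure script) is hard-coded so the sketch compiles on its own, and the real function also prefers O_NOFOLLOW when the platform supports it; the lstat fallback shown here is the path that sets ELOOP explicitly.

```c
#include <errno.h>
#include <fcntl.h>
#include <sys/stat.h>
#include <unistd.h>

#define HAVE_LSTAT 1        /* set by configure in real gzip; hard-coded for this sketch */

/* Globals corresponding to gzip's command-line flags (simplified). */
static int to_stdout = 0;   /* -c */
static int force = 0;       /* -f */

/* Condensed paraphrase of open_and_stat() from gzip.c (gzip 1.6). */
static int
open_and_stat (char *name, int flags, mode_t mode, struct stat *st)
{
  /* Refuse to follow symbolic links unless -c or -f was given. */
  if (!to_stdout && !force)
    {
#if HAVE_LSTAT || defined lstat
      if (lstat (name, st) != 0)
        return -1;
      else if (S_ISLNK (st->st_mode))
        {
          errno = ELOOP;    /* strerror: "Too many levels of symbolic links" */
          return -1;
        }
#endif
    }
  return open (name, flags, mode);
}
```

With to_stdout or force set (i.e. -c or -f on the command line), the check is skipped entirely and the symlink is opened and followed like any regular file.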
Note that the comment states that gzip will not follow symlinks unless it is called with the -c or -f flags, and that inside the #if ... #endif block the errno variable is set to ELOOP ("too many levels of symbolic links") when the file to be compressed is actually a symlink.
Now, per the gzip(1) man page: -c (--stdout) writes the output to standard output and keeps the original files unchanged, while -f (--force) forces compression even if the file has multiple links or the corresponding output file already exists.
Putting it all together and going back to the original question: gzip *.fastq fails because gzip, invoked without -c or -f, refuses to open the symlinks and reports ELOOP for each of them, while the loop works because it uses gzip -c, which makes gzip follow the links and write the compressed data to standard output. The number of gzip processes has nothing to do with it; a single invocation with -f, such as gzip -f *.fastq, would follow the symlinks as well.
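The difference can be reproduced concretely with a small self-contained demo (file names here are made up; the exact error text assumes GNU gzip):

```shell
#!/bin/sh
# Reproduce the symlink behavior in a scratch directory.
set -e
dir=$(mktemp -d)
cd "$dir"

printf 'pretend fastq data\n' > real.fastq
ln -s real.fastq link.fastq

# Without -c or -f, gzip refuses the symlink and fails:
if gzip link.fastq 2> err.txt; then
    echo "unexpected: this gzip followed the symlink"
else
    cat err.txt   # typically: gzip: link.fastq: Too many levels of symbolic links
fi

# With -c, gzip follows the link and writes to stdout instead:
gzip -c link.fastq > link.fastq.gz
gunzip -t link.fastq.gz
```

The same would hold for gzip -f link.fastq, since -f also disables the symlink check.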