Your code overwrites the output file in each iteration. You also do not actually call awk.
What you want to do is something like
awk '$5 >= 0.5' ./*.imputed.*_info >snplist.txt
This would call awk with all your files at once, and it would go through them one by one, in the order that the shell expands the globbing pattern. If the 5th column of any line in a file is greater than or equal to 0.5, that line is written to the output (here, snplist.txt). This works since the default action, if no action ({...} block) is associated with a condition, is to output the current line.
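A minimal sketch of that default-action rule, using a hypothetical three-line sample file (the file name and its contents are made up for illustration). It runs the same condition once without an action and once with an explicit print, and the two results should be identical:

```shell
# Hypothetical sample data: five whitespace-separated columns per line.
tmpdir=$(mktemp -d)
printf '%s\n' \
    'rs1 A G 0.2 0.91' \
    'rs2 C T 0.1 0.32' \
    'rs3 G A 0.4 0.50' >"$tmpdir/sample_info"

# Condition only: awk falls back to the default action and prints the line.
implicit=$(awk '$5 >= 0.5' "$tmpdir/sample_info")

# The same condition with the print action written out.
explicit=$(awk '$5 >= 0.5 { print }' "$tmpdir/sample_info")

printf '%s\n' "$implicit"
rm -r "$tmpdir"
```

Only the lines whose fifth field is at least 0.5 are printed, and they are printed unmodified.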
In cases where you have a large number of files (many thousands), this may generate an "Argument list too long" error. In that case, you may want to loop:
for filename in ./*.imputed.*_info; do
awk '$5 >= 0.5' "$filename"
done >snplist.txt
Note that the result of awk does not need to be stored in a variable. Here, it is simply written to standard output, and the loop (and therefore every command inside the loop) is redirected into snplist.txt.
For many thousands of files, this would be quite slow since awk would need to be invoked for each of them individually.
To speed things up in the cases where you have too many files for a single invocation of awk, you may consider using xargs like so:
printf '%s\0' ./*.imputed.*_info | xargs -0 awk '$5 >= 0.5' >snplist.txt
This would create a list of filenames with printf and pass them off to xargs as a nul-terminated list. The xargs utility would take these and start awk with as many of them as possible at once, in batches. The output of the whole pipeline would be redirected to snplist.txt.
This xargs alternative assumes that you are using a Unix, like Linux, which has an xargs command that implements the non-standard -0 option to read nul-terminated input. It also assumes that you are using a shell, like bash, that has a built-in printf utility (ksh, the default shell on OpenBSD, would not work here as it has no such built-in utility). The built-in matters because an external printf would be started with the full list of filenames as arguments and would therefore run into the same "Argument list too long" limit.
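The batching that xargs performs can be made visible with a small sketch. It assumes an xargs that supports -0, substitutes echo for the real command, and uses -n 2 to force artificially small batches; the five names are placeholders, not real files:

```shell
# Feed five nul-terminated placeholder names to xargs.
# -n 2 caps each batch at two arguments so the batching is visible;
# without -n, xargs packs as many names per invocation as the system allows.
batches=$(printf '%s\0' one two three four five | xargs -0 -n 2 echo awk)

# Each output line stands for one invocation of the command.
printf '%s\n' "$batches"
```

Three lines come out, one per invocation. Because every invocation writes to the same standard output, a single redirection on the pipeline collects the output of all batches.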
For the zsh shell (i.e. not bash):
autoload -U zargs
zargs -- ./*.imputed.*_info -- awk '$5 >= 0.5' >snplist.txt
This uses zargs, which is basically a reimplementation of xargs as a loadable zsh shell function. See zargs --help (after loading the function) and the zshcontrib(1) manual for further information about that.
Best Answer
You have two bugs:
1. You are matching any size that contains 46; you want it to be equal to 46.
2. You are printing the entire line, when you want only the filename.
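To illustrate both bugs without running ls at all, here is a sketch on canned text. The listing below is a made-up stand-in for ls -l output, and the loose match is only an assumption about what a "contains 46" test might look like:

```shell
# Hypothetical listing, hard-coded so the example is reproducible.
listing='-rw-r--r-- 1 user group 46 Jan  1 12:00 wanted.txt
-rw-r--r-- 1 user group 146 Jan  1 12:00 bigger.txt'

# Loose match: any line containing "46" anywhere, printed in full.
loose=$(printf '%s\n' "$listing" | awk '/46/')

# Exact match: field 5 (the size) must equal 46; print only field 9 (the name).
exact=$(printf '%s\n' "$listing" | awk '$5 == 46 { print $9 }')

printf 'loose:\n%s\nexact:\n%s\n' "$loose" "$exact"
```

The loose version also picks up the 146-byte entry and prints whole lines, while the exact version prints only wanted.txt. Field 9 holds the name only as long as the name contains no whitespace.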
And an additional issue: what is the point of -ltr to sort the ls output when you aren't using the sort order?
You want to do something like
Except you don't want to do that, because while it might be safe at the moment, parsing ls output is unreliable. Use an appropriate tool such as
(Doing this portably is more annoying, since POSIX find doesn't have -maxdepth, and its -size counts in 512-byte blocks unless the number carries a c suffix for bytes. Better to write a script in Perl, Python, Ruby, etc. that can use a proper directory scan that won't get in trouble with special characters in filenames.)
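As a rough sketch of a find-based approach, assuming a find that supports -maxdepth (GNU and BSD find do) and a scratch directory with made-up filenames, an exact byte-size match might look like this. It is an illustration only, not necessarily the command the answer originally showed:

```shell
# Scratch directory with two hypothetical files: one of 46 bytes, one of 146.
tmpdir=$(mktemp -d)
printf '%46s' '' >"$tmpdir/exactly 46 bytes.txt"
printf '%146s' '' >"$tmpdir/bigger.txt"

# -maxdepth 1 stays in the top directory (an extension, not in older POSIX).
# -size 46c matches files of exactly 46 bytes; the c suffix selects bytes.
matches=$(find "$tmpdir" -maxdepth 1 -type f -size 46c)

printf '%s\n' "$matches"
rm -r "$tmpdir"
```

Only the 46-byte file is reported, and the space in its name causes no trouble because find emits the path itself rather than a formatted listing. Names containing newlines would still need -exec or a nul-terminated output format.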