I have a large number of files in a particular folder. I would like to move these files to a subfolder ONLY IF they have at least 1 value above 0.5 in any row of column 4. In a separate command I'd like to do the same but with files having at least 2 rows with values above 0.5 in column 4.
This is the general format of the files (with header):
col1 col2 col3 col4 col5 col6
ABC DEF 5.10 0.94 GHI JKL
MNO PQR 8.31 0.37 STU VWX
ABC DEF 6.49 0.84 GHI JKL
MNO PQR 3.32 0.21 STU VWX
Some of the numbers in column 4 are in scientific notation:
8.934553871039306e-05
The code below is what I have tried so far to move files with at least 1 value above 0.5 in column 4. It ends up moving every file into the subfolder, even ones that don't match the condition.
#!/bin/bash
find . -type f -exec awk '$4 >= 0.5' {} \; -exec mv -n {} ./NewFolder/ \;
Best Answer
To make your command work you have to make
awk
exit with code 0 if a match is found or with a non-zero exit code if no match is found.In addition to this you should skip the first line because a non-numeric value will be compared as a string which may lead to an unexpected match.
Note: If the
awk
script is called with more than one file the exit code means that a match was found in any of the files. Thefind
command will make sure only one file at a time is passed toawk
, so this is not a problem here.2nd Edit:
To select files that have at least 2 matching rows you can count the matches.
Edit:
To debug the problem that the script moves files that don't have a matching value in column 4, you can add code to the
awk
script to print information about the matching line. The following code will print the file name, line number and the matching line if a match was found.You will get something like
I suggest to do this first to find the cause of the problem.
If there are lines that have a non-numeric text in column 4, the values will be compared as text. This would result in e.g.
"abc"
being greater than"0.5"
.Another possible cause could be a line that has spaces in column 1 or 2 which will lead to a wrong assignment of the text to the columns.
In case there are non-numeric values in column 4 and you want to ignore these lines, you can force a numeric interpretation by adding the value to
0
as in0 + $4
.If the reason for the problem is that your fields are separated by a tab and that values can contain spaces you can specify the field separator (
-F "\t"
). The following script combines this with the other modifications.