awk find mv – Move Files Based on Column Value Threshold

awkexecfindmv

I have a large number of files in a particular folder. I would like to move these files to a subfolder ONLY IF they have at least 1 value above 0.5 in any row of column 4. In a separate command I'd like to do the same but with files having at least 2 rows with values above 0.5 in column 4.

This is the general format of the files (with header):

col1  col2  col3  col4  col5  col6
ABC   DEF   5.10  0.94  GHI   JKL
MNO   PQR   8.31  0.37  STU   VWX
ABC   DEF   6.49  0.84  GHI   JKL
MNO   PQR   3.32  0.21  STU   VWX

Some of the numbers in column 4 are in scientific notation:
8.934553871039306e-05

The code below is what I have tried so far to move files with at least 1 value above 0.5 in column 4. It ends up moving every file into the subfolder, even ones that don't match the condition.

#!/bin/bash

find . -type f -exec awk '$4 >= 0.5' {} \; -exec mv -n {} ./NewFolder/ \;

Best Answer

To make your command work you have to make awk exit with code 0 if a match is found or with a non-zero exit code if no match is found.

In addition to this you should skip the first line because a non-numeric value will be compared as a string which may lead to an unexpected match.

find . -type f -exec awk 'FNR==1 {next} $4 >= 0.5 {found=1; exit} END {exit !found}' {} \; -exec mv -n {} ./NewFolder/ \;

Note: If the awk script is called with more than one file the exit code means that a match was found in any of the files. The find command will make sure only one file at a time is passed to awk, so this is not a problem here.

2nd Edit:

To select files that have at least 2 matching rows you can count the matches.

find . -type f -exec awk 'FNR==1 {next} $4 >= 0.5 {found++; if(found >= 2) exit} END {exit found >= 2}' {} \; -exec mv -n {} ./NewFolder/ \;

Edit:

To debug the problem that the script moves files that don't have a matching value in column 4, you can add code to the awk script to print information about the matching line. The following code will print the file name, line number and the matching line if a match was found.

find . -type f -exec awk 'FNR==1 {next} $4 >= 0.5 {found=1; printf "%s:%d:%s\n", FILENAME, FNR, $0; exit} END {exit !found}' {} \; -exec mv -n {} ./NewFolder/ \;

You will get something like

threshold.txt:2:ABC   DEF   5.10  0.94  GHI   JKL

I suggest to do this first to find the cause of the problem.

If there are lines that have a non-numeric text in column 4, the values will be compared as text. This would result in e.g. "abc" being greater than "0.5".

Another possible cause could be a line that has spaces in column 1 or 2 which will lead to a wrong assignment of the text to the columns.

In case there are non-numeric values in column 4 and you want to ignore these lines, you can force a numeric interpretation by adding the value to 0 as in 0 + $4.

find . -type f -exec awk 'FNR==1 {next} 0 + $4 >= 0.5 {found=1; printf "%s:%d:%s\n", FILENAME, FNR, $0; exit} END {exit !found}' {} \; -exec mv -n {} ./NewFolder/ \;

If the reason for the problem is that your fields are separated by a tab and that values can contain spaces you can specify the field separator (-F "\t"). The following script combines this with the other modifications.

find . -type f -exec awk -F "\t" 'FNR==1 {next} 0 + $4 >= 0.5 {found=1; printf "%s:%d:%s\n", FILENAME, FNR, $0; exit} END {exit !found}' {} \; -exec mv -n {} ./NewFolder/ \;
Related Question