Delete rows that contain 0 more than ‘x’ times

awk, grep, sed, text processing

I have a large comma-separated file. I need to filter out rows in which more than x columns contain a zero, while always keeping the first (header) row. For simplicity, let's say I want to filter out rows with more than 4 zeroes:

    gene,v1,v2,v3,v4,v5,v6,v7
    gene1,0,1,5,0,0,4,100
    gene2,1,0,0,0,5,210,2
    gene3,0,0,0,0,6,0,0

The expected output is:

    gene,v1,v2,v3,v4,v5,v6,v7
    gene1,0,1,5,0,0,4,100
    gene2,1,0,0,0,5,210,2

That is, "gene3" is filtered out.

Here's what I've tried (attempting, and failing, to use ',0' as a delimiter):

    awk -F',0' 'NF<4 {print}' file.csv
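To see why that attempt fails, here is a small diagnostic sketch (the function name comma_zero_count is made up for illustration). With -F',0', NF - 1 is the number of ",0" substrings awk split the line on:

```shell
#!/bin/sh
# Hypothetical diagnostic helper: for each input line, print NF - 1,
# i.e. how many ',0' substrings awk used as field separators.
# Reads the named files, or stdin when no file is given.
comma_zero_count() {
    awk -F',0' '{ print NF - 1 }' "$@"
}

# Usage (assumes the sample data is in file.csv):
#   comma_zero_count file.csv
```

On the sample data it prints 0, 3, 3 and 6, so NF<4 (at most two matches) keeps only the header line: gene1 and gene2 each have three zeros and are dropped along with gene3. Loosening the test to NF<=4 happens to give the right answer here, but ",0" also matches the start of values such as 05 or 0.5, so the count is not a reliable zero count.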

Best Answer

A KISS approach, with awk. It keeps every line that has at most 3 fields equal to "0" (change the 3 for a different threshold):

    awk -F, '{c = 0; for(i=1; i<=NF; i++) {c += ($i == "0")}} c <= 3' file.csv
    gene,v1,v2,v3,v4,v5,v6,v7
    gene1,0,1,5,0,0,4,100
    gene2,1,0,0,0,5,210,2
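If the threshold should be a parameter and the header must always survive (the question asks for the first row to be excluded), the same idea can be wrapped in a small shell function. This is a sketch: filter_zero_rows and its MAX argument are hypothetical names, and the first (gene-name) column is assumed never to count as a zero:

```shell
#!/bin/sh
# Hypothetical helper: keep the header plus every row that has at most
# MAX columns equal to "0". Column 1 (the gene name) is not counted.
# Usage: filter_zero_rows MAX [file ...]   (reads stdin if no file)
filter_zero_rows() {
    max=$1
    shift
    awk -F, -v max="$max" '
        FNR == 1 { print; next }         # always keep the first line of each file
        {
            c = 0
            for (i = 2; i <= NF; i++)    # start at 2 to skip the gene name
                if ($i == "0") c++       # exact string match on "0"
        }
        c <= max                         # no action given, so the row is printed
    ' "$@"
}
```

For the question's literal "more than 4 zeroes", call filter_zero_rows 4 file.csv. On the sample this prints the same three lines as the one-liner, because gene1 and gene2 have three zeros each and gene3 has six. Note that $i == "0" is a string comparison, so values like 0.0 or 00 are not counted.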

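With perl. Here -a turns on autosplit into @F (implied by -F only since Perl 5.20), and -l strips the trailing newline before the split, so a 0 in the last column is compared as "0" rather than "0\n" and is counted too:

    perl -F, -lane 'print unless (grep { $_ eq "0" } @F) > 3' file.csv
    gene,v1,v2,v3,v4,v5,v6,v7
    gene1,0,1,5,0,0,4,100
    gene2,1,0,0,0,5,210,2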