Bash – Printing Multiple Values from a CSV File Using Bash Script

bash shell-script

So my goal is to print several values computed from a .csv file.
I'm trying to find the fastest way to do it, with the lowest possible script running time.

For example, I have a file called "test.csv".
Inside "test.csv" I have the following values:

0,1673466134,875601111928832,3336977422,22610058C2740,2020-06-03,19:00:01,103,456123489478512
0,6987507655,226102200333225,2312147777,226102E1858F0,2020-06-02,19:00:04,102,112323548998726
0,7891328975,250423212127644,7421354899,22610058C5350,2020-06-01,19:00:00,103,123123489784238
1,1324654889,784502311776287,4778994563,22610058C351E,2020-06-09,19:00:01,102,489123478941324
0,1231324474,247122410577385,1232498779,22610058C53A0,2020-06-07,19:00:00,104,123498715234789
1,4471222598,226912478523771,4123487987,226102C242C40,2020-06-04,19:00:00,103,789123418971354

And I need to print values like these:

Example: count all the rows whose first column is "1".
I would do it like this:

cat test.csv | awk -F ','  '{print $1}' | awk '/^1/' | wc -l

Example: sum all the values in the 8th column where the 1st column is 1:

cat test.csv | awk -F ','  '{print $1,$8}' | awk '/^1/' | awk '{sum+=$2} END {print sum}'
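Each of these pipelines starts several processes and passes the data through pipes; the same two numbers can each come from a single awk process that opens the file itself. A minimal sketch, not the only way to write it: it writes the sample rows above to a temporary file (the variable names csv, ones and sum8 are just for illustration) and, like the commands above, treats "1" as a prefix match on the first field:

```shell
#!/bin/bash
# Write the sample rows from the question to a temporary file.
csv=$(mktemp)
cat > "$csv" <<'EOF'
0,1673466134,875601111928832,3336977422,22610058C2740,2020-06-03,19:00:01,103,456123489478512
0,6987507655,226102200333225,2312147777,226102E1858F0,2020-06-02,19:00:04,102,112323548998726
0,7891328975,250423212127644,7421354899,22610058C5350,2020-06-01,19:00:00,103,123123489784238
1,1324654889,784502311776287,4778994563,22610058C351E,2020-06-09,19:00:01,102,489123478941324
0,1231324474,247122410577385,1232498779,22610058C53A0,2020-06-07,19:00:00,104,123498715234789
1,4471222598,226912478523771,4123487987,226102C242C40,2020-06-04,19:00:00,103,789123418971354
EOF

# Count rows whose 1st field starts with "1" (same test as awk '/^1/' on column 1).
ones=$(awk -F ',' '$1 ~ /^1/ {n++} END {print n+0}' "$csv")

# Sum the 8th field over the same rows.
sum8=$(awk -F ',' '$1 ~ /^1/ {sum += $8} END {print sum+0}' "$csv")

echo "rows starting with 1: $ones"
echo "sum of 8th column:    $sum8"
rm -f "$csv"
```

The n+0 and sum+0 make awk print 0 instead of an empty line when no row matches.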

And the list goes on. I have about 11 commands like the ones above.
My goal is to put all these commands in a script file and have them run as quickly as possible.

I wrote a script that looks like this:

#!/bin/bash
while IFS=, read col_1 col_2 col_3 col_4 col_5 col_6 col_7 col_8 col_9
do
        echo "No of lines containing 0 on the 1st column: "
           awk -F ','  '{print $1}' | awk '/^0/' | wc -l
        echo "No of lines containing 1 on the 1st column:"
           awk -F ','  '{print $1}' | awk '/^1/' | wc -l
done < test.csv

The problem is that after the first command runs, the second one always prints "0", no matter what I do.

Can someone help me with this issue?
Thank you!

Best Answer

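OK, first of all, you don't want to do this. Awk is orders of magnitude faster than a shell read loop at processing text, so there is no benefit whatsoever in converting an awk script into a shell script! As for why your second command always prints 0: neither awk call in your loop is given a file name, so both read the loop's standard input, which is test.csv. The read builtin takes the first line, the first awk then consumes every remaining line, and the second awk finds nothing left to read. Forget the shell, just do everything in awk. Save this file as foo.awk: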

#!/bin/awk -f
BEGIN{
  FS=","
}
{
  if($1~/^0/){zeros++}
  if($1~/^1/){ones++}
}
END{
  printf "No of lines containing 0 on the 1st column: %d\n", zeros;
  printf "No of lines containing 1 on the 1st column: %d\n", ones;
}

Make the file executable with chmod a+x foo.awk and then run it:

/path/to/foo.awk /path/to/test.csv

If I run it on your example data, I get:

$ foo.awk test.csv 
No of lines containing 0 on the 1st column: 4
No of lines containing 1 on the 1st column: 2

To include the command in your second example, do:

#!/bin/awk -f
BEGIN{
  FS=","
}
{
  if($1~/^0/){zeros++}
  if($1~/^1/){ones++; sum8+=$8}
}
END{
  printf "No of lines containing 0 on the 1st column: %d\n", zeros;
  printf "No of lines containing 1 on the 1st column: %d\n", ones;
  printf "Sum of all 8th fields where the 1st field starts with 1: %d\n", sum8
}
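If most of the roughly 11 statistics are counts and sums grouped by the value of the first column, awk's associative arrays can collect them all in one pass without a separate variable for each case. A minimal sketch under that assumption (the array names count and sum8 are illustrative); since the order in which "for (k in ...)" visits keys is unspecified, the output is sorted:

```shell
#!/bin/bash
# Sample rows from the question, written to a temporary file.
csv=$(mktemp)
cat > "$csv" <<'EOF'
0,1673466134,875601111928832,3336977422,22610058C2740,2020-06-03,19:00:01,103,456123489478512
0,6987507655,226102200333225,2312147777,226102E1858F0,2020-06-02,19:00:04,102,112323548998726
0,7891328975,250423212127644,7421354899,22610058C5350,2020-06-01,19:00:00,103,123123489784238
1,1324654889,784502311776287,4778994563,22610058C351E,2020-06-09,19:00:01,102,489123478941324
0,1231324474,247122410577385,1232498779,22610058C53A0,2020-06-07,19:00:00,104,123498715234789
1,4471222598,226912478523771,4123487987,226102C242C40,2020-06-04,19:00:00,103,789123418971354
EOF

# One pass over the file: for each distinct value of the 1st field, count the
# rows and sum the 8th field. Each output line is: value, row count, sum.
# "for (k in count)" visits keys in an unspecified order, hence the sort.
stats=$(awk -F ',' '
    { count[$1]++; sum8[$1] += $8 }
    END { for (k in count) printf "%s %d %d\n", k, count[k], sum8[k] }
' "$csv" | sort)
echo "$stats"
rm -f "$csv"
```

On the sample data this prints "0 4 412" and "1 2 205". Because the keys are exact field values, a first column of 10 would get its own line rather than being counted with the /^1/ prefix test used above; pick whichever behaviour matches your data.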

If you must use a shell script for some reason, then have the shell script run the awk and nothing else. Do not try to split the input in the shell: that's complicated and very slow. Something like this is much better:

#!/bin/bash
awk -F"," '($1~/^0/){zeros++}
           ($1~/^1/){ones++}
           END{ 
                printf "No of lines containing 0 on the 1st column: %d\n", zeros;
                printf "No of lines containing 1 on the 1st column: %d\n", ones;
           }' "$1"

Finally, if you really want to keep these as separate commands, you could do something like this, but it will be very slow since it has to read the file once per command:

#!/bin/bash

echo "No of lines containing 0 on the 1st column: "
awk -F ','  '{print $1}' "$1" | awk '/^0/' | wc -l
echo "No of lines containing 1 on the 1st column:"
awk -F ','  '{print $1}' "$1" | awk '/^1/' | wc -l
echo "Sum of all the 8th columns where the 1st column starts with 1:"
awk -F ','  '/^1/{sum+=$8} END {print sum}' "$1"

You would then make the file executable (chmod a+x /path/to/foo.sh) and run it like this:

/path/to/foo.sh /path/to/test.csv
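The failure of the loop in the question can be reproduced on the sample data. The following self-contained sketch uses a reduced version of that loop (only the first column is read, and the counts are captured in the illustrative variables first and second) and then shows that handing the file name to awk gives the expected counts:

```shell
#!/bin/bash
# Sample rows from the question, written to a temporary file.
csv=$(mktemp)
cat > "$csv" <<'EOF'
0,1673466134,875601111928832,3336977422,22610058C2740,2020-06-03,19:00:01,103,456123489478512
0,6987507655,226102200333225,2312147777,226102E1858F0,2020-06-02,19:00:04,102,112323548998726
0,7891328975,250423212127644,7421354899,22610058C5350,2020-06-01,19:00:00,103,123123489784238
1,1324654889,784502311776287,4778994563,22610058C351E,2020-06-09,19:00:01,102,489123478941324
0,1231324474,247122410577385,1232498779,22610058C53A0,2020-06-07,19:00:00,104,123498715234789
1,4471222598,226912478523771,4123487987,226102C242C40,2020-06-04,19:00:00,103,789123418971354
EOF

# Reduced version of the question's loop. Both awk calls inherit the loop's
# standard input (the CSV file) because they are not given a file name.
while IFS=, read -r col_1 rest; do
    first=$(awk -F ',' '$1 ~ /^0/ {n++} END {print n+0}')   # reads rows 2-6 only
    second=$(awk -F ',' '$1 ~ /^1/ {n++} END {print n+0}')  # input already at EOF
done < "$csv"
echo "inside the loop: zeros=$first ones=$second"

# Fix: give awk the file name and drop the loop entirely.
zeros=$(awk -F ',' '$1 ~ /^0/ {n++} END {print n+0}' "$csv")
ones=$(awk -F ',' '$1 ~ /^1/ {n++} END {print n+0}' "$csv")
echo "reading the file directly: zeros=$zeros ones=$ones"
rm -f "$csv"
```

The loop body runs only once: after the first awk has drained the input, the next read hits end of file and the loop ends, which is why the first count is 3 instead of 4 and the second is 0.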