How to Cut Every 100th Column from CSV

I have a data file of numbers separated by tabs, like this:

1 2 3 4
2 4 6 8

My real file is 50000 columns wide and I only need every 100th column (column 100, 200, 300, 400, …).
Now I would like to remove all the other columns.

How can I do that?

Best Answer

That's what awk is for:

awk '{for(i=100;i<=NF;i+=100){printf "%s ",$i;} print ""}' file > output

Or, if your fields can contain spaces, specify tab as the field separator:

awk -F'\t' '{for(i=100;i<=NF;i+=100){printf "%s ",$i;} print ""}' file > output
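Both commands above print a space after every selected field, so each output line ends with a trailing space and the kept columns are space-separated rather than tab-separated. If you want tab-separated output with no trailing separator, a variant that builds each line before printing might look like this. It is a sketch; the 300-column sample.tsv is hypothetical and only there to show the command working.

```shell
# Hypothetical sample input: 2 rows x 300 tab-separated columns
# (row 1 is 1..300, row 2 is 2,4,...,600).
awk 'BEGIN {
  for (r = 1; r <= 2; r++) {
    line = r
    for (c = 2; c <= 300; c++) line = line "\t" (r * c)
    print line
  }
}' > sample.tsv

# Keep every 100th column, joined by OFS (a tab), with no trailing separator.
# sep starts empty so the first field is not preceded by a tab.
awk -F'\t' -v OFS='\t' '{
  out = ""; sep = ""
  for (i = 100; i <= NF; i += 100) { out = out sep $i; sep = OFS }
  print out
}' sample.tsv > output.tsv

cat output.tsv
```

Using a separate sep variable instead of testing whether out is empty keeps the column alignment correct even when a selected field is itself empty.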

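Alternatively, you could use Perl. Perl arrays are 0-based, so column 100 is $F[99], and the final print ends each output line:

perl -ane 'for($i=99;$i<=$#F;$i+=100){print "$F[$i] "} print "\n"' file > output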
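Because the file is tab-separated, which is cut's default delimiter, another option (not part of the original answer) is to let seq build the list of wanted field numbers and pass it to cut -f. This sketch assumes GNU coreutils seq for the -s separator option and uses a small hypothetical 300-column sample; for the real file the call would be cut -f "$(seq -s, 100 100 50000)" file > output.

```shell
# Hypothetical sample input: 2 rows x 300 tab-separated columns.
awk 'BEGIN {
  for (r = 1; r <= 2; r++) {
    line = r
    for (c = 2; c <= 300; c++) line = line "\t" (r * c)
    print line
  }
}' > sample.tsv

# Build the field list "100,200,300" (GNU seq; use 50000 as the end for the real file).
fields=$(seq -s, 100 100 300)

# cut splits on tabs by default and keeps the selected fields tab-separated.
cut -f "$fields" sample.tsv > output.tsv
cat output.tsv
```

cut ignores field numbers beyond the last column of a line, so the upper bound passed to seq only needs to be at least the column count.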

To do this for multiple files, you can use a shell loop (assuming you want to run this on all files in the current directory):

for f in *; do
  awk '{for(i=100;i<=NF;i+=100){printf "%s ",$i;} print ""}' "$f" > "$f".new;
done
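The glob in for f in * also matches subdirectories and, if you run the loop a second time, the .new files it created the first time. A slightly more defensive sketch, assuming the inputs live in a hypothetical data/ directory and the results may go to a separate hypothetical out/ directory under the same file names:

```shell
# Hypothetical demo input: two 200-column space-separated files in data/.
mkdir -p data out
for n in 1 2; do
  awk -v n="$n" 'BEGIN {
    line = n
    for (c = 2; c <= 200; c++) line = line " " (n * c)
    print line
  }' > "data/file$n.txt"
done

for f in data/*; do
  [ -f "$f" ] || continue   # skip subdirectories and other non-regular files
  # Same awk program as above; results go to out/ so a rerun never re-reads them.
  awk '{for(i=100;i<=NF;i+=100){printf "%s ",$i;} print ""}' "$f" > "out/${f##*/}"
done
```

${f##*/} strips the leading data/ so each result keeps its original file name.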

Related Solutions

How to merge first two lines of a csv column-by-column

Try this:

$ awk -F, 'NR<2{split(gensub(/Citty/,"City","g",$0),a,FS)}NR==2{for(b=2;b<=NF;b+=2){c=c a[b]" "$b","}print gensub(/,$/,"",1,c)}NR>2{print gensub(/(^,|" *",)/,"","g",$0)}' inp
Product Name,City Location,Price Per Unit
banana,CA,5.7
apple,FL,2.3
$

The same code is more readable when split across a few lines:

$ awk -F, '
> NR<2{split(gensub(/Citty/,"City","g",$0),a,FS)}
> NR==2{for(b=2;b<=NF;b+=2){c=c a[b]" "$b","}print gensub(/,$/,"",1,c)}
> NR>2{print gensub(/(^,|" *",)/,"","g",$0)}' inp
Product Name,City Location,Price Per Unit
banana,CA,5.7
apple,FL,2.3
$

On the 1st line, fix the Citty->City typo and split the line into the elements of array a.

On the 2nd line, starting with the 2nd column and stepping by 2, append the corresponding field from the 1st line, a space, this field and a comma. Then print the result with the trailing comma stripped.

On every line after the 2nd, remove a leading comma and every "<spaces>", sequence, then print the result.

Tested OK on GNU Awk 4.0.2. gensub is a GNU Awk extension, so this needs gawk rather than mawk or BSD awk.
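For an awk without gensub, the same logic can be sketched with the POSIX sub and gsub functions, which edit $0 (or a named variable) in place. The input file below is hypothetical, reconstructed only so that it produces the output shown above; the asker's real data may differ.

```shell
# Hypothetical input shaped to produce the output shown above.
cat > inp <<'EOF'
,Product,,Citty,,Price
,Name,,Location,,Per Unit
,banana,"  ",CA,"  ",5.7
,apple,"  ",FL,"  ",2.3
EOF

awk -F, '
NR == 1 { gsub(/Citty/, "City"); split($0, a, FS) }   # fix typo, keep header words in a[]
NR == 2 {
  out = ""
  for (b = 2; b <= NF; b += 2) out = out a[b] " " $b ","   # pair row-1 and row-2 words
  sub(/,$/, "", out)                                       # drop the trailing comma
  print out
}
NR > 2 { gsub(/^,|" *",/, ""); print }   # strip leading comma and "<spaces>", runs
' inp > merged.csv

cat merged.csv
```

This should print the same three lines as the gensub version and runs on any POSIX awk.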


Add Columns to .csv from Multiple Files

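You could try paste with one additional temporary file. This takes the 4th column of each <name>.txt listed in file_list and appends it to test.csv as a new column: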

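: > temp

while IFS= read -r i; do
        awk '{print $4}' "${i}.txt" > "${i}_temp"
        # The first file becomes the result as-is; later ones are pasted
        # onto it (paste joins with tabs; add -d, for commas).
        if [ -s temp ]; then
                paste temp "${i}_temp" > test.csv
        else
                cp "${i}_temp" test.csv
        fi
        cp test.csv temp
        rm "${i}_temp"
done < file_list

rm temp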
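The loop above rewrites the growing result file once per input. An alternative sketch (a swap from the temp-file approach, not the original answer) extracts each 4th column to its own file and joins them all with a single paste call. It assumes the same file_list of base names without embedded newlines, and uses paste -d, for comma-separated output since the target is a .csv; drop -d, to keep paste's default tab. The demo files a.txt, b.txt and file_list are hypothetical.

```shell
# Hypothetical demo data: two whitespace-separated files with 4 columns each.
printf '1 2 3 a1\n1 2 3 a2\n' > a.txt
printf '1 2 3 b1\n1 2 3 b2\n' > b.txt
printf 'a\nb\n' > file_list

set --                                  # reuse "$@" as the list of column files
while IFS= read -r name; do
  awk '{print $4}' "$name.txt" > "$name.col"   # 4th column of each file
  set -- "$@" "$name.col"
done < file_list                        # redirect keeps the loop in this shell

paste -d, "$@" > test.csv               # join all columns once, comma-separated
rm -f -- "$@"
cat test.csv
```

Because the redirect applies to the while loop itself rather than a pipe, the loop runs in the current shell and the positional parameters built inside it are still available to paste.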
