Ubuntu – Concatenate multiple files without header

bash, command-line, scripts

I have several directories ("amazon", "niger", …), each containing several subdirectories ("gfdl", "hadgem", …), which in turn contain further subdirectories ("rcp8p5", "rcp4p5", …). Each of these last subdirectories always holds two folders ("historical", "projected") that contain thousands of tables with the same structure.
I would like to concatenate those tables (from both folders of each last-level subdirectory) into one big table with a single header, rather than a header repeated each time a table is appended. Does anyone know how to do that?

I am currently using the following loop structure:

#!/bin/bash
# usage:cat_dat dirname

data_dir=/scratch/01/stevens/climate_scenario/river

for river in tagus
  do
   for gcm in gfdl-esm2m hadgem2-es
     do
      for scenario in rcp8p5 rcp4p5 rcp6p0 rcp2p6
        do
          find "${data_dir}/${river}/${gcm}/${scenario}" -name \*.dat -exec cat {} + >> "${data_dir}/${river}/${gcm}/${scenario}.dat"
      done
   done
done

but I can't get rid of the header with that! Any help is greatly appreciated. Thanks!

Best Answer

Using awk in a single folder

awk 'NR==1 {header=$_} FNR==1 && NR!=1 { $_ ~ $header getline; } {print}' *.dat > out

Using find and awk, if you need all files in the current folder and its subfolders (replace . with the folder you want to search):

find . -type f -name "*.dat" -print0 | \
    xargs -0 awk 'NR==1 {header=$_} FNR==1 && NR!=1 { $_ ~ $header getline; } {print}' > out

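Or, since getline is best avoided (thanks @fedorqui), a simpler filter that prints the first line of the first file and every line after the first in each file:

find . -type f -name "*.dat" -exec awk 'NR==1 || FNR!=1' {} + > out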
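
One caveat worth checking when there are thousands of tables: both xargs and find -exec ... + may split a very long file list across several awk runs, and NR restarts in each run, so a test on NR==1 can let one header through per run. The sketch below uses made-up demo files in a temporary directory and forces the split with xargs -n 1 to show the effect. It then shows one possible workaround, which is an assumption and not part of the original answer: write the header once with head, and let awk drop line 1 of every file with FNR>1. The use of sort -z assumes GNU coreutils, as on Ubuntu.

```shell
#!/bin/bash
# Demo with made-up files: what changes when awk runs once per batch.
set -eu

workdir=$(mktemp -d)
printf 'a\tb\tc\n1\t2\t3\n' > "$workdir/foo1.dat"
printf 'a\tb\tc\n4\t5\t6\n' > "$workdir/foo2.dat"

# Force one awk run per file (-n 1) to mimic a split argument list.
# NR restarts in each run, so each run prints its own header.
find "$workdir" -type f -name '*.dat' -print0 | sort -z |
  xargs -0 -n 1 awk 'NR==1 || FNR!=1' > "$workdir/split.txt"

# Batch-proof variant: write the header once, then drop line 1 of every file.
first=$(find "$workdir" -type f -name '*.dat' | sort | head -n 1)
head -n 1 "$first" > "$workdir/merged.txt"
find "$workdir" -type f -name '*.dat' -print0 | sort -z |
  xargs -0 -n 1 awk 'FNR>1' >> "$workdir/merged.txt"

# Count lines that start with the header's first column name.
split_headers=$(grep -c '^a' "$workdir/split.txt")
merged_headers=$(grep -c '^a' "$workdir/merged.txt")
echo "header lines with split runs: $split_headers"
echo "header lines with header-once: $merged_headers"
```

The output files deliberately do not end in .dat, so find does not pick them up as input.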

Example

% cat foo1.dat 
a   b   c
1   2   3

% cat foo2.dat
a   b   c
4   5   6

% awk 'NR==1 {header=$_} FNR==1 && NR!=1 { $_ ~ $header getline; } {print}' *.dat > out

% cat out 
a   b   c
1   2   3
4   5   6
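
Applied to the directory layout from the question, the loop could look roughly like the sketch below. It is only an illustration under stated assumptions: data_dir points at a throwaway tree built with mktemp instead of the real path, the table names t1.dat and t2.dat are hypothetical, and the header is written once with head while awk 'FNR>1' strips the first line of every table, so the result does not depend on how many times awk gets invoked. sort -z assumes GNU coreutils.

```shell
#!/bin/bash
# Sketch of the question's loop; data_dir is a throwaway demo tree here.
set -eu

data_dir=$(mktemp -d)
scen_dir="$data_dir/tagus/gfdl-esm2m/rcp8p5"
mkdir -p "$scen_dir/historical" "$scen_dir/projected"
printf 'a\tb\tc\n1\t2\t3\n' > "$scen_dir/historical/t1.dat"
printf 'a\tb\tc\n4\t5\t6\n' > "$scen_dir/projected/t2.dat"

for river in tagus; do
  for gcm in gfdl-esm2m hadgem2-es; do
    for scenario in rcp8p5 rcp4p5 rcp6p0 rcp2p6; do
      src="$data_dir/$river/$gcm/$scenario"
      [ -d "$src" ] || continue          # skip combinations that do not exist
      first=$(find "$src" -type f -name '*.dat' | sort | head -n 1)
      [ -n "$first" ] || continue        # skip folders without tables
      # The output sits beside the scenario folder, outside the searched tree.
      # ">" starts a fresh file, so rerunning does not append a second copy.
      head -n 1 "$first" > "$src.dat"
      find "$src" -type f -name '*.dat' -print0 | sort -z |
        xargs -0 awk 'FNR>1' >> "$src.dat"
    done
  done
done

cat "$scen_dir.dat"
```

Writing the header with ">" before appending the rows with ">>" also avoids a side effect of the original loop, where ">>" alone would keep growing the output file on every rerun.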