Bash – How to launch two threads in bash shell script

bash, linux, multithreading, scp, shell

I am trying to copy files from machineB and machineC to machineA. I run the shell script below on machineA.

If a file is not on machineB, it is guaranteed to be on machineC. So I try copying each file from machineB first, and if it is not there, I copy the same file from machineC.

I am copying the files in parallel using GNU Parallel, and it is working fine. Currently I copy two files in parallel.

Currently, I copy the PRIMARY_PARTITION files into the PRIMARY folder using GNU Parallel, and once that is done, I copy the SECONDARY_PARTITION files into the SECONDARY folder using the same GNU Parallel setup. So as of now, the PRIMARY and SECONDARY folders are processed sequentially.

Below is my shell script and everything works fine –

#!/bin/bash

export PRIMARY=/test01/primary
export SECONDARY=/test02/secondary
readonly FILERS_LOCATION=(machineB machineC)
export FILERS_LOCATION_1=${FILERS_LOCATION[0]}
export FILERS_LOCATION_2=${FILERS_LOCATION[1]}
PRIMARY_PARTITION=(550 274 2 546 278) # this will have more file numbers
SECONDARY_PARTITION=(1643 1103 1372 1096 1369) # this will have more file numbers

export dir3=/testing/snapshot/20140103

# delete primary files first and then copy
find "$PRIMARY" -mindepth 1 -delete

do_CopyInPrimary() {
  el=$1
  scp david@$FILERS_LOCATION_1:$dir3/new_weekly_2014_"$el"_200003_5.data $PRIMARY/. || scp david@$FILERS_LOCATION_2:$dir3/new_weekly_2014_"$el"_200003_5.data $PRIMARY/.
}
export -f do_CopyInPrimary
parallel -j 2 do_CopyInPrimary ::: "${PRIMARY_PARTITION[@]}"

# delete secondary files first and then copy
find "$SECONDARY" -mindepth 1 -delete

do_CopyInSecondary() {
  el=$1
  scp david@$FILERS_LOCATION_1:$dir3/new_weekly_2014_"$el"_200003_5.data $SECONDARY/. || scp david@$FILERS_LOCATION_2:$dir3/new_weekly_2014_"$el"_200003_5.data $SECONDARY/.
}
export -f do_CopyInSecondary
parallel -j 2 do_CopyInSecondary ::: "${SECONDARY_PARTITION[@]}"

Problem statement:

Is there any way to launch two threads: one that copies files into the PRIMARY folder using the same setup as above (two files in parallel), and a second that copies files into the SECONDARY folder using the same setup, also two files in parallel, at the same time?

In other words, files should be copied into both the PRIMARY and SECONDARY folders simultaneously, rather than copying into SECONDARY only after PRIMARY is done.

Currently, I start copying files into the SECONDARY folder only once the PRIMARY folder is finished.

In short, I just need to launch two threads. One thread will run this:

# delete primary files first and then copy
find "$PRIMARY" -mindepth 1 -delete

do_CopyInPrimary() {
  el=$1
  scp david@$FILERS_LOCATION_1:$dir3/new_weekly_2014_"$el"_200003_5.data $PRIMARY/. || scp david@$FILERS_LOCATION_2:$dir3/new_weekly_2014_"$el"_200003_5.data $PRIMARY/.
}
export -f do_CopyInPrimary
parallel -j 2 do_CopyInPrimary ::: "${PRIMARY_PARTITION[@]}"

And second thread will run this –

# delete secondary files first and then copy
find "$SECONDARY" -mindepth 1 -delete

do_CopyInSecondary() {
  el=$1
  scp david@$FILERS_LOCATION_1:$dir3/new_weekly_2014_"$el"_200003_5.data $SECONDARY/. || scp david@$FILERS_LOCATION_2:$dir3/new_weekly_2014_"$el"_200003_5.data $SECONDARY/.
}
export -f do_CopyInSecondary
parallel -j 2 do_CopyInSecondary ::: "${SECONDARY_PARTITION[@]}"

Once all the files have been copied successfully, the script should echo a message saying that all the files are copied. In Java, I know how to launch two threads that each perform a certain task, but I am not sure how this works in a bash shell script.

My main goal is to copy two files in parallel using GNU Parallel into the PRIMARY folder and the SECONDARY folder at the same time.

Is this possible in a bash shell script?

Best Answer

The obvious solution is:

parallel -j 2 do_CopyInPrimary ::: "${PRIMARY_PARTITION[@]}" &
parallel -j 2 do_CopyInSecondary ::: "${SECONDARY_PARTITION[@]}" &
wait
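
A bare wait blocks until both background jobs finish, but it discards their exit statuses, so on its own it cannot tell you whether every copy succeeded. A minimal sketch of one way to print the "all files copied" message only on success: record each job's PID with $! and wait on each PID separately. The functions copy_primary and copy_secondary are hypothetical stand-ins for the two parallel commands above, and the variable names are assumptions.

```shell
# Hypothetical stand-ins for the two `parallel ...` commands above;
# each returns 0 on success and non-zero on failure.
copy_primary()   { return 0; }
copy_secondary() { return 0; }

copy_primary &           # start the first job in the background
primary_pid=$!           # $! holds the PID of the most recent background job
copy_secondary &         # start the second job without waiting for the first
secondary_pid=$!

status=0
# `wait PID` returns the exit status of that specific job.
wait "$primary_pid"   || status=1
wait "$secondary_pid" || status=1

if [ "$status" -eq 0 ]; then
  result="all files copied"
else
  result="at least one copy failed"
fi
echo "$result"
```

GNU Parallel exits with a non-zero status when one or more of its jobs failed, so replacing the stubs with the two parallel commands should carry scp failures through to this check.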

But this way the secondary copy does not wait for the primary copy to finish, and it does not check whether the primary copy succeeded. Let us assume that ${PRIMARY_PARTITION[1]} corresponds to ${SECONDARY_PARTITION[1]}, so that if you cannot read the file for ${PRIMARY_PARTITION[1]} you read the one for ${SECONDARY_PARTITION[1]} instead. That also means PRIMARY_PARTITION and SECONDARY_PARTITION have the same number of elements. Then you can make the copy of ${SECONDARY_PARTITION[1]} conditional on the copy of ${PRIMARY_PARTITION[1]} failing.

do_Copy() {
  # Bash cannot export arrays, so they are defined inside the function.
  PRIMARY_PARTITION=(550 274 2 546 278) # this will have more file numbers
  SECONDARY_PARTITION=(1643 1103 1372 1096 1369) # this will have more file numbers
  pel=${PRIMARY_PARTITION[$1]}
  sel=${SECONDARY_PARTITION[$1]}
  do_CopyInPrimary "$pel" ||
    do_CopyInSecondary "$sel" ||
    echo "Could copy neither $pel nor $sel"
}
export -f do_Copy
# Number of elements in PRIMARY_PARTITION == SECONDARY_PARTITION.
# Bash arrays are 0-indexed, so pass the indices 0..N-1.
seq 0 $(( ${#PRIMARY_PARTITION[@]} - 1 )) | parallel -j 2 do_Copy
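
Because bash arrays are 0-indexed, the indices handed to do_Copy need to run from 0 to N-1. The pairing by index can be checked without any network access. The sketch below is only an illustration: try_copy is a hypothetical stub standing in for the scp-based functions, and it pretends that file 274 cannot be copied. It iterates over "${!PRIMARY_PARTITION[@]}", which expands to the indices the array actually has.

```shell
PRIMARY_PARTITION=(550 274 2 546 278)
SECONDARY_PARTITION=(1643 1103 1372 1096 1369)

# Hypothetical stub standing in for the scp-based copy functions:
# pretend that file 274 is the only one that cannot be copied.
try_copy() { [ "$1" != 274 ]; }

log=()
for i in "${!PRIMARY_PARTITION[@]}"; do   # expands to 0 1 2 3 4
  pel=${PRIMARY_PARTITION[$i]}
  sel=${SECONDARY_PARTITION[$i]}
  if try_copy "$pel"; then
    log+=("$i:primary:$pel")
  elif try_copy "$sel"; then              # fall back to the paired element
    log+=("$i:secondary:$sel")
  else
    log+=("$i:failed")
  fi
done
printf '%s\n' "${log[@]}"
```

The same index list could be fed to GNU Parallel with printf '%s\n' "${!PRIMARY_PARTITION[@]}" | parallel -j 2 do_Copy, which avoids computing the range by hand.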

This will get the dependency right, but it will only copy 2 at a time in total. With -j4 you risk running 4 primaries at the same time, so we need to guard against that, too:

do_Copy() {
  # Bash cannot export arrays, so they are defined inside the function.
  PRIMARY_PARTITION=(550 274 2 546 278) # this will have more file numbers
  SECONDARY_PARTITION=(1643 1103 1372 1096 1369) # this will have more file numbers
  pel=${PRIMARY_PARTITION[$1]}
  sel=${SECONDARY_PARTITION[$1]}
  sem -j2 --fg --id primary do_CopyInPrimary "$pel" ||
    sem -j2 --fg --id secondary do_CopyInSecondary "$sel" ||
    echo "Could copy neither $pel nor $sel"
}
export -f do_Copy
# Number of elements in PRIMARY_PARTITION == SECONDARY_PARTITION.
# Bash arrays are 0-indexed, so pass the indices 0..N-1.
seq 0 $(( ${#PRIMARY_PARTITION[@]} - 1 )) | parallel -j 4 do_Copy

sem (shipped with GNU Parallel) will limit the number of primaries running at once to 2 and the number of secondaries running at once to 2.
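
For comparison, the same "at most 2 per group, both groups at once" limit can be sketched in plain bash, assuming bash 4.3 or later for wait -n. This is not how sem is implemented; it only illustrates the cap. The helper run_capped and the fake_copy_* workers are hypothetical names, and the workers create empty files in a temporary directory instead of calling scp.

```shell
# Run a worker function on each argument, with at most 2 running at a time.
run_capped() {
  local worker=$1 running=0 el
  shift
  for el in "$@"; do
    "$worker" "$el" &
    running=$((running + 1))
    if [ "$running" -ge 2 ]; then
      wait -n                  # block until any one background job exits
      running=$((running - 1))
    fi
  done
  wait                         # collect the jobs that are still running
}

outdir=$(mktemp -d)
# Hypothetical workers: create an empty file instead of copying with scp.
fake_copy_primary()   { : > "$outdir/primary_$1"; }
fake_copy_secondary() { : > "$outdir/secondary_$1"; }

# Each group runs in its own background subshell, so the two caps are independent.
run_capped fake_copy_primary 550 274 2 546 278 &
run_capped fake_copy_secondary 1643 1103 1372 1096 1369 &
wait

files=("$outdir"/*)
copied=${#files[@]}
rm -rf "$outdir"
echo "copied $copied files"
```

Since each run_capped call is started with &, it runs in a separate subshell, and wait -n inside it only sees that group's own jobs.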
