Ubuntu – For loop with file names

bashcommand linefiles

I have several files (tables) named as: institute_model_scenario_river.txt

(institute, model, scenario, and river are variables.) I would like to create a for loop that will identify each files that have the same institute name and at the same time the same scenario name, in order to append the results of each different model in the same output file, using the following command:

paste filename1.txt filename2.txt > output_file.txt

I know how to create a for loop over different folder but not over file names. Anyone has an idea?

As a minimal example, the filenames could be as follow:

wbm_gfdl_rcp8p5_mississippi.txt
wbm_hadgem_rcp8p5_mississippi.txt
matsiro_gfdl_rcp8p5_mississippi.txt
matsiro_ipsl_rcp4p5_mississippi.txt
matsiro_hadgem_rcp4p5_mississippi.txt
matsiro_miroc_rcp8p5_mississippi.txt

Then, I would like to append the following files together:

wbm_gfdl_rcp8p5_mississippi.txt with
wbm_hadgem_rcp8p5_mississippi.txt

matsiro_ipsl_rcp4p5_mississippi.txt with
matsiro_hadgem_rcp4p5_mississippi.txt

matsiro_gfdl_rcp8p5_mississippi.txt with
matsiro_miroc_rcp8p5_mississippi.txt

Best Answer

If the files are all in the same dir, you can:

ls |
awk -F_ '{ i=$1; m=$2; s=$3; f[i"_"s] = f[i"_"s] " " $0 }
         END{ for(insc in f)
                printf "paste%s >out_%s.txt\n",f[insc],insc
         }'

which splits the filename on "_" (-F_), sets the variables i,m,s to the first 3 parts of the filename (institute,model,scenario), and accumulates in array f the filename. The array is indexed only by the institute and scenario, so all the models are concatenated (m isn't used). The final END prints the f array, and uses the index (institute_scenario) as the name for the output file. With your examples this produces

paste wbm_gfdl_rcp8p5_mississippi.txt wbm_hadgem_rcp8p5_mississippi.txt >out_wbm_rcp8p5.txt
paste matsiro_hadgem_rcp4p5_mississippi.txt matsiro_ipsl_rcp4p5_mississippi.txt >out_matsiro_rcp4p5.txt
paste matsiro_gfdl_rcp8p5_mississippi.txt matsiro_miroc_rcp8p5_mississippi.txt >out_matsiro_rcp8p5.txt

You then need to pipe this into the shell to have it executed. Add | sh to the last line above to do this.

To remove some columns from the input files, you need to alter the awk line that is collecting all the input filenames. In the 1st awk line:

{ i=$1; m=$2; s=$3; f[i"_"s] = f[i"_"s] " " $0 }

the filename is the "$0". For example, if you change this line into:

{ i=$1; m=$2; s=$3; f[i"_"s] = f[i"_"s] sprintf(" <(cut -f4 %s)",$0) }

then you will get the example output:

paste <(cut -f4 wbm_gfdl_rcp8p5_mississippi.txt) <(cut -f4 wbm_hadgem_rcp8p5_mississippi.txt) >out_wbm_rcp8p5.txt

but if you want to cut only the 2nd filename, it is a bit more complicated and you need this instead:

{ i=$1; m=$2; s=$3; 
  if(f[i"_"s]=="")add = $0; else add = sprintf("<(cut -f4 %s)",$0);
  f[i"_"s] = f[i"_"s] " " add }

so you will get

paste wbm_gfdl_rcp8p5_mississippi.txt <(cut -f4 wbm_hadgem_rcp8p5_mississippi.txt) >out_wbm_rcp8p5.txt

If sh does not understand the syntax <(cut ...) then replace it by bash.

Related Solutions

Ubuntu – Search for duplicate file names within folder hierarchy

FSlint is a versatile duplicate finder that includes a function for finding duplicate names:

FSlint

The FSlint package for Ubuntu emphasizes the graphical interface, but as is explained in the FSlint FAQ a command-line interface is available via the programs in /usr/share/fslint/fslint/. Use the --help option for documentation, e.g.:

$ /usr/share/fslint/fslint/fslint --help
File system lint.
A collection of utilities to find lint on a filesystem.
To get more info on each utility run 'util --help'.

findup -- find DUPlicate files
findnl -- find Name Lint (problems with filenames)
findu8 -- find filenames with invalid utf8 encoding
findbl -- find Bad Links (various problems with symlinks)
findsn -- find Same Name (problems with clashing names)
finded -- find Empty Directories
findid -- find files with dead user IDs
findns -- find Non Stripped executables
findrs -- find Redundant Whitespace in files
findtf -- find Temporary Files
findul -- find possibly Unused Libraries
zipdir -- Reclaim wasted space in ext2 directory entries
$ /usr/share/fslint/fslint/findsn --help
find (files) with duplicate or conflicting names.
Usage: findsn [-A -c -C] [[-r] [-f] paths(s) ...]

If no arguments are supplied the $PATH is searched for any redundant
or conflicting files.

-A reports all aliases (soft and hard links) to files.
If no path(s) specified then the $PATH is searched.

If only path(s) specified then they are checked for duplicate named
files. You can qualify this with -C to ignore case in this search.
Qualifying with -c is more restictive as only files (or directories)
in the same directory whose names differ only in case are reported.
I.E. -c will flag files & directories that will conflict if transfered
to a case insensitive file system. Note if -c or -C specified and
no path(s) specifed the current directory is assumed.

Example usage:

$ /usr/share/fslint/fslint/findsn /usr/share/icons/ > icons-with-duplicate-names.txt
$ head icons-with-duplicate-names.txt 
-rw-r--r-- 1 root root    683 2011-04-15 10:31 Humanity-Dark/AUTHORS
-rw-r--r-- 1 root root    683 2011-04-15 10:31 Humanity/AUTHORS
-rw-r--r-- 1 root root  17992 2011-04-15 10:31 Humanity-Dark/COPYING
-rw-r--r-- 1 root root  17992 2011-04-15 10:31 Humanity/COPYING
-rw-r--r-- 1 root root   4776 2011-03-29 08:57 Faenza/apps/16/DC++.xpm
-rw-r--r-- 1 root root   3816 2011-03-29 08:57 Faenza/apps/22/DC++.xpm
-rw-r--r-- 1 root root   4008 2011-03-29 08:57 Faenza/apps/24/DC++.xpm
-rw-r--r-- 1 root root   4456 2011-03-29 08:57 Faenza/apps/32/DC++.xpm
-rw-r--r-- 1 root root   7336 2011-03-29 08:57 Faenza/apps/48/DC++.xpm
-rw-r--r-- 1 root root    918 2011-03-29 09:03 Faenza/apps/16/Thunar.png

Ubuntu – List all files then execute script with part of the filename

This should do the job:

#!/bin/bash
for i in *.cfg; do
  sshpass -f /path/to/passwordfile scp ${i%.cfg}.txt username@${i%.cfg}:
done

or rather

for i in *.cfg;do sshpass -f /path/to/passwordfile scp ${i%.cfg}.txt username@${i%.cfg}:;done

The password is passed to scp using sshpass (following How to pass password to scp?) which reads it from passwordfile – remember to secure this file against unauthorized read access with chmod 400!

I got a little bit confused what file you want to send here, I understood there's e.g. a file ABC123.txt that has to be sent to host ABC123, so I just stripped .cfg using Bash Parameter Expansion and added .txt.

As user@host: defaults to user's home directory I removed the path.

Best Answer

Related Solutions

Ubuntu – Search for duplicate file names within folder hierarchy

Ubuntu – List all files then execute script with part of the filename

Related Question