Bash – How to Read from Two Input Files Using While Loop

bash io-redirection shell-script

I wanted to know if there is a way to read from two input files in a nested while loop, one line at a time. For example, let's say I have two files, FileA and FileB.

FileA:

[jaypal:~/Temp] cat filea
this is File A line1
this is File A line2
this is File A line3

FileB:

[jaypal:~/Temp] cat fileb
this is File B line1
this is File B line2
this is File B line3

Current Sample Script:

[jaypal:~/Temp] cat read.sh 
#!/bin/bash
while read lineA
    do echo $lineA 
    while read lineB
        do echo $lineB 
        done < fileb
done < filea

Execution:

[jaypal:~/Temp] ./read.sh 
this is File A line1
this is File B line1
this is File B line2
this is File B line3
this is File A line2
this is File B line1
this is File B line2
this is File B line3
this is File A line3
this is File B line1
this is File B line2
this is File B line3

Problem and desired output:

This loops over FileB completely for each line in FileA. I tried continue, break, and exit, but none of them produces the output I am looking for. I would like the script to read one line from FileA, then one line from FileB, and then continue with the second line of FileA and the second line of FileB, and so on. Something similar to the following script:

[jaypal:~/Temp] cat read1.sh 
#!/bin/bash
count=1
while read lineA
    do echo $lineA 
        lineB=$(sed -n "${count}p" fileb)
        echo "$lineB"
        count=$((count + 1))
done < filea

[jaypal:~/Temp] ./read1.sh 
this is File A line1
this is File B line1
this is File A line2
this is File B line2
this is File A line3
this is File B line3

Is this possible to achieve with a while loop?

Best Answer

If you can guarantee that some character will never occur in the first file then you can use paste.

For example, if you know for sure that @ will never occur:

paste -d@ file1 file2 | while IFS="@" read -r f1 f2
do
  printf 'f1: %s\n' "$f1"
  printf 'f2: %s\n' "$f2"
done

Note that it is enough if the character is guaranteed not to occur in the first file. This is because read assigns the whole remainder of the line, delimiters included, to the last variable. So even if @ occurs in the second file, it will not be split.
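The behavior of the last variable is easy to check in isolation. A minimal sketch, assuming bash (the here-string <<< is a bash feature) and a made-up input line:

```bash
#!/usr/bin/env bash
# Made-up input line: the first @ separates the fields, the second one is data.
IFS="@" read -r f1 f2 <<< 'left@right@with-at'

printf 'f1: %s\n' "$f1"   # f1: left
printf 'f2: %s\n' "$f2"   # f2: right@with-at
```

The second @ ends up inside f2 because there is no third variable left to split into.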

Example using some bash features for arguably cleaner code, with paste using its default delimiter, tab:

while IFS=$'\t' read -r f1 f2
do
  printf 'f1: %s\n' "$f1"
  printf 'f2: %s\n' "$f2"
done < <(paste file1 file2)

Bash features used: ANSI-C quoting ($'\t') and process substitution (<(...)), which avoids the problem of the while loop running in a subshell.
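To see why the subshell matters, here is a small sketch that counts lines with both loop styles. It assumes bash with default settings (in particular, the lastpipe option is not enabled); the variable names and input are made up for the demo.

```bash
#!/usr/bin/env bash
piped=0
printf 'a\nb\nc\n' | while IFS= read -r line
do
  piped=$((piped + 1))               # runs in a subshell; the parent never sees this
done

substituted=0
while IFS= read -r line
do
  substituted=$((substituted + 1))   # runs in the current shell
done < <(printf 'a\nb\nc\n')

printf 'piped: %s\n' "$piped"               # piped: 0
printf 'substituted: %s\n' "$substituted"   # substituted: 3
```

Any variable set inside the piped loop is lost when the loop ends, which is why the process substitution form is preferable when the loop has to hand results back to the rest of the script.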

If you cannot guarantee that some character never occurs in the first file, you can use two file descriptors instead.

while true
do
  read -r f1 <&3 || break
  read -r f2 <&4 || break
  printf 'f1: %s\n' "$f1"
  printf 'f2: %s\n' "$f2"
done 3<file1 4<file2

Not tested much. Might break on empty lines.
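As far as I can tell, empty lines should be safe with this loop: read returns a nonzero status at end of file, not for an empty line. A quick sketch to check this, assuming bash; the temporary directory, file contents, and the out variable are made up for the demo, and IFS= is added so that leading and trailing whitespace survives as well.

```bash
#!/usr/bin/env bash
dir=$(mktemp -d)
printf 'a1\n\na3\n' > "$dir/file1"    # second line is empty
printf 'b1\nb2\nb3\n' > "$dir/file2"

out=""
while true
do
  IFS= read -r f1 <&3 || break
  IFS= read -r f2 <&4 || break
  out+="[$f1|$f2]"                    # collect the pairs for inspection
done 3<"$dir/file1" 4<"$dir/file2"

printf '%s\n' "$out"                  # [a1|b1][|b2][a3|b3]
rm -r "$dir"
```

One case that does get lost is a final line without a trailing newline: read fills the variable but still returns nonzero, so the || break skips that line.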

File descriptors 0, 1, and 2 are already used for stdin, stdout, and stderr, respectively. File descriptors from 3 and up are (usually) free. The bash manual warns against using file descriptors greater than 9, because they may conflict with descriptors the shell uses internally.

Note that open file descriptors are inherited by shell functions and external programs. Functions and programs inheriting an open file descriptor can read from (and write to) it. You should take care to close all file descriptors that are not required before calling a function or external program.

Here is the same program as above with the actual work (the printing) separated from the meta-work (reading line by line from two files in parallel).

work() {
  printf 'f1: %s\n' "$1"
  printf 'f2: %s\n' "$2"
}

while true
do
  read -r f1 <&3 || break
  read -r f2 <&4 || break
  work "$f1" "$f2"
done 3<file1 4<file2

Now we pretend that we have no control over the work code and that code, for whatever reason, tries to read from file descriptor 3.

unknowncode() {
  printf 'f1: %s\n' "$1"
  printf 'f2: %s\n' "$2"
  read -r yoink <&3 && printf 'yoink: %s\n' "$yoink"
}

while true
do
  read -r f1 <&3 || break
  read -r f2 <&4 || break
  unknowncode "$f1" "$f2"
done 3<file1 4<file2

Here is an example output. Note that the second line from the first file is "stolen" from the loop.

f1: file1 line1
f2: file2 line1
yoink: file1 line2
f1: file1 line3
f2: file2 line2

Here is how you should close the file descriptors before calling external code (or any code for that matter).

while true
do
  read -r f1 <&3 || break
  read -r f2 <&4 || break
  # this will close fd3 and fd4 before executing anycode
  anycode "$f1" "$f2" 3<&- 4<&-
  # note that fd3 and fd4 are still open in the loop
done 3<file1 4<file2
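To check that closing the descriptors really protects the loop, here is a sketch that reuses the greedy reader idea. It assumes bash; the function name greedy, the temporary files, and the out variable are made up for the demo, and 2>/dev/null only hides the "Bad file descriptor" message that the failed read produces.

```bash
#!/usr/bin/env bash
dir=$(mktemp -d)
printf 'file1 line%s\n' 1 2 3 > "$dir/file1"
printf 'file2 line%s\n' 1 2 3 > "$dir/file2"

# greedy is a hypothetical stand-in for code that tries to read fd 3.
greedy() {
  printf 'f1: %s\n' "$1"
  printf 'f2: %s\n' "$2"
  read -r yoink <&3 && printf 'yoink: %s\n' "$yoink"
  return 0
}

out=$(
  while true
  do
    read -r f1 <&3 || break
    read -r f2 <&4 || break
    # fd 3 and fd 4 are closed only for the duration of this call
    greedy "$f1" "$f2" 3<&- 4<&- 2>/dev/null
  done 3<"$dir/file1" 4<"$dir/file2"
)

printf '%s\n' "$out"
rm -r "$dir"
```

With the descriptors closed for the call, the read inside greedy fails, no yoink line is printed, and the loop sees all three lines of both files.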

Related Solutions

Reading Lines from a File in Bash – for vs. while Loop Comparison

The for loop is fine here. But note that this is because the file contains machine names, which do not contain any whitespace or globbing characters. for x in $(cat file); do … does not work to iterate over the lines of a file in general, because the shell first splits the output of cat file wherever there is whitespace, and then treats each word as a glob pattern, so words containing \, [, ? or * are expanded further. You can make for x in $(cat file) safe if you work on it:

set -f
IFS='
'
for x in $(cat file); do …
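A complete version of that idea might look like the sketch below. It assumes bash and uses IFS=$'\n', which is equivalent to the quoted literal newline above; the temporary file, its contents, and the items array are made up for the demo.

```bash
#!/usr/bin/env bash
dir=$(mktemp -d)
printf 'two words\n*\nlast\n' > "$dir/list"   # made-up list with a space and a glob

set -f            # disable globbing so that * stays literal
IFS=$'\n'         # split on newlines only
items=()
for x in $(cat "$dir/list"); do
  items+=("$x")
done
set +f            # restore globbing
unset IFS         # restore default word splitting

printf '<%s>\n' "${items[@]}"
rm -r "$dir"
```

This yields three items, with "two words" kept together and * left unexpanded. Note that splitting on newlines this way still drops empty lines, which while IFS= read -r does not.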

Related reading: Looping through files with spaces in the names?; How can I read line by line from a variable in bash?; Why is while IFS= read used so often, instead of IFS=; while read..? Note that when using while read, the safe syntax to read lines is while IFS= read -r line; do ….

Now let's turn to what goes wrong with your while read attempt. The redirection from the server list file applies to the whole loop. So when ssh runs, its standard input comes from that file. The ssh client can't know when the remote application might want to read from its standard input. So as soon as the ssh client notices some input, it sends that input to the remote side. The ssh server there is then ready to feed that input to the remote command, should it want it. In your case, the remote command never reads any input, so the data ends up discarded, but the client side doesn't know anything about that. Your attempt with echo worked because echo never reads any input; it leaves its standard input alone.

There are a few ways you can avoid this. You can tell ssh not to read from standard input, with the -n option.

while read server; do
  ssh -n "$server" "uname -a"
done < /home/kenny/list_of_servers.txt

The -n option in fact tells ssh to redirect its input from /dev/null. You can do that at the shell level, and it'll work for any command.

while read server; do
  ssh "$server" "uname -a" </dev/null
done < /home/kenny/list_of_servers.txt
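The effect is easy to reproduce without any servers. In this sketch, eat_stdin is a hypothetical stand-in for ssh (it just reads standard input to the end), and the host list is a made-up temporary file; bash is assumed.

```bash
#!/usr/bin/env bash
dir=$(mktemp -d)
printf 'server%s\n' 1 2 3 > "$dir/servers"   # made-up host list

# eat_stdin is a hypothetical stand-in for ssh: it reads stdin to EOF.
eat_stdin() { cat > /dev/null; }

broken=0
while IFS= read -r server; do
  eat_stdin                 # swallows server2 and server3
  broken=$((broken + 1))
done < "$dir/servers"

fixed=0
while IFS= read -r server; do
  eat_stdin < /dev/null     # sees an empty stdin, leaves the list alone
  fixed=$((fixed + 1))
done < "$dir/servers"

printf 'broken: %s, fixed: %s\n' "$broken" "$fixed"   # broken: 1, fixed: 3
rm -r "$dir"
```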

A tempting method to avoid ssh's input coming from the file is to put the redirection on the read command: while read server </home/kenny/list_of_servers.txt; do …. This will not work, because it causes the file to be opened again each time the read command is executed (so it would read the first line of the file over and over). The redirection needs to be on the whole while loop so that the file is opened once for the duration of the loop.
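A short sketch of that failure mode, assuming bash and a made-up host list in a temporary file. The guard that stops after three iterations is only there so that the demo terminates.

```bash
#!/usr/bin/env bash
dir=$(mktemp -d)
printf 'server%s\n' 1 2 3 > "$dir/servers"   # made-up host list

seen=()
# The file is reopened on every iteration, so read always gets line 1.
while IFS= read -r server < "$dir/servers"; do
  seen+=("$server")
  # Without this guard the loop would never end.
  [ "${#seen[@]}" -ge 3 ] && break
done

printf '%s\n' "${seen[@]}"   # server1, three times
rm -r "$dir"
```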

The general solution is to provide the input to the loop on a file descriptor other than standard input. The shell has constructs to ferry input and output from one descriptor number to another. Here, we open the file on file descriptor 3, and redirect the read command's standard input from file descriptor 3. The ssh client ignores open non-standard descriptors, so all is well.

while read server <&3; do
  ssh "$server" "uname -a"
done 3</home/kenny/list_of_servers.txt

In bash, the read command has a specific option to read from a different file descriptor, so you can write read -u3 server.
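One advantage of the separate descriptor is that standard input stays available to the command inside the loop. The sketch below assumes bash; cat is a hypothetical stand-in for ssh, the host list is a made-up temporary file, and the here-string plays the role of whatever would normally arrive on standard input.

```bash
#!/usr/bin/env bash
dir=$(mktemp -d)
printf 'server%s\n' 1 2 3 > "$dir/servers"   # made-up host list

out=$(
  while IFS= read -r -u 3 server; do
    printf 'start %s end\n' "$server"
    cat      # hypothetical stand-in for ssh: copies all of stdin to stdout
  done 3< "$dir/servers" <<< 'typed on stdin'
)

printf '%s\n' "$out"
rm -r "$dir"
```

The loop prints a start line for all three servers, and the first cat receives the stdin text; the list on fd 3 is never touched by the command.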

SSH – Using While Loop to SSH to Multiple Servers

ssh is reading the rest of your standard input.

while read HOST ; do … ; done < servers.txt

read reads from stdin. The < redirects stdin from a file.

Unfortunately, the command you're trying to run also reads stdin, so it winds up eating the rest of your file. You can see it clearly with:

$ while read HOST ; do echo start $HOST end; cat; done < servers.txt 
start server1.mydomain.com end
server2.mydomain.com
server3.mydomain.com

Notice how cat ate (and echoed) the remaining two lines. Had read consumed them as expected, each line would have "start" and "end" around the host name.

Why does for work?

Your for line doesn't redirect stdin. (In fact, it reads the entire contents of the servers.txt file into memory before the first iteration.) So ssh continues to read its stdin from the terminal (or possibly nothing, depending on how your script is called).

Solution

At least in bash, you can have read use a different file descriptor.

while read -u10 HOST ; do ssh $HOST "uname -a" ; done 10< servers.txt
#          ^^^^                                       ^^

ought to work. 10 is just an arbitrary file descriptor number I picked. 0, 1, and 2 have defined meanings, and typically opening files will start from the first available number (so 3 is next to be used). 10 is thus high enough to stay out of the way, but low enough to be under the limit in some shells. Plus it's a nice round number.

Alternative Solution 1: -n

As McNisse points out in his/her answer, the OpenSSH client has an -n option that'll prevent it from reading stdin. This works well in the particular case of ssh, but other commands may lack such an option; the other solutions work regardless of which command is eating your stdin.

Alternative Solution 2: second redirect

You can apparently (as in, I tried it, it works in my version of Bash at least...) do a second redirect, which looks something like this:

while read HOST ; do ssh $HOST "uname -a" < /dev/null; done < servers.txt

You can use this with any command, but it'll be difficult if you actually want terminal input going to the command.