How does this AWK script work

command-line unix

I have 2 data files each containing one column.
I want to make another data file by merging both the columns.
I have the command line in shell but I don't know how it works.

Please explain the command below in detail:

awk 'NR==FNR {a[i++]=$0};
             {b[x++]=$0;};{k=x-i};
     END     {for(j=0;j<i;) print a[j++],b[k++]}' \
  file1.txt file2.txt

Example:

input:

file1.txt   
11
23
19
31
67
file2.txt
13
19
25
67
93

I used the command above in a shell script and got the following output:

11 13
23 19
19 25
31 67
67 93

How does this command line process the example input to produce that output?

Best Answer

Well, part of learning to use Unix is figuring out what existing scripts are doing. In this case you need to know a bit about how awk works to understand the code. I will focus on describing the awk part, which should get you started on figuring out the rest.

Basically awk is a pattern-driven scripting language, where commands consist of both a (search) pattern/condition and a corresponding code block. During execution, any input files are read line by line and if the pattern/condition is true for a line, the code block is executed. There are special patterns BEGIN and END which are used to trigger code to get executed before the first line or after the last line is read.
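As a minimal sketch of the pattern/code-block model described above (the sample file, its contents, and the variable names tmpdir, out1 and n are made up for illustration and are not part of the original script):

```shell
# Hypothetical sample input, created only for this demonstration.
tmpdir=$(mktemp -d)
printf '11\n23\n19\n' > "$tmpdir/sample.txt"

out1=$(awk '
    BEGIN   { print "start" }        # runs once, before any input is read
    $0 > 15 { print "big:", $0 }     # runs only for lines where the condition is true
            { n++ }                  # no pattern: runs for every input line
    END     { print "lines:", n }    # runs once, after the last line
' "$tmpdir/sample.txt")

printf '%s\n' "$out1"
rm -r "$tmpdir"
```

With this sample input it should print start, then big: 23 and big: 19, then lines: 3.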

In your example you have three pattern/code lines:

NR==FNR {a[i++]=$0};

NR and FNR are two special variables set by awk. You can look up their meaning with man awk to see that

NR     ordinal number of the current record
FNR    ordinal number of the current record in the current file

so basically this condition is true while lines from the first file are read (which means that a[i++]=$0 is executed once for each line of the first file) and false for all additional files. $0 is the current line of input.
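To see how NR and FNR behave across two files, here is a small sketch. The two-line files are shortened versions of the ones in the question, and the names tmpdir, out2 and which are introduced only for this demonstration:

```shell
# Shortened, hypothetical versions of the two input files.
tmpdir=$(mktemp -d)
printf '11\n23\n' > "$tmpdir/file1.txt"
printf '13\n19\n' > "$tmpdir/file2.txt"

# Run inside the temporary directory so FILENAME prints without a path prefix.
out2=$(cd "$tmpdir" && awk '
    {
        # NR keeps counting across files, FNR restarts at 1 for each file.
        which = (NR == FNR) ? "first file" : "later file"
        print FILENAME, "NR=" NR, "FNR=" FNR, which
    }
' file1.txt file2.txt)

printf '%s\n' "$out2"
rm -r "$tmpdir"
```

The two counters agree only while the first file is being read; once the second file starts, FNR drops back to 1 while NR continues at 3. This relies on the first file not being empty.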

        {b[x++]=$0;};{k=x-i};

This line holds two code blocks, neither of which has a condition/pattern, so both get executed for every line read (from all files including the first one).
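One way to check what these unconditional blocks leave behind is to dump the variables in an END block. The sketch below reuses the data from the question; the debugging END block and the names tmpdir and out3 are additions for illustration, not part of the original command:

```shell
# Recreate the example files from the question in a temporary directory.
tmpdir=$(mktemp -d)
printf '11\n23\n19\n31\n67\n' > "$tmpdir/file1.txt"
printf '13\n19\n25\n67\n93\n' > "$tmpdir/file2.txt"

out3=$(awk '
    NR==FNR { a[i++] = $0 }    # first file only
            { b[x++] = $0 }    # every line of every file
            { k = x - i }      # recomputed on every line
    END {
        # Debug output only: show the counters and the contents of b.
        printf "i=%d x=%d k=%d\n", i, x, k
        for (j = 0; j < x; j++) printf "b[%d]=%s\n", j, b[j]
    }
' "$tmpdir/file1.txt" "$tmpdir/file2.txt")

printf '%s\n' "$out3"
rm -r "$tmpdir"
```

For this input the first line of output should be i=5 x=10 k=5: b holds all ten lines, and k ends up as the index in b where the lines of the second file begin.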

END     {for(j=0;j<i;) print a[j++],b[k++]}

This part runs after the last line of the last file has been read and processed.
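Putting the pieces together, the following is a commented rearrangement of the command rather than the original text. It assumes exactly two input files with the same number of lines, as in the question, and moves the computation of k into the END block, which should give the same result for that case. The names tmpdir and out4 are introduced only here:

```shell
# Recreate the example files from the question in a temporary directory.
tmpdir=$(mktemp -d)
printf '11\n23\n19\n31\n67\n' > "$tmpdir/file1.txt"
printf '13\n19\n25\n67\n93\n' > "$tmpdir/file2.txt"

out4=$(awk '
    NR == FNR { a[i++] = $0 }    # first file only: fills a[0] .. a[i-1]
              { b[x++] = $0 }    # all files: fills b[0] .. b[x-1]
    END {
        k = x - i                # index in b where the second file starts
        for (j = 0; j < i; j++)
            print a[j], b[k + j] # pair line j of file 1 with line j of file 2
    }
' "$tmpdir/file1.txt" "$tmpdir/file2.txt")

printf '%s\n' "$out4"
rm -r "$tmpdir"
```

The comma in the print statement joins the two values with the output field separator, a single space by default, which is why each output line reads like 11 13.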

With these basics you should be able to figure out the meaning of the different code blocks and variables yourself.