Bash – Replacing the values in one file with the values in another file in bash

bashcsvshell-scripttext processing

I am having a csv file naming List.csv in following format:

Location,IP Address,Host Name,Domain,Domain Name, User Name,Manufacturer,Model,System Type, Serial Number, Operating System,RAM (GB),Processor Type,Processor Frequency
H1,xx.xx.xx.xx,PC1,domain.com,DOMAIN,User1,LENOVO,4089AZ8,X86-based PC,L90RA96,Microsoft Windows 7 Professional ,2,Pentium(R) Dual-Core CPU E5800,3.20GHz
H3,xx.xx.xx.xx,PC2,domain.com,DOMAIN,User2,LENOVO,4089AZ8,X86-based PC,L906W3P,Microsoft Windows 7 Professional ,2,Pentium(R) Dual-Core CPU E5800,3.20GHz
H2,xx.xx.xx.xx,PC3,domain.com,DOMAIN,User3,LENOVO,4089A76,X86-based PC,L929410,Microsoft Windows 7 Professional ,2,Pentium(R) Dual-Core CPU E5400,2.70GHz
H2,xx.xx.xx.xx,PC4,domain.com,DOMAIN,User4,Hewlett-Packard,Z800,x64-based PC,SGH007QT16,Microsoft Windows 7 Professional ,12,Intel(R) Xeon(R) CPU W5590,3.33GHz

If you look at the MODEL column, it is bearing some values which do not interpret the name of the model. I have created another file, model-list.csv, which contains these values and their corresponding model names. It looks something like :

Manufacturer,Value,Model Name
Lenovo, 4089AZ8, ThinkCentre
Lenovo, 4089A76, ThinkCentre
HP, Z800, HP Z800 Workstation

I want the values in the List.csv file to be replaced by the corresponding model name present in model-list.csv. As there are 2900+ items in the List.csv and about 150 items in model-list.csv file, I was planning to achieve this using a bash script, which is as follows:

#!/bin/bash

file1="List.csv"
file2="model-list.csv"
outfile="List_out.csv"
stagingfile="List-staging.csv"

rm -f "$outfile" "$stagingfile"

while read line
do
        ModelNo=`echo "$line"|awk -F',' '{print $2}'`
        ModelName=`echo "$line"|awk -F',' '{print $3}'`


        cat "$file1"|grep ",$ModelNo," > "$stagingfile"
        if [ -s "$stagingfile" ]
        then

                while read line1
                do
                        NewLine=`echo "$line1"|sed "s/,${ModelNo},/,${ModelName},/g"`
                        echo "$NewLine" >> "$outfile"

                done < "$stagingfile"
                rm -f "$stagingfile"
        fi

done < "$file2"

When above script is executed, the "$outfile" contains almost 40-50 additional entries as compared to the List.csv.

Anything wrong with the script?

Best Answer

You can use awk for this:

awk -F',|, ' 'NR==FNR{a[$2]=$3} NR>FNR{$8=a[$8];print}' OFS=',' "$file2" "$file1"

This reads model-list.csv, storing all the models and their descriptions into an string-indexed array (eg a["Z800"] == "HP Z800 Workstation"). Then it reads through the list data, replacing each model with the description string from the array.

Explanation:

  • -F',|, ' - this sets the field separator using a regex pattern, in this case the field separator will be either a single comma, or a single comma and a single space.
  • NR==FNR{a[$2]=$3} - NR is an awk internal variable which keeps track of the total number of rows read since the program began. FNR is similar, but keeps track of the number of rows of the current file which have been read. So NR==FNR is an awk idiom which means "if this is the first file to be read", and the associated action is a[$2]=$3 which saves the value of field 3 in the array a, with the string index being set to the value of field 2.
  • NR>FNR{$8=a[$8];print}' - similar to the previous, but this time operates only on files other than the first to be read. For each line, we use the value of field 8 as the index to look up the value in the array, then re-assign field 8 to the array value. Finally, the whole line is printed .
  • OFS=',' "$file2" "$file1" - sets the output field separator to a comma (default is space), then reads in 2 files in the specified order.
Related Question