Linux – Aligning Values in CSV File Using Bash and Awk

awkbashcsvlinux

I have a csv file with the following information this csv file was created by combining results from aws rds describe-db-instances and
aws rds describe-reserved-db-instances and by parsing the results with jq then I pasted together the csv files.

DB Instance Identifier,Reservation ID,State
alpha-db-dev,alpha-db-dev-26aug2022,active
alpha-db-prod,alpha-db-prod-26aug2022,active
alpha-db-staging,alpha-db-staging-26aug2022,active
beta-db-dev-primary,beta-db-dev-8dec2021,active
beta-db-prod-primary,charlie-db-dev-8dec2021,active
beta-db-prod-replica,charlie-db-staging-loadtest-8dec2021,active
beta-db-staging-primary,,
charlie-db-dev,,
charlie-db-prod-dms,,
charlie-db-prod-dms-replica,,
charlie-db-staging,,
charlie-db-staging-loadtest,,

What I'm trying to do is match the first few words of the 2nd column and the first few words of the 1st column.

The result should be like this

DB Instance Identifier,Reservation ID,State
alpha-db-dev,alpha-db-dev-26aug2022,active
alpha-db-prod,alpha-db-prod-26aug2022,active
alpha-db-staging,alpha-db-staging-26aug2022,active
beta-db-dev-primary,beta-db-dev-8dec2021,active
beta-db-prod-primary,,
beta-db-prod-replica,,
beta-db-staging-primary,,
charlie-db-dev,charlie-db-dev-8dec2021,active
charlie-db-prod-dms,,
charlie-db-prod-dms-replica,,
charlie-db-staging,,
charlie-db-staging-loadtest,charlie-db-staging-loadtest-8dec2021,active

Basically what I would want to happen is that in the csv file, I would want to match the charlie-db-staging-loadtest with charlie-db-staging-loadtest-8dec2021 to be in the same row.

I have been clueless on what to do next and how to match the texts between the column DB Instance Identifier and Reservation ID. More over, how to move matching texts to be in the same row. So I have no attempts on solving this so far.

I am open to using any commands as long as it would work with bash. And I would really appreciate the breakdown of commands as I will be applying this to a somewhat similar but not quite the same csv file that has tons of data that would be way too big to put here.

I have been playing around with sed and awk but I can't seem to make it work just yet. But maybe I'm thinking of the wrong commands here. Any insights would be helpful.

Thank you so much!

Best Answer

@Thor is absolutely right but in case you're in a hurry to wrap this up and so you or others with a similar problem can learn how to do such things for yourself in future, here's how to do what you asked for using any awk and a 2-pass approach:

$ cat tst.awk
BEGIN { FS=OFS="," }
NR == FNR {
    id = $2
    sub(/-[^-]+$/,"",id)
    vals[id] = $2 OFS $3
    next
}
{ print (FNR>1 ? $1 OFS vals[$1] : $0) }

$ awk -f tst.awk file file
DB Instance Identifier,Reservation ID,State
alpha-db-dev,alpha-db-dev-26aug2022,active
alpha-db-prod,alpha-db-prod-26aug2022,active
alpha-db-staging,alpha-db-staging-26aug2022,active
beta-db-dev-primary,
beta-db-prod-primary,
beta-db-prod-replica,
beta-db-staging-primary,
charlie-db-dev,charlie-db-dev-8dec2021,active
charlie-db-prod-dms,
charlie-db-prod-dms-replica,
charlie-db-staging,
charlie-db-staging-loadtest,charlie-db-staging-loadtest-8dec2021,active

I think it's very obvious what it's doing but if after reading the man pages, googling, adding print statements to track variables values, etc. (so you can learn more about awk in general) you can't figure it out then just ask specific questions in comments and I'll be happy to answer them. To learn awk in general, get the book Effective AWK Programming, 5th edition, by Arnold Robbins.

Related Question