Awk search and replace string in a specific column of CSV file

awkcsvtext processing

I have a csv file with 17 columns and million rows. I want to search for a specific string in the 16th column and replace all the instances of that string with another string. Since my rest of the program uses bash script, I thought using awk instead of Python search & replace. My current OS is Rhel6.

The following is the sample output of my data:

SUBSCRIBER_ID|ACCOUNT_CATEGORY|ACCOUNT_ACTIVATION_DATE|PACKAGE_NAME|PACKAGE_TYPE|DURATION|ACTIVE_DATE|INACTIVE_DATE|STB_NO|PRIMARY_SECONDARY|MODEL_TYPE|VC_NO|MULTIROOM|STB_TYPE|IPKG|SERVICE_STATE|CURRENT_STATUS
1001098068|ResidentialRegular|01/20/2007|Annual package 199 May17 pack|Basic Package|Annual|08/28/2017||027445053518|Primary|Pace - 31|000223871682|Yes|AMP|Package 199 pack|Market1|Active
1001098068|ResidentialRegular|01/20/2007|Annual Pack|Premium Package|Annual|08/28/2017||027445053518|Primary|Pace - 31|000223871682|Yes|AMP|English Movies pack|Market1|Active
1001098068|ResidentialRegular|01/20/2007|Annual SingleUnit Jun17 Pack|Secondary Pack|Annual|08/28/2017||032089364015|Secondary|Kaon|000017213968|Yes|AMP|SingleUnit|Market2|Active

In this the 16th column is Market, wherein I want to change the Market1 to MarketPrime. The name of the file is marketinfo_2018-06-26.csv

I tried the following code:

awk -F '| +' '{gsub("Market1","MarketPrime",$16); print}' OFS="|" marketinfo_2018-06-26.csv > marketinfo_2018-06-26.csv

This runs without any output, but the string Market1 still remains.

Best Answer

awk -F '|' -v OFS='|' '$16 == "Market1" { $16 = "MarketPrime" }1' file.csv >new-file.csv

The only real issue in your code is that you set the input file separator to not just | but to spaces as well. This will make the spaces count as field separators in the data and it will be incredibly hard to figure out what the correct field number is (since some fields contain a variable number of spaces).

You also can not redirect into the same filename as you use to read from. Doing so would cause the shell to first truncate (empty) the output file, and your awk program would have no data to read.

Your code does a regular expression replacement. This is ok, but you need to be aware that if the 16th field happens to be something like Market12 or TheMarket1, it would trigger the substitution due to the missing anchor points. It would be safer to use ^Market1$ as the expression to replace, or to use a string comparison.

The awk command above uses only | as a field separator and then does a string comparison with the 16th field. If that field is Market1, it is set to MarketPrime.

The trailing 1 at the end of the awk code causes every record (modified or not) to be printed.

Related Question