I have a csv file with 17 columns and million rows. I want to search for a specific string in the 16th column and replace all the instances of that string with another string. Since my rest of the program uses bash script, I thought using awk instead of Python search & replace. My current OS is Rhel6.
The following is the sample output of my data:
SUBSCRIBER_ID|ACCOUNT_CATEGORY|ACCOUNT_ACTIVATION_DATE|PACKAGE_NAME|PACKAGE_TYPE|DURATION|ACTIVE_DATE|INACTIVE_DATE|STB_NO|PRIMARY_SECONDARY|MODEL_TYPE|VC_NO|MULTIROOM|STB_TYPE|IPKG|SERVICE_STATE|CURRENT_STATUS
1001098068|ResidentialRegular|01/20/2007|Annual package 199 May17 pack|Basic Package|Annual|08/28/2017||027445053518|Primary|Pace - 31|000223871682|Yes|AMP|Package 199 pack|Market1|Active
1001098068|ResidentialRegular|01/20/2007|Annual Pack|Premium Package|Annual|08/28/2017||027445053518|Primary|Pace - 31|000223871682|Yes|AMP|English Movies pack|Market1|Active
1001098068|ResidentialRegular|01/20/2007|Annual SingleUnit Jun17 Pack|Secondary Pack|Annual|08/28/2017||032089364015|Secondary|Kaon|000017213968|Yes|AMP|SingleUnit|Market2|Active
In this the 16th column is Market, wherein I want to change the Market1
to MarketPrime
. The name of the file is marketinfo_2018-06-26.csv
I tried the following code:
awk -F '| +' '{gsub("Market1","MarketPrime",$16); print}' OFS="|" marketinfo_2018-06-26.csv > marketinfo_2018-06-26.csv
This runs without any output, but the string Market1
still remains.
Best Answer
The only real issue in your code is that you set the input file separator to not just
|
but to spaces as well. This will make the spaces count as field separators in the data and it will be incredibly hard to figure out what the correct field number is (since some fields contain a variable number of spaces).You also can not redirect into the same filename as you use to read from. Doing so would cause the shell to first truncate (empty) the output file, and your
awk
program would have no data to read.Your code does a regular expression replacement. This is ok, but you need to be aware that if the 16th field happens to be something like
Market12
orTheMarket1
, it would trigger the substitution due to the missing anchor points. It would be safer to use^Market1$
as the expression to replace, or to use a string comparison.The
awk
command above uses only|
as a field separator and then does a string comparison with the 16th field. If that field isMarket1
, it is set toMarketPrime
.The trailing
1
at the end of theawk
code causes every record (modified or not) to be printed.