I have the following input, in which some of the values are comma delimited. I would like decouple each value and print it in the new line. I tried many ways in awk
, but no luck. If I have three comma delimited values, the first value should be retained in the same row, but the rest of the values should be wrapped up in the new lines.
Input File
===========
key1|0|11881|0|0|0|0|11769|0|0|0
key2|2027|345,712|0|1|0|2040|364,729|0|1|0
key3|0|670944|0|0|0|0|495554|0|0|0
key4|1847|1,21|0|0|0|1814|1,22|0|0|0
key5|1880|11,402|0|154|0|1886|11,397|0|151|0
key6|1,1,19137|65,4570|0|8|0|16684,0|51,4176|0|8|0
key7|1851|11,757|0|202|0|1856|13,751|0|193|0
Expected Output
================
key1|0|11881|0|0|0|0|11769|0|0|0
key2|2027|345|0|1|0|2040|364|0|1|0
key2|-|712|-|-|-|-|729|-|-|-
key3|0|670944|0|0|0|0|495554|0|0|0
key4|1847|1|0|0|0|1814|1|0|0|0
key4|-|21|-|-|-|-|22|-|-|-
key5|1880|11|0|154|0|1886|11|0|151|0
key5|-|402|-|-|-|-|397|-|-|-
key6|1|65|0|8|0|16684|51|0|8|0
key6|1|4570|-|-|-|0|4176|-|-|-
key6|19137|-|-|-|-|-|-|-|-|-
key7|1851|11|0|202|0|1856|13|0|193|0
key7|-|757|-|-|-|-|751|-|-|-
EDIT #1
Per a comment left by @Avinash, here are my attempts. I'm new to the awk
world, so may be beating around the bush. I followed the below steps. Please suggest me if any easier solution if you have in mind too. I am not getting the solution as expected.
step 1: $ awk -f test.awk a.txt > index.txt --> creates the index files
step 2: $ awk -f test2.awk a.txt > main.txt --> extracts the lines which are comma delimited and duplicate them number of times equal to the comma delimited values
step 3: $ awk -f updt_db1.awk index.txt main.txt --> updates the respective column
index.txt
=========
0|key2|3|345
1|key2|3|712
2|key2|8|364
3|key2|8|729
4|key4|3|1
5|key4|3|21
6|key4|8|1
7|key4|8|22
8|key5|3|11
9|key5|3|402
10|key5|8|11
11|key5|8|397
12|key6|2|1
13|key6|2|1
14|key6|2|19137
15|key6|3|65
16|key6|3|4570
17|key6|7|16684
18|key6|7|0
19|key6|8|51
20|key6|8|4176
21|key7|3|11
22|key7|3|757
23|key7|8|13
24|key7|8|751
main.txt
========
0|key2|2027|345,712|0|1|0|2040|364,729|0|1|0
1|key2|2027|345,712|0|1|0|2040|364,729|0|1|0
2|key2|2027|345,712|0|1|0|2040|364,729|0|1|0
3|key2|2027|345,712|0|1|0|2040|364,729|0|1|0
4|key4|1847|1,21|0|0|0|1814|1,22|0|0|0
5|key4|1847|1,21|0|0|0|1814|1,22|0|0|0
6|key4|1847|1,21|0|0|0|1814|1,22|0|0|0
7|key4|1847|1,21|0|0|0|1814|1,22|0|0|0
8|key5|1880|11,402|0|154|0|1886|11,397|0|151|0
9|key5|1880|11,402|0|154|0|1886|11,397|0|151|0
10|key5|1880|11,402|0|154|0|1886|11,397|0|151|0
11|key5|1880|11,402|0|154|0|1886|11,397|0|151|0
12|key6|1,1,19137|65,4570|0|8|0|16684,0|51,4176|0|8|0
13|key6|1,1,19137|65,4570|0|8|0|16684,0|51,4176|0|8|0
14|key6|1,1,19137|65,4570|0|8|0|16684,0|51,4176|0|8|0
15|key6|1,1,19137|65,4570|0|8|0|16684,0|51,4176|0|8|0
16|key6|1,1,19137|65,4570|0|8|0|16684,0|51,4176|0|8|0
17|key6|1,1,19137|65,4570|0|8|0|16684,0|51,4176|0|8|0
18|key6|1,1,19137|65,4570|0|8|0|16684,0|51,4176|0|8|0
19|key6|1,1,19137|65,4570|0|8|0|16684,0|51,4176|0|8|0
20|key6|1,1,19137|65,4570|0|8|0|16684,0|51,4176|0|8|0
21|key7|1851|11,757|0|202|0|1856|13,751|0|193|0
22|key7|1851|11,757|0|202|0|1856|13,751|0|193|0
23|key7|1851|11,757|0|202|0|1856|13,751|0|193|0
24|key7|1851|11,757|0|202|0|1856|13,751|0|193|0
Output
0|key2|2027|345|0|1|0|2040|364,729|0|1|0|
1|key2|2027|712|0|1|0|2040|364,729|0|1|0|
2|key2|2027|345,712|0|1|0|2040|364|0|1|0|
3|key2|2027|345,712|0|1|0|2040|729|0|1|0|
4|key4|1847|1|0|0|0|1814|1,22|0|0|0|
5|key4|1847|21|0|0|0|1814|1,22|0|0|0|
6|key4|1847|1,21|0|0|0|1814|1|0|0|0|
7|key4|1847|1,21|0|0|0|1814|22|0|0|0|
8|key5|1880|11|0|154|0|1886|11,397|0|151|0|
9|key5|1880|402|0|154|0|1886|11,397|0|151|0|
10|key5|1880|11,402|0|154|0|1886|11|0|151|0|
11|key5|1880|11,402|0|154|0|1886|397|0|151|0|
12|key6|1|65,4570|0|8|0|16684,0|51,4176|0|8|0|
13|key6|1|65,4570|0|8|0|16684,0|51,4176|0|8|0|
14|key6|19137|65,4570|0|8|0|16684,0|51,4176|0|8|0|
15|key6|1,1,19137|65|0|8|0|16684,0|51,4176|0|8|0|
16|key6|1,1,19137|4570|0|8|0|16684,0|51,4176|0|8|0|
17|key6|1,1,19137|65,4570|0|8|0|16684|51,4176|0|8|0|
18|key6|1,1,19137|65,4570|0|8|0|0|51,4176|0|8|0|
19|key6|1,1,19137|65,4570|0|8|0|16684,0|51|0|8|0|
20|key6|1,1,19137|65,4570|0|8|0|16684,0|4176|0|8|0|
21|key7|1851|11|0|202|0|1856|13,751|0|193|0|
22|key7|1851|757|0|202|0|1856|13,751|0|193|0|
23|key7|1851|11,757|0|202|0|1856|13|0|193|0|
24|key7|1851|11,757|0|202|0|1856|751|0|193|0|
$ cat test.awk
#!/bin/awk
BEGIN{
FS="|";
counter=0
}
NR == FNR {
for(i=1; i<= NF; i++)
{
n=split($i,arr,",")
if ( n > 1)
for(j=1; j<=n; j++)
printf "%s|%s|%s|%s\n",counter++,$1,i,arr[j]
}
}
$ cat test2.awk
#!/bin/awk
BEGIN { FS="|"
k=0;
}
NR == FNR {
for(i=1; i<= NF; i++)
{
n=split($i,arr,",")
if (n > 1)
for(j=1; j<=n; j++)
printf("%s|%s\n",k++,$0)
}
}
$ cat updt_db1.awk
#!/bin/awk
BEGIN {
FS = "|"
}
( NR == FNR ) {
lookup[$1] = $0
}
( NR > FNR ) {
key = toupper($1)
if (key in lookup){
split(lookup[key], replacements, "|")
for (i = 1; i <= NF; i++)
col[i] = $i;
for (i=3; replacements[i+1] != "" ; i=i+1){
j=replacements[i]
col[j+1] = replacements[i+1]
}
for (i = 1; i <= NF; i++)
printf "%s|", col[i]
print ""
}
else
print $0;
}
Best Answer
You could use
sed
like this:It wound up being relatively simple after all. Applying that little script to your data gets:
Basically
sed
just tackles each field from both ends. It first saves a copy of its current iteration toh
old space. Thensed
removes everything from every field following the first comma on. After thatsed
switches back to its saved copy so it can remove the field it just saved in the other buffer.If commas remain it appends the second copy to the first following an inserted
\n
ewline character so it can recurse at least once more whenP;D
P
rint thenD
elete only up to the first occurring\n
ewline character in pattern space before starting over with what remains.