Storing a large file entirely in memory can cause trouble. This approach is slightly better: once sort has done the heavy lifting of putting the lines in order, awk only has to hold the current group of matching lines.
# Input must be sorted first, so only the lines of the current group are held in memory.
# When a non-matching line arrives, print the stored lines, each prefixed by the group count.
# awk variables start out unset, so no explicit initialization is needed.
{   # S2, S3, S4 hold the field values saved from the previous line
    if ($2 == S2 && $3 == S3 && $4 == S4) {
        # fields 2, 3, 4 match the previous line: store the line, increment count
        line[count++] = $0
    } else {
        # fields 2, 3, 4 differ: print the stored lines, prefixed by the count
        # (an indexed loop keeps input order; "for (i in line)" order is unspecified)
        for (i = 0; i < count; i++) {
            print count, line[i]
        }
        # reset counter and array
        count = 0
        delete line
        # store this line as the first of a new group
        line[count++] = $0
    }
    # save field values to compare with the next line
    S2 = $2; S3 = $3; S4 = $4
}
END {   # at EOF the last group is still stored, so print it
    for (i = 0; i < count; i++) {
        print count, line[i]
    }
}
It is customary to save awk scripts in a file. You could use this one along the lines of:
sort -k2,4 file | awk -f script
3 ID-fred 4.0 6.0 42.0
3 ID-jacob 4.0 6.0 42.0
3 ID-tessa 4.0 6.0 42.0
2 ID-elsa 5.0 8.0 45.0
2 ID-trudy 5.0 8.0 45.0
1 ID-gerard 6.0 8.0 20.0
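To try the approach without the original data, here is a minimal self-contained sketch. The input is an assumption: it is reconstructed from the sample output above (the count column removed, rows shuffled), since the original file is not shown. The helper name emit is also just illustrative. It buffers one group at a time and prints it in input order with an indexed loop.

```shell
# Input reconstructed from the sample output; the original file is not shown.
input='ID-gerard 6.0 8.0 20.0
ID-tessa 4.0 6.0 42.0
ID-elsa 5.0 8.0 45.0
ID-fred 4.0 6.0 42.0
ID-trudy 5.0 8.0 45.0
ID-jacob 4.0 6.0 42.0'

result=$(printf '%s\n' "$input" | sort -k2,4 | awk '
# emit (hypothetical helper): print the buffered group in input order, then reset
function emit(   i) {
    for (i = 0; i < count; i++) print count, line[i]
    count = 0
}
{
    # a change in fields 2, 3 or 4 closes the current group
    if (NR > 1 && ($2 != S2 || $3 != S3 || $4 != S4)) emit()
    line[count++] = $0
    S2 = $2; S3 = $3; S4 = $4
}
END { emit() }')

printf '%s\n' "$result"
```

Because count is reset before the next group is stored, stale array entries beyond count are never printed, so the array does not need to be deleted between groups.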
Try this
$ awk -F, 'NR<2{split(gensub(/Citty/,"City","g",$0),a,FS)}NR==2{for(b=2;b<=NF;b+=2){c=c a[b]" "$b","}print gensub(/,$/,"",1,c)}NR>2{print gensub(/(^,|" *",)/,"","g",$0)}' inp
Product Name,City Location,Price Per Unit
banana,CA,5.7
apple,FL,2.3
$
The same code is more readable if split across a few lines:
$ awk -F, '
> NR<2{split(gensub(/Citty/,"City","g",$0),a,FS)}
> NR==2{for(b=2;b<=NF;b+=2){c=c a[b]" "$b","}print gensub(/,$/,"",1,c)}
> NR>2{print gensub(/(^,|" *",)/,"","g",$0)}' inp
Product Name,City Location,Price Per Unit
banana,CA,5.7
apple,FL,2.3
$
On the 1st line, fix the Citty -> City typo and split the line into the elements of array a.
On the 2nd line, starting with the 2nd column, join the corresponding column from the 1st line with this column, separated by a space. Repeat for every 2nd column, then strip the trailing comma and print.
After the 2nd line, replace any leading comma, and any "<spaces>", sequence (a quoted run of spaces followed by a comma), with an empty string, then print the result.
Tested ok on GNU Awk 4.0.2
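Note that gensub is a GNU awk extension, so the command above needs gawk. As a hedged sketch, the same three steps can be written with sub and gsub only, which other awk implementations also provide. The input file below is hypothetical: the original inp is not shown, so its layout is an assumption chosen to produce the output above.

```shell
# Hypothetical input: the original "inp" file is not shown, so this layout
# is an assumption chosen to match the output above.
inp=$(mktemp)
cat > "$inp" <<'EOF'
,Product,,Citty,,Price
,Name,,Location,,Per Unit
,banana," ",CA," ",5.7
,apple," ",FL," ",2.3
EOF

result=$(awk -F, '
NR == 1 {                       # header row 1: fix the typo, keep the fields in a
    gsub(/Citty/, "City")
    split($0, a, FS)
    next
}
NR == 2 {                       # header row 2: pair every 2nd column with row 1
    out = ""
    for (b = 2; b <= NF; b += 2) {
        sep = (b > 2 ? "," : "")
        out = out sep a[b] " " $b
    }
    print out
    next
}
{                               # data rows: drop the leading comma and "<spaces>", runs
    sub(/^,/, "")
    gsub(/" *",/, "")
    print
}' "$inp")

printf '%s\n' "$result"
rm -f "$inp"
```

Building the header with a separator in front of every element but the first avoids having to strip a trailing comma afterwards.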
Best Answer
myfile.csv is not changing because the -u option of uniq prints only lines that are not repeated. In this file every line is a duplicate, so nothing is printed.
More importantly, the output would not be saved in myfile.csv anyway, because uniq just writes to stdout (by default, your console). You would need to do something like this:
$ sort -u myfile.csv -o myfile.csv
The options mean:
-u - keep only unique lines
-o - output to this file instead of stdout
See man sort for more information.
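The difference between the two commands can be checked on a throwaway file. This is a small sketch with made-up contents (the real myfile.csv is not shown), in which every line appears twice:

```shell
# Made-up sample data; every line is duplicated, as in the question.
f=$(mktemp)
printf '%s\n' 'a,1' 'a,1' 'b,2' 'b,2' > "$f"

# uniq -u prints only lines that are NOT repeated, so here it prints nothing,
# and it never touches the file itself.
uniq_out=$(uniq -u "$f")

# sort -u keeps one copy of each line; -o writes the result back to the file.
sort -u -o "$f" "$f"
result=$(cat "$f")

printf 'uniq -u printed: [%s]\n' "$uniq_out"
printf '%s\n' "$result"
rm -f "$f"
```

Writing back with -o is safe with sort because it reads all input before opening the output file. A plain shell redirection to the same file would truncate it first.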