I have a large file which looks like:
India 07 1800 BAHRAICH 42273 +28.4 +26.7 NA 997.1 1 NA NA
India 07 1800 BAHRAICH 42273 +28.4 +26.7 NA 997.1 NA NA NA
India 07 1800 BALASORE 42895 +29.0 +26.8 NA 999.7 NA NA NA
India 07 1800 BANGALORE 43295 +23.0 +17.4 908.1 geopotential_of_850mb_=_492 NA NA NA
India 07 1800 BANGALORE 43295 +23.0 +17.4 908.1 geopotential_of_850mb_=_492 Trace NA NA
India 07 1800 BAREILLY 42189 +28.4 +26.2 NA 997.4 NA NA NA
India 07 1800 BAREILLY 42189 +28.4 +26.2 NA 997.4 Trace NA NA
India 07 1800 BARMER 42435 +35.6 +22.6 NA 997.6 NA NA NA
India 07 1800 BHOPAL_BAIRAGHAR 42667 +23.6 +23.3 942.7 1000.5 13 NA NA
India 07 1800 BHOPAL_BAIRAGHAR 42667 +23.6 +23.3 942.7 1000.5 NA NA NA
India 07 1800 BHUBANESHWAR 42971 +28.0 +25.7 NA 1000.7 NA NA NA
India 07 1800 BHUJ-RUDRAMATA 42634 +29.6 +25.7 NA 999.5 NA NA NA
India 07 1800 BIKANER 42165 +33.8 +25.1 NA 994.0 NA NA NA
India 07 1800 BIKANER 42165 +33.8 +25.1 NA 994.0 NA NA NA
India 07 1800 BOMBAY_SANTACRUZ 43003 +29.0 +26.8 NA 1004.4 10 NA NA
India 07 1800 BOMBAY_SANTACRUZ 43003 +29.0 +26.8 NA 1004.4 NA NA NA
In this file, groups of 2-3 lines are the same except for one entry, which is "NA" in some of them; the "NA" can occur at any position. I want to keep the line with the fewest "NA" entries.
I can't think of a solution for this.
I want output as:
India 07 1800 BAHRAICH 42273 +28.4 +26.7 NA 997.1 1 NA NA
India 07 1800 BALASORE 42895 +29.0 +26.8 NA 999.7 NA NA NA
India 07 1800 BANGALORE 43295 +23.0 +17.4 908.1 geopotential_of_850mb_=_492 Trace NA NA
India 07 1800 BAREILLY 42189 +28.4 +26.2 NA 997.4 Trace NA NA
India 07 1800 BARMER 42435 +35.6 +22.6 NA 997.6 NA NA NA
India 07 1800 BHOPAL_BAIRAGHAR 42667 +23.6 +23.3 942.7 1000.5 13 NA NA
India 07 1800 BHUBANESHWAR 42971 +28.0 +25.7 NA 1000.7 NA NA NA
India 07 1800 BHUJ-RUDRAMATA 42634 +29.6 +25.7 NA 999.5 NA NA NA
India 07 1800 BIKANER 42165 +33.8 +25.1 NA 994.0 NA NA NA
India 07 1800 BOMBAY_SANTACRUZ 43003 +29.0 +26.8 NA 1004.4 10 NA NA
I would appreciate even just the logic for doing this.
Thanks
Best Answer
Assuming the key is the 4th field and records with identical keys are consecutive (and I understood your question correctly), you could do something like:
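The command itself is not preserved in this copy of the answer. As a minimal sketch of the approach described (not necessarily the original command), assuming whitespace-separated fields and the 4th field as the key, an awk script can remember the best line seen so far within each run of identical keys and print it when the key changes. The function name keep_fewest_na is made up for illustration.

```shell
# Hypothetical helper: for each run of consecutive lines sharing the same
# 4th field, print the first line that has the fewest fields equal to "NA".
keep_fewest_na() {
  awk '
    # Count the fields of the current record that are exactly "NA".
    function count_na(   i, n) {
      n = 0
      for (i = 1; i <= NF; i++) if ($i == "NA") n++
      return n
    }
    {
      n = count_na()
      # Key changed: the previous group is complete, so print its best line.
      if (NR > 1 && $4 != key) print best
      # New group, or strictly fewer NA than the best so far: remember it.
      if (NR == 1 || $4 != key || n < min) { best = $0; min = n }
      key = $4
    }
    # Flush the last group (skipped if the input was empty).
    END { if (NR) print best }
  ' "$@"
}
```

It can be called as keep_fewest_na input.txt, where input.txt is a placeholder file name, or fed from a pipe. Because the comparison is strict, a tie keeps the earlier line.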
That prints, among consecutive lines with the same 4th field, the first one with the fewest NA fields.
If they're not consecutive, you could use some sorting:
With busybox sort, you'd want to add the -s option to the second invocation, as it seems to do some level of sorting of the input again despite the -m.
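For the non-consecutive case, here is a rough sketch of one way to combine sorting with a filter. It is an assumption about how this could be done, not the original command, and it uses a single sort followed by an awk filter rather than a second sort invocation. It assumes fields are separated by single spaces, as in the sample, and the name keep_fewest_na_sorted is made up for illustration. Each line is prefixed with its NA count, the lines are sorted by key and then by count, and only the first line per key is kept.

```shell
# Hypothetical helper for input whose duplicate keys are NOT adjacent.
keep_fewest_na_sorted() {
  # 1. Prefix each line with its number of "NA" fields; the key (originally
  #    the 4th field) thereby becomes the 5th field.
  awk '{
    n = 0
    for (i = 1; i <= NF; i++) if ($i == "NA") n++
    print n, $0
  }' "$@" |
    # 2. Group by key, and within a key put the lowest NA count first.
    sort -k5,5 -k1,1n |
    # 3. Keep only the first line seen for each key.
    awk '!seen[$5]++' |
    # 4. Strip the count prefix again.
    cut -d" " -f2-
}
```

The output comes out ordered by the key rather than in input order, and when two lines of a group have the same NA count, the one that sorts first is kept, which is not necessarily the one that appeared first in the file.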