I would like to insert new lines into text files wherever values are missing.
For example, in the following text file (A.txt), line 5 is missing. In addition, since the file should have 12 lines, lines 11 and 12 are also missing.
1 2.30
2 3.01
3 3.22
4 3.34
6 3.01
7 2.90
8 2.99
9 3.00
10 3.02
My expected output is the following. For each missing case, a line should be added containing the number and NA. As you can see, this has been done for lines 5, 11 and 12:
1 2.30
2 3.01
3 3.22
4 3.34
5 NA
6 3.01
7 2.90
8 2.99
9 3.00
10 3.02
11 NA
12 NA
I am able to do this by using the following script:
f1=/my-directory
: > "$f1"/newfile.txt   # start with an empty output file
for i in {1..12}; do
  # find the line whose first field is $i (this rescans the whole file every time)
  l=$(grep -wE "^$i" "$f1"/A.txt)
  if [ -n "$l" ]; then echo "$l" >> "$f1"/newfile.txt; else echo "$i NA" >> "$f1"/newfile.txt; fi
done
This works fine. The problem, however, is that I need to do this for about 600 files, each containing about 160000 lines or more. The loop solution would therefore take too much time, because it searches through all lines for every number. My question is: is there a simpler solution that could do this?
Best Answer
You can do this with an awk script, which will produce the required output for /tmp/test1 (replace that with each file you wish to process).
In a more readable form:
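A minimal sketch of such a script, reconstructed from the description in this answer rather than quoted from it. It is wrapped in a POSIX shell script so that it can be saved and run directly. The variable name `shift` follows the answer; the function name `fill_missing`, its optional second argument and the awk variable `last` (the expected final line number, 12 in the question) are assumptions made for illustration.

```shell
#!/bin/sh
# fill_missing FILE [LAST]
# Print FILE with "N NA" lines added for every missing number N up to LAST.
# LAST defaults to 12, the file length used in the question.
fill_missing() {
    awk -v last="${2:-12}" '
        {
            # shift counts the lines that have been filled in so far, so
            # NR + shift is the number this input line is expected to carry.
            while (NR + shift < $1) {
                printf "%d NA\n", NR + shift
                shift++
            }
            print
        }
        END {
            # Pad the tail of the file up to the expected last number.
            for (i = NR + shift + 1; i <= last; i++)
                printf "%d NA\n", i
        }
    ' "$1"
}

# When run as a script with arguments, process the given file.
if [ "$#" -gt 0 ]; then
    fill_missing "$@"
fi
```

Each file is read only once, so the running time grows with the number of lines, rather than with the number of lines multiplied by the number of expected values as in the grep loop.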
Save this as a file, say fill-missing, make it executable, then you can simply run it on each file you wish to process.
The script processes each line, keeping track of the expected delta with the current line number in shift. So for every line, if the current line number adjusted by that delta doesn't match the first number in the line, it prints the appropriate line number followed by NA and increments the delta; once the line numbers match, it prints the current line. At the end of the process, it prints any missing lines required to reach 12.
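For the roughly 600 files mentioned in the question, the same logic can be applied in a loop. The following is only a sketch under assumptions: the function name `fill_dir`, the `*.txt` input pattern and the `.filled` output suffix are hypothetical, and the expected last line number is passed as a parameter (12 in the small example, presumably something like 160000 for the real data).

```shell
#!/bin/sh
# fill_dir DIR [LAST]
# For every DIR/*.txt, write a filled-in copy next to it as DIR/NAME.filled.
# The input pattern and the output suffix are illustrative choices.
fill_dir() {
    dir=$1
    last=${2:-12}
    for f in "$dir"/*.txt; do
        # Skip the literal pattern when no file matches.
        [ -e "$f" ] || continue
        awk -v last="$last" '
            {
                # Emit "N NA" for each number skipped before this line.
                while (NR + shift < $1) {
                    printf "%d NA\n", NR + shift
                    shift++
                }
                print
            }
            END {
                # Pad the end of the file up to the expected last number.
                for (i = NR + shift + 1; i <= last; i++)
                    printf "%d NA\n", i
            }
        ' "$f" > "${f%.txt}.filled"
    done
}
```

Writing to a separate `.filled` file leaves the originals untouched, and the new files do not match `*.txt`, so running the function again does not process its own output.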