How to Insert New Lines with Missing Values (NA) in Shell

grep, shell, text-processing

I would like to insert new lines into text files wherever values are missing.
For example, in the following text file (A.txt), line 5 is missing. In addition, the file should have 12 lines, so lines 11 and 12 are missing as well.

My expected output is the following. For each missing case, a line should be added containing the line number and NA. As you can see, this was done for lines 5, 11 and 12:

I am able to do this by using the following script:

f1=/my-directory
echo "new file" > "$f1/newfile.txt"

for i in {1..12}; do
    # -w makes grep match $i only as a whole word at the start of a line
    if l=$(grep -wE "^$i" "$f1/A.txt"); then
        echo "$l" >> "$f1/newfile.txt"
    else
        echo "$i NA" >> "$f1/newfile.txt"
    fi
done

This works fine. The problem, however, is that I need to do this for about 600 files containing upwards of 160000 lines. Because the loop runs grep over the whole file once for every expected line number, it would take far too long. My question is: is there a simpler solution that could do this?

Best Answer

You can do this with an awk script:

awk '{ while (NR + shift < $1) { print (NR + shift) " NA"; shift++ }; print } END { shift++; while (NR + shift < 13) { print (NR + shift) " NA"; shift++ } }' /tmp/test1

will produce the required output for /tmp/test1 (replace that with each file you wish to process).

In a more readable form:

#!/usr/bin/awk -f
{
    while (NR + shift < $1) {
        print (NR + shift) " NA"
        shift++
    }
    print
}
END {
    shift++
    while (NR + shift < 13) {
        print (NR + shift) " NA"
        shift++
    }
}

Save this as a file, say fill-missing, make it executable (chmod +x fill-missing), and then run

./fill-missing /tmp/test1

The script processes each line, keeping track in shift of the offset between awk's record number (NR) and the number expected in the first field. For every line, as long as NR plus shift is smaller than the first number on the line, it prints the missing number followed by NA and increments shift; once the two match, it prints the current line. At the end of the input, it prints any lines still needed to reach 12.
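Since the question mentions about 600 files, here is a minimal sketch of how the same logic could be wrapped in a shell function and run in a loop. The function name fill_missing, the total variable (which replaces the hard-coded 13), the .filled suffix, and the sample data are all assumptions made for illustration; they are not part of the original answer.

```shell
# fill_missing FILE TOTAL
# Same logic as the awk script above; "total" is passed in with -v
# instead of being hard-coded.
fill_missing() {
    awk -v total="$2" '
        {
            # Emit "N NA" for every number skipped before this line.
            while (NR + shift < $1) { print (NR + shift) " NA"; shift++ }
            print
        }
        END {
            # Pad the tail up to and including "total".
            shift++
            while (NR + shift <= total) { print (NR + shift) " NA"; shift++ }
        }' "$1"
}

# Hypothetical demo data in a scratch directory; the values are made up.
dir=$(mktemp -d)
printf '%s\n' '1 a' '2 b' '3 c' '4 d' '6 f' '7 g' '8 h' '9 i' '10 j' > "$dir/A.txt"
printf '%s\n' '2 x' '3 y' > "$dir/B.txt"

# One awk process per file; each file is read exactly once.
for f in "$dir"/*.txt; do
    fill_missing "$f" 12 > "${f%.txt}.filled"
done

a_out=$(cat "$dir/A.filled")
b_out=$(cat "$dir/B.filled")
printf '%s\n' "$a_out"
rm -rf "$dir"
```

Each file costs a single pass, so the run time should grow with the total number of lines rather than with lines times expected line numbers, as it does in the grep loop from the question.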

Related Solutions

Search a pattern and print preceding lines starting with another pattern

Here's a solution in Perl:

perl -nlE '
    if    (/a/)   { @buffer = ($_) }
    elsif (/xyz/) { push @buffer,$_; say for @buffer }
    else          { push @buffer,$_}
' your_file

How this works

It reads through the file line-by-line and does one of three things:

If the current line matches the pattern a, it assigns the current line to the @buffer array.
If the current line matches the pattern xyz, it pushes the current line onto the buffer and prints the contents of the buffer.
If neither of the two cases above applies, it simply appends the current line to the @buffer array.

Thus, whenever a new line matches the pattern a, the contents of @buffer are discarded and replaced by the current line only. This guarantees that the printed block starts at the closest a line preceding xyz.
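For readers who prefer to stay in awk, the same buffering idea can be sketched as follows. This is a translation of the Perl logic rather than part of the original answer; the array name buf, the counter n, and the sample input are assumptions, and a and xyz are the same placeholder patterns.

```shell
# Hypothetical input: two "a" lines, then an "xyz" line.
result=$(printf '%s\n' one a1 two a2 three xyz four |
awk '
    /a/   { n = 0; buf[++n] = $0; next }    # restart the buffer at each "a" line
    /xyz/ { buf[++n] = $0                    # append, then print the whole buffer
            for (i = 1; i <= n; i++) print buf[i]
            next }
          { buf[++n] = $0 }                  # any other line: just append
')
printf '%s\n' "$result"
```

With this input, only the lines from a2 through xyz are printed, because the buffer was restarted when a2 was read.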

You should of course replace the regexes I used with the actual regexes relevant to your case.

Shell – Remove lines from tab-delimited file with missing values

If your fields can never contain whitespace, an empty field means either a tab as a first character (^\t), a tab as the last character (\t$) or two consecutive tabs (\t\t). You could therefore filter out lines containing any of those:

grep -Ev $'^\t|\t\t|\t$' file
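The $'...' quoting used above is not available in every shell. A minimal sketch of a variant that should also work in a plain POSIX sh, assuming printf is used to produce the tab; the variable names tab and kept and the sample rows are made up for illustration:

```shell
tab=$(printf '\t')   # a literal tab, without relying on $'...' quoting

# Hypothetical sample: one complete row, then rows with an empty
# first, middle, and last field.
kept=$(printf 'a\tb\tc\n\tb\tc\na\t\tc\na\tb\t\n' |
    grep -Ev "^${tab}|${tab}${tab}|${tab}\$")
printf '%s\n' "$kept"
```

Only the first row survives the filter, since each of the other three contains one of the empty-field patterns.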

If you can have whitespace, things get more complex. If your fields can begin with spaces, use this instead (it considers a field with only spaces to be empty):

grep -Pv '\t\s*(\t|$)|\t$|^\t' file

The change filters out lines matching a tab followed by 0 or more whitespace characters (\s) and then either another tab or the end of the line.

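That will still fail if the first field contains nothing but spaces, because no tab precedes it. To avoid that too, use perl with the -F and -a options to split input into the @F array, telling it to print unless the line ends in a tab or one of the fields is empty or all whitespace (/^\s*$/). The explicit end-of-line check is needed because the split performed by -a drops trailing empty fields:

perl -F'\t' -lane 'print unless /\t$/ || grep { /^\s*$/ } @F' file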
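As an alternative that does not depend on perl or on grep -P, roughly the same filter can be sketched in awk. This is a swapped-in technique, not the original answer's method: it assumes a field counts as missing when it is empty or contains only spaces, and the variable name complete and the sample rows are made up.

```shell
# Hypothetical sample rows: complete, blank first field, blank middle
# field, empty last field, and a complete row with an interior space.
complete=$(printf 'a\tb\tc\n \tb\tc\na\t  \tc\na\tb\t\na b\tc\td\n' |
    awk -F'\t' '{
        for (i = 1; i <= NF; i++)
            if ($i ~ /^ *$/) next   # empty or space-only field: skip the line
        print
    }')
printf '%s\n' "$complete"
```

With a tab as the field separator, awk keeps trailing empty fields in NF, so a line ending in a tab is caught by the same loop.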
