Replace data at specific positions in txt file using data from another file

awk, paste, sed, text processing

I have a text file in the below format:

$data This is the experimental data    
good data
This is good file
datafile
1 4324 3673 6.2e+11 7687 67576
2 3565 8768 8760 5780 8778          "This is line '2'"
3 7656 8793 -3e+11 7099 79909
4 8768 8965 8769 9879 0970
5 5878 9879 7.970e-1 9070 0709799
.
.
.
100000 3655 6868 97879 96879 69899
$.endfile

I want to replace the data of the 3rd and 4th column from row '2' to '100000' with the data from two other text files which have one column of 99999 rows each.

How can I do this using awk, sed or any other unix command?
Note that the column delimiter is space.

The other two text files have 99999 lines each, and they are both in the following format:

Best Answer

Since you haven’t asked for a 100% awk solution, I’ll offer a hybrid that (a) may, arguably, be easier to understand, and (b) doesn’t stress awk’s memory limits:

awk '
    $1 == 2 { secondpart = 1 }
       { if (!secondpart) {
                print > "top"
         } else {
                print $1, $2 > "left"
                print $5, $6, $7, $8, $9 > "right"
         }
       }' a
(cat top; paste -d" " left b c right) > new_a
rm top left right

Or we can eliminate one of the temporary files and shorten the script by one command:

(awk '
    $1 == 2 { secondpart = 1 }
       { if (!secondpart) {
                print
         } else {
                print $1, $2 > "left"
                print $5, $6, $7, $8, $9 > "right"
         }
       }' a; paste -d" " left b c right) > new_a
rm left right

This will put some extra spaces at the ends of the lines of the output, and it will lose data from file a if any line has more than nine fields (columns). If those are issues, they can be fixed fairly easily.
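
One possible way to address both caveats is to let awk read the replacement values itself with getline and assign them to fields 3 and 4, so no temporary files or paste are needed. This is only a sketch, not the answer's original method: the variable names f3 and f4 and the miniature demo files are made up for illustration, and it assumes it is acceptable for runs of whitespace in the modified rows to collapse to single spaces (awk rebuilds the line when a field is assigned).

```shell
#!/bin/sh
# Miniature stand-ins (hypothetical) for the files a, b and c from the question.
tmp=$(mktemp -d) && cd "$tmp" || exit 1
printf '%s\n' '$data This is the experimental data' 'datafile' \
    '1 4324 3673 6.2e+11 7687 67576' \
    '2 3565 8768 8760 5780 8778' \
    '3 7656 8793 -3e+11 7099 79909 x1 x2 x3 x4' \
    '$.endfile' > a
printf '%s\n' B2 B3 > b
printf '%s\n' C2 C3 > c

awk -v f3=b -v f4=c '
    $1 == 2 { secondpart = 1 }      # same trigger as the scripts above
    # Replace columns 3 and 4 only while both replacement files still have lines.
    secondpart && (getline v3 < f3) > 0 && (getline v4 < f4) > 0 {
        $3 = v3                     # assigning a field rebuilds $0 with single spaces
        $4 = v4
    }
    { print }                       # header, row 1 and $.endfile pass through unchanged
' a > new_a
cat new_a
```

Because only two fields are overwritten, rows with more than nine fields keep all of their data, and the final $.endfile line is printed as-is once b and c run out of lines.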

Related Solutions

Lum – Select certain column of each file, paste to a new file

with paste under bash you can do:

paste <(cut -f 4 1.txt) <(cut -f 4 2.txt) .... <(cut -f 4 20.txt)
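
If process substitution is not available (it is a bash feature, not plain sh), the same idea can be sketched with temporary files and a loop. The file names col.N and combined.txt and the three small demo inputs are assumptions made for this illustration only.

```shell
#!/bin/sh
# Demo inputs (hypothetical): three tab-separated files, 1.txt to 3.txt, two rows each.
tmp=$(mktemp -d) && cd "$tmp" || exit 1
for i in 1 2 3; do
    printf 'r1a\tr1b\tr1c\tf%sr1\nr2a\tr2b\tr2c\tf%sr2\n' "$i" "$i" > "$i.txt"
done

set --                              # collect the temporary column files in "$@"
for i in 1 2 3; do
    cut -f 4 "$i.txt" > "col.$i"    # column 4 of each input file
    set -- "$@" "col.$i"
done
paste "$@" > combined.txt           # one output column per input file
rm -f "$@"
cat combined.txt
```

For the 20 files in the original command, the loop list would run from 1 to 20 instead of 1 to 3.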

With a python script and any number of files (python scriptname.py column_nr file1 file2 ... filen):

#! /usr/bin/env python

# invoke with column nr to extract as first parameter followed by
# filenames. The files should all have the same number of rows

import sys

col = int(sys.argv[1])
res = {}

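for file_name in sys.argv[2:]:
    # Close each file as soon as it has been read
    with open(file_name) as f:
        for line_nr, line in enumerate(f):
            res.setdefault(line_nr, []).append(line.strip().split('\t')[col-1])

# One output row per input row, one tab-separated column per file
for line_nr in sorted(res):
    print('\t'.join(res[line_nr]))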

How to append text from one line, to the end of another

A simple Perl script will do the trick nicely (Perl is already installed on dang near everything):

#!/usr/bin/env perl
use strict;
use warnings;

my @rows; # Preserve order of appearance
my %rows;

my $heading;

while (<>) {    # read line by line so that $. is the current line number
    chomp;
    if (s/^\s+/   /) {
        $heading .= $_;
    } elsif (/^(\w+) (.*)$/) {
        push @rows, $1 if not exists $rows{$1};
        $rows{$1} .= $2;
    } else {
        die "Invalid line format at line $.";
    }
}
my $fmt = "%-5s %s\n"; # Adjust width to suit taste
printf $fmt, '', $heading;
printf $fmt, $_, $rows{$_} for @rows;

Invoke the program on your data like so:

$ ./my_column.pl < your_data.txt

(Assuming you saved the above script as my_column.pl and made it executable with chmod 755 my_column.pl of course!)

The above should get the job done, but if you need precise column alignment or more advanced formatting in general, you can split the columns and force particular column widths with printf, or one of the many tabular formatting modules available for Perl.
