Whew. This is a bit of a tricky one given your scenario.
First off, Router 2 shouldn't be connected to Router 1 over the same network segment that their clients sit on.
In your example, all 4 devices share the same LAN segment, which as an aside is in the private 192.168.x.y address range (edit: private meaning not routable across the "Internet").
A better way would be to think of it as follows:
Computer 1
IP: 192.168.1.100
Subnet: 255.255.255.0
Default Gateway: 192.168.1.1
MAC: 03:00:00:00:00:11
Router 1
Interface facing Computer 1:
IP: 192.168.1.1
Subnet: 255.255.255.0
MAC: 03:00:00:00:00:22
Interface facing Router 2:
IP: 192.168.12.1
Subnet: 255.255.255.0
MAC: 03:00:00:00:00:33
Router 2
Interface facing Router 1:
IP: 192.168.12.2
Subnet: 255.255.255.0
MAC: 03:00:00:00:00:44
Interface facing Computer 2:
IP: 192.168.2.1
Subnet: 255.255.255.0
MAC: 03:00:00:00:00:55
Computer 2
IP: 192.168.2.200
Subnet: 255.255.255.0
Default Gateway: 192.168.2.1
MAC: 03:00:00:00:00:66
Ignore the fact that the MACs lead with 03; they are just placeholder addresses for this example. Strictly speaking, 0x03 in the first octet sets both the locally administered bit (0x02) and the group/multicast bit (0x01), so a real locally administered unicast address would start with 02 instead.
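If you want to check what the first octet of a MAC address encodes, here is a minimal Python sketch. The function name mac_flags is made up for this example; the two bit positions are the standard I/G (group) and U/L (locally administered) bits of the first octet.

```python
def mac_flags(mac: str) -> dict:
    """Report the two flag bits carried in the first octet of a MAC address."""
    first_octet = int(mac.split(":")[0], 16)
    return {
        # Bit 0 (0x01): set for group (multicast/broadcast) addresses.
        "group": bool(first_octet & 0x01),
        # Bit 1 (0x02): set for locally administered addresses.
        "locally_administered": bool(first_octet & 0x02),
    }


if __name__ == "__main__":
    for mac in ("03:00:00:00:00:11", "02:00:00:00:00:11"):
        print(mac, mac_flags(mac))
```

An address starting with 03 comes back with both flags set, while one starting with 02 is locally administered and unicast.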
So what is happening here is:
There's a cable from Computer 1 plugged into Router 1. These two share the 192.168.1.x network.
There's a cable from Router 1 to Router 2. These share the 192.168.12.x network.
There's a cable from Router 2 to Computer 2. These share the 192.168.2.x network.
In your original write-up, all 4 devices would have had to be connected to the same switch in order for it to even work... and in such a case Computer 1 would've talked directly to Computer 2. Note: for you network wizards out there I know you can do static routing to force the original network configuration to work, but that isn't what this User is asking about....
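To make the "same segment means they talk directly" point concrete, here is a rough Python sketch of the decision a host makes before sending: if the destination falls inside its own subnet it sends straight to it, otherwise it hands the packet to its default gateway. The function name next_hop and the address 192.168.1.50 are hypothetical, picked only for illustration.

```python
import ipaddress


def next_hop(src_ip: str, netmask: str, gateway: str, dst_ip: str) -> str:
    """Return the IP whose MAC the sender will put in the frame's destination field."""
    # strict=False lets us derive the network from a host address plus mask.
    network = ipaddress.ip_network(f"{src_ip}/{netmask}", strict=False)
    if ipaddress.ip_address(dst_ip) in network:
        return dst_ip   # same subnet: deliver directly
    return gateway      # different subnet: send to the default gateway


if __name__ == "__main__":
    # Computer 1 sending to Computer 2 in the corrected layout: goes via Router 1.
    print(next_hop("192.168.1.100", "255.255.255.0", "192.168.1.1", "192.168.2.200"))
    # A hypothetical neighbour on the same /24: delivered directly, no router involved.
    print(next_hop("192.168.1.100", "255.255.255.0", "192.168.1.1", "192.168.1.50"))
```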
Now on to your specific question.
You are half correct. The MAC address that Computer 2 sees is that of Router 2; in my example that would be MAC 03:00:00:00:00:55. However, the IP address it sees is that of Computer 1. That is how Computer 2 can respond back to Computer 1. IP addresses are, theoretically, "universally unique."
The way networks work, at the level of detail you're asking about, is that layer 2 (datalink / MAC layer, in an all-Ethernet/IPv4 environment) addresses change PER HOP. A hop is defined as 'transiting any layer-3 processing device'. Routers and computers almost always process layer 3. Switches can process layer 3 but they tend to leave it alone.
So as a message goes from Computer 1 to Computer 2 the flow looks like:
AT HOP 1 - Between Computer 1 and Router 1
SourceIP: 192.168.1.100 (Computer 1)
SourceMAC: 03:00:00:00:00:11 (Computer 1)
DestIP: 192.168.2.200 (Computer 2)
DestMAC: 03:00:00:00:00:22 (Router 1 - Interface facing Computer 1)
AT HOP 2 - Between the routers
SourceIP: 192.168.1.100 (Computer 1)
SourceMAC: 03:00:00:00:00:33 (Router 1 - Interface facing Router 2)
DestIP: 192.168.2.200 (Computer 2)
DestMAC: 03:00:00:00:00:44 (Router 2 - Interface facing Router 1)
AT HOP 3 - Between Router 2 and Computer 2
SourceIP: 192.168.1.100 (Computer 1)
SourceMAC: 03:00:00:00:00:55 (Router 2 - Interface facing Computer 2)
DestIP: 192.168.2.200 (Computer 2)
DestMAC: 03:00:00:00:00:66 (Computer 2)
So you see, the IP layer (layer 3) addresses stay the same across the entire communication, but the datalink layer (layer 2) addresses change each time the packet passes through another device that processes layer-3 addresses.
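The three hops above can be replayed in a few lines of Python. This is only a toy model under the assumptions of my example (no NAT, no ARP lookups, and the MAC pairs are hard-coded per link); Frame, LINKS and trace are names invented for the sketch.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Frame:
    src_mac: str
    dst_mac: str
    src_ip: str
    dst_ip: str


# (sending interface MAC, receiving interface MAC) for each link in the example.
LINKS = [
    ("03:00:00:00:00:11", "03:00:00:00:00:22"),  # Computer 1 -> Router 1
    ("03:00:00:00:00:33", "03:00:00:00:00:44"),  # Router 1  -> Router 2
    ("03:00:00:00:00:55", "03:00:00:00:00:66"),  # Router 2  -> Computer 2
]


def trace(src_ip: str, dst_ip: str) -> list:
    """Build the frame seen on each link: new MACs per hop, same IPs throughout."""
    return [Frame(src_mac, dst_mac, src_ip, dst_ip) for src_mac, dst_mac in LINKS]


if __name__ == "__main__":
    for hop, frame in enumerate(trace("192.168.1.100", "192.168.2.200"), start=1):
        print(f"HOP {hop}: {frame.src_mac} -> {frame.dst_mac} | "
              f"{frame.src_ip} -> {frame.dst_ip}")
```

Each router strips the incoming Ethernet header and builds a fresh one for the outgoing link, which is why only the MAC columns differ between the printed lines.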
I hope this helps. If it's still confusing feel free to message back and I'll try to explain the specific subset that you're finding challenging.
How could the paraphrased above be true? I have configured a fair bit of household grade network hardware and it does not appear to be this way in reality. In your router you have both IP and Mac filtering and routing options.
That's not really surprising nor is a problem in any way.
First, although IP routers are described as "layer 3" devices, that doesn't mean they cannot interact with lower layers – they do usually see the whole packet, with both its Ethernet and IP headers, and a firewall rule could perfectly well match on either or both.
Second, I'm going to repeat that your household-grade network hardware tends to have multiple functionalities – the main CPU runs the OS and handles routing; the hardware switch handles layer-2 packet forwarding between the 'LAN' ports; and the Wi-Fi access point handles, well, Wi-Fi. It's entirely possible for the same OS to be able to configure both the routing core and the attached switching & Wi-Fi hardware.
(In fact I would bet that the MAC filtering option is specifically for the Wi-Fi access point – these can allow or deny WLAN associations, which happen at layer 2, based on the station's MAC. Though I'm not sure whether that's usually enforced by the Wi-Fi AP chip itself, or by hostapd running on the main OS...)
Then there's what people call "layer-3 switches", which can act as switches or routers depending on needs – each individual port is reconfigurable, so you could have some ports switched (thus belonging to the same subnet), the rest routed, and the OS reconfigures the switch chip as necessary.
Also when using VM software your physical network card goes into promiscuous mode where it receives packets sent to multiple IP addresses and passes the correct ones to the VM and to the real machine.
Yes, that's not a problem either. There is nothing that would prevent a PC from becoming an IP router or a bridge, or a combination thereof. Most VM software can work in both modes – either bridge the VMs to LAN at layer 2, or create a separate subnet for them so that the PC acts as a router between the two.
(In this regard PCs can get really flexible – just yesterday I decommissioned a "brouter" that was set up as a bridge except when it came to IPv4 packets, which were routed instead...)
Surely IP version four would function even if the Mac address was some how withheld.
Really, it's not IP that needs L2 addressing – it's the layer 2 itself that does.
Yes, it would certainly be possible to design a network which only cared about IP addresses and used those for switching as well. In fact, I think that's exactly how ATM networks worked – an ATM "switch" would essentially act as a self-configuring router, but also automatically learned which individual ATM addresses were behind each port (as a switch would).
But in practice IP was designed to not have hard dependencies on any particular sort of link layer, and as a result you can carry it over anything – Ethernet, FDDI, ARCnet, FireWire, carrier pigeons… Likewise, because most link layers had their own addressing and avoided any dependency on IPv4, one didn't need to do anything to have the same switches support IPv6 (or for that matter IPX, or DECnet, or AppleTalk, or NetBEUI, …) all over the same Ethernet.
So the reason you have both kinds of addresses is that they were deliberately kept separate, and this allowed for great flexibility.
(Both IPv4 and IPv6 can also function over point-to-point links without needing any L2 addressing, since such links only have two directions anyway; two simple examples would be VPN and dial-up connections.)
Actually, while this has nothing to do with subnet masks, you could take a look at IPX and DECnet – both common LAN protocols in the early days before IP and Internet took over. IPX addresses had two parts, network and host, e.g. 618A1.0060086DD3EE, and the host part was always the same as the corresponding Ethernet MAC address. Meanwhile, DECnet did the opposite – it required changing the Ethernet MAC to a special address in which the DECnet node address was encoded. So on the one hand you didn't need ARP, but on the other hand you were pretty much required to use Ethernet or something compatible with it.
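For the curious, both mappings are simple enough to sketch in Python. The function names are made up. The IPX half just splits the example address above; the DECnet half assumes the Phase IV convention as I understand it (fixed prefix AA-00-04-00 followed by the 16-bit area/node address in little-endian byte order), so treat that part as an assumption rather than a reference.

```python
def ipx_host_to_mac(ipx_address: str) -> str:
    """Extract the host part of 'network.host' and format it as a MAC address."""
    _network, host = ipx_address.split(".")
    host = host.zfill(12).lower()
    return ":".join(host[i:i + 2] for i in range(0, 12, 2))


def decnet_mac(area: int, node: int) -> str:
    """Derive the Ethernet MAC a DECnet node 'area.node' would set (assumed Phase IV rule)."""
    address = (area << 10) | node          # 6-bit area, 10-bit node number
    low, high = address & 0xFF, address >> 8
    return f"aa:00:04:00:{low:02x}:{high:02x}"  # low byte goes first


if __name__ == "__main__":
    print(ipx_host_to_mac("618A1.0060086DD3EE"))
    print(decnet_mac(1, 1))
```

Either way the L3-to-L2 mapping is pure arithmetic, which is why neither protocol needed an ARP-style lookup.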
Best Answer
My suggestion: download Wireshark and capture some data. Make sure you have the packet list, packet details and packet bytes view options enabled, and start clicking on packets. In the packet details section, you can click on the L2, L3, and L4 sections of the packet and it will highlight the bytes that correspond to whatever you have selected.
Then start by doing some searching online to learn about Ethernet headers/encapsulation, IP headers, TCP headers and the like. Wikipedia is often a good jumping-off point for topics like this but there are hundreds of resources online. I did a quick search looking for an image that represents how the Ethernet frame is ultimately built and found this one that is pretty good: http://www.tcpipguide.com/free/t_IPDatagramEncapsulation.htm
I haven't read the content, but between resources like that and starting to play with the parts of the frame in a tool like Wireshark, you will find there is a definite structure to the binary data and the layers become fairly easy to tell apart.
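To give a feel for that structure outside of Wireshark, here is a small Python sketch that builds a bare Ethernet + IPv4 header by hand and then picks it apart again. It is a simplification: the frame is synthetic (checksum left at zero, no VLAN tag, no IP options, no payload), and the MAC and IP values are made-up sample addresses. The function names are invented for the example.

```python
import socket
import struct


def format_mac(raw: bytes) -> str:
    return ":".join(f"{b:02x}" for b in raw)


def build_sample_frame() -> bytes:
    """Synthetic frame: 14-byte Ethernet header + 20-byte IPv4 header, nothing else."""
    ethernet = bytes.fromhex("030000000066" "030000000055") + struct.pack("!H", 0x0800)
    ipv4 = struct.pack(
        "!BBHHHBBH4s4s",
        0x45, 0, 20, 0, 0,      # version/IHL, TOS, total length, ID, flags/fragment
        64, 6, 0,               # TTL, protocol (6 = TCP), checksum (not computed here)
        socket.inet_aton("192.168.1.100"),
        socket.inet_aton("192.168.2.200"),
    )
    return ethernet + ipv4


def parse_frame(frame: bytes) -> dict:
    """Split a frame into its layer-2 and (if IPv4) layer-3 addressing fields."""
    dst_mac, src_mac, ethertype = struct.unpack("!6s6sH", frame[:14])
    result = {
        "dst_mac": format_mac(dst_mac),
        "src_mac": format_mac(src_mac),
        "ethertype": ethertype,
    }
    if ethertype == 0x0800:  # IPv4 payload follows the Ethernet header
        fields = struct.unpack("!BBHHHBBH4s4s", frame[14:34])
        result["ttl"] = fields[5]
        result["protocol"] = fields[6]
        result["src_ip"] = socket.inet_ntoa(fields[8])
        result["dst_ip"] = socket.inet_ntoa(fields[9])
    return result


if __name__ == "__main__":
    print(parse_frame(build_sample_frame()))
```

The fixed offsets (first 14 bytes for Ethernet, the next 20 for a minimal IPv4 header) are the same regions Wireshark highlights when you click on each layer.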