Why Failing Addresses in Memtest86+ Are Higher Than Total Memory

hardware-failure memory memtest86+

Context

Feel free to skip this section. I ramble a bit.

This is a desktop computer with an MSI P67A-C45 motherboard. The two memory modules are 4GB DDR3-1333, installed in DIMM 1 and DIMM 3 and running in dual channel.

A few months ago, I started experiencing random graphics glitches that occasionally caused my video drivers (nVidia) to crash and restart. This was especially bad when hardware acceleration was in use, above all in 3D FPS games, even with the 10+ year old GoldSrc engine. It would also happen when playing Flash videos, and occasionally when doing nothing at all. Most of the time, it was fine. It only happened after the system had been up for a while (uptime meaning time since the last reboot, not since resuming from hibernation). Once the glitches started, I had to turn the computer off and leave it off for a few minutes. I suspected the video card, perhaps overheating, but temperature monitoring programs reported the GPU at a nice cool 40 degrees Celsius.

More recently (in the last week or two), when I've left the computer on overnight, I've woken up to find it had BSoD'd with a memory-related error. I'm currently rerunning Memtest86+, so I can't dig up the exact error messages or codes right now, if anyone really wants them.

At the same time, some programs started crashing randomly ("xxxx has stopped working." with a close button). This would happen to Firefox and the aforementioned FPS game. I don't really run anything else, and the crashes are random. That is, they could crash immediately or run fine for the whole time I use them (several hours). The troubleshooter's nightmare.

Memtest86+

On the first run, which lasted somewhere over 12 hours, I got the following results:

Photo of results

There are a few irregularities. First, I have 2x 4GB DDR3-1333 modules in DIMM1 and DIMM3, running in dual channel, yet Memtest86+ reports DDR3-8247, whatever that is. Second, all the failing addresses are above my total RAM capacity, which isn't much help when trying to figure out which module may be failing. Evidently, at least one is.

I reseated the modules and tried again:

Photo of results

As you can see, the frequencies and latencies are completely different. The latency values are much closer to what I vaguely recall seeing in CPU-Z (or was that HWiNFO32?). This test has only just started, so it's entirely possible those values changed sometime during the test.

Also, since errors only started cropping up in the later passes, is it possible that this is an overheating issue? Bear in mind that I've been using this computer for about a year now, and the problems only began in the last three or so months.

My main question remains: why are the failing addresses higher than my installed capacity?

Best Answer

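Some hardware devices need physical address space below 4GB, for example the memory-mapped I/O regions of PCI devices such as the video card, and devices limited to 32-bit addressing. So a large chunk of address space just under 4GB is reserved for those mappings. The RAM that would normally occupy that range is remapped above 4GB, past the point where installed memory would otherwise end. With 8GB installed, the top of RAM therefore sits above the 8GB mark, and Memtest86+ reports failing addresses in that remapped region even though they are real RAM.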
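To make the remapping concrete, here is a minimal sketch of a simplified physical address map. It assumes RAM fills addresses from 0 up to a hypothetical hole start, the range from there to 4 GiB is reserved for devices, and the displaced RAM is remapped to begin at exactly 4 GiB. The function name `physical_layout` and the 3.25 GiB hole start are illustrative assumptions; the real hole size depends on the firmware and installed devices.

```python
GIB = 1 << 30
FOUR_GIB = 4 * GIB


def physical_layout(installed_bytes: int, hole_start: int) -> list[tuple[int, int, str]]:
    """Return (start, end, kind) ranges of a simplified physical address map.

    Hypothetical model: RAM occupies [0, hole_start), the range
    [hole_start, 4 GiB) is reserved for device mappings, and the RAM
    displaced by that hole is remapped to start at 4 GiB.
    """
    if installed_bytes <= hole_start:
        # All RAM fits below the reserved hole; nothing needs remapping.
        return [(0, installed_bytes, "ram")]
    remapped = installed_bytes - hole_start  # RAM hidden behind the hole
    return [
        (0, hole_start, "ram"),
        (hole_start, FOUR_GIB, "reserved"),
        (FOUR_GIB, FOUR_GIB + remapped, "ram (remapped)"),
    ]


if __name__ == "__main__":
    # Hypothetical example: 2 x 4 GiB installed, device hole starting at 3.25 GiB.
    for start, end, kind in physical_layout(8 * GIB, 0xD0000000):
        print(f"{start:#011x} - {end:#011x}  {kind}")
```

Under these assumptions, 8 GiB of installed RAM ends at address 0x230000000 (8.75 GiB), so a failing address somewhat above 8 GiB still falls inside real, remapped RAM.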

I suspect heat may well be an issue.
