Linux – How to tell whether RAM ECC is working

ecc linux-kernel ram

I'm planning on getting some ECC RAM to replace the non-ECC RAM I currently have installed on my Asus M5A97 Pro motherboard (AMD 970 chipset, FX-6100 CPU).

After I install the RAM, how do I tell whether the ECC feature of the RAM is working properly?

I thought about dmidecode --type memory, which currently prints, among other things, the following for each RAM stick:

Error Information Handle: Not Provided
Total Width: 64 bits
Data Width: 64 bits

(With ECC RAM, at 1 ECC bit per byte, I would expect the data width to remain 64 bits but the total width to read 72 bits.)

Can that be used to determine whether ECC is operative? Or is dmidecode too low-level for that? What else could I use (other than waiting to see whether an ECC error shows up in the logs, which would show that it's working, but whose absence wouldn't show that it isn't)?

Update: I later thought of edac-utils. While installing it, I got the message "Not enabling Memory Error Detection and Correction since EDAC_DRIVER is not set." The package gave me the edac-util and edac-ctl executables. Can one of those be used for this purpose?

Best Answer

There appears to be no surefire way to tell; however, various approaches can give you some sort of answer. You pretty much have to try the different ones until you find one that tells you ECC is working.

In my case memtest86+ 4.20 couldn't be coaxed into realizing it was dealing with ECC RAM; even if I configured it for ECC On, it still reported ECC: Disabled on the IMC line. I haven't yet tried with a newer version. However (possibly after installing edac-utils, unfortunately I did both essentially at the same time), Linux reports in the boot logs (interspersed with some other entries):

[    4.867198] EDAC MC: Ver: 2.1.0
...
[    4.874374] MCE: In-kernel MCE decoding enabled.
[    4.875414] AMD64 EDAC driver v3.4.0
[    4.875438] EDAC amd64: DRAM ECC enabled.
...
[    4.875542] EDAC amd64: CS0: Unbuffered DDR3 RAM
[    4.875545] EDAC amd64: CS1: Unbuffered DDR3 RAM
[    4.875546] EDAC amd64: CS2: Unbuffered DDR3 RAM
[    4.875548] EDAC amd64: CS3: Unbuffered DDR3 RAM

which is a pretty good indication. Manually running /etc/init.d/edac restart does not produce similar log entries. Looking at an older log from a few reboots ago, I see:

[   13.886688] EDAC MC: Ver: 2.1.0
[   13.890389] MCE: In-kernel MCE decoding enabled.
[   13.891082] AMD64 EDAC driver v3.4.0
[   13.891107] EDAC amd64: DRAM ECC disabled.
[   13.891116] EDAC amd64: ECC disabled in the BIOS or no ECC capability, module will not load.
[   13.891117]  Either enable ECC checking or force module loading by setting 'ecc_enable_override'.
[   13.891118]  (Note that use of the override may cause unknown side effects.)
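
The log check above can be scripted. This is only a minimal sketch: it assumes the amd64 EDAC driver words its messages exactly as in the two excerpts above ("DRAM ECC enabled." / "DRAM ECC disabled."), and the function name ecc_state_from_log is made up for illustration. Other EDAC drivers or kernel versions may phrase things differently, in which case it prints "unknown".

```shell
#!/bin/sh
# Classify the ECC state from kernel log text read on stdin.
# Assumption: message wording matches the amd64 EDAC excerpts quoted above.
ecc_state_from_log() {
    awk '
        /EDAC amd64: DRAM ECC enabled/  { state = "enabled" }
        /EDAC amd64: DRAM ECC disabled/ { state = "disabled" }
        END { print (state == "" ? "unknown" : state) }
    '
}

# Usage (reading the kernel ring buffer may require root):
#   dmesg | ecc_state_from_log
```

If the log contains both messages, the last one seen wins, so feed it the log of a single boot.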

dmidecode --type memory also gives two pretty strong indications: the physical memory array's "error correction type" property (which, however, for some reason showed the same value with non-ECC RAM, so it may reflect the motherboard's support rather than the memory's capabilities),

Handle 0x0026, DMI type 16, 23 bytes
Physical Memory Array
    Location: System Board Or Motherboard
    Use: System Memory
    Error Correction Type: Multi-bit ECC

and each memory device's total width compared with its data width (the additional bits being those used for ECC):

Handle 0x0028, DMI type 17, 34 bytes
Memory Device
    Array Handle: 0x0026
    Error Information Handle: Not Provided
    Total Width: 72 bits
    Data Width: 64 bits
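
The width comparison can be automated as well. The following is a rough sketch, assuming dmidecode prints "Total Width" before "Data Width" for each memory device, as in the output above; the function name ecc_widths_from_dmidecode is hypothetical. Entries whose widths are not plain numbers (for example, empty slots) are skipped.

```shell
#!/bin/sh
# For each memory device in `dmidecode --type memory` output read on stdin,
# print total width, data width, and the difference (extra bits suggest ECC).
# Assumption: "Total Width" precedes "Data Width" within each device entry.
ecc_widths_from_dmidecode() {
    awk '
        $1 == "Total" && $2 == "Width:" { total = $3 }
        $1 == "Data"  && $2 == "Width:" {
            data = $3
            # Only report entries where both widths are plain numbers.
            if (total ~ /^[0-9]+$/ && data ~ /^[0-9]+$/)
                printf "total=%d data=%d extra=%d\n", total, data, total - data
            total = ""
        }
    '
}

# Usage (dmidecode normally requires root):
#   dmidecode --type memory | ecc_widths_from_dmidecode
```

A line with extra=8 corresponds to the 72/64 case above, and extra=0 to the 64/64 output from the question. As with the raw dmidecode values, this only reflects what the firmware reports, not whether ECC checking is actually enabled.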