I have a notebook here that I suspect has a faulty memory module. I therefore downloaded Memtest86+ and let it run.
Note that the screenshot is not from my own run; it is the one provided by Memtest86+.
How do I interpret the numbers on the screen? I've let it run for about four hours and now I'm in pass 7.
In particular, what do
- the test number
- the count of Errors
- the count of ECC errors
indicate? What are sane values for memory errors? At which point should I consider replacing memory?
Best Answer
TL;DR
The most important number first: The error count for healthy memory should be 0. Any number above 0 may indicate damaged or faulty memory cells.
Screen explanation
Data/Test explanation
MemTest runs a number of tests: it writes specific patterns to every region of the memory and reads them back. If the retrieved data differs from the data that was originally stored, MemTest registers an error and increases the error count by one. Errors are usually a sign of bad RAM modules.
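The write-then-read-back idea can be sketched in a few lines of Python. This is only a toy simulation, not how MemTest itself is implemented: `FaultyMemory` and `pattern_test` are hypothetical names, and the fault model (a single bit stuck at 0 in one cell) is an assumption chosen for illustration.

```python
class FaultyMemory:
    """Toy byte-addressable memory with one bit stuck at 0 (hypothetical fault model)."""

    def __init__(self, size, bad_addr, stuck_bit):
        self.cells = bytearray(size)
        self.bad_addr = bad_addr
        self.stuck_mask = 1 << stuck_bit

    def __len__(self):
        return len(self.cells)

    def __setitem__(self, addr, value):
        # The faulty cell silently drops the stuck bit on every write.
        if addr == self.bad_addr:
            value &= ~self.stuck_mask & 0xFF
        self.cells[addr] = value

    def __getitem__(self, addr):
        return self.cells[addr]


def pattern_test(memory, pattern):
    """Write `pattern` to every cell, read it back, and collect mismatches.

    Returns a list of (address, good, bad) tuples, one per error found.
    """
    for addr in range(len(memory)):
        memory[addr] = pattern

    errors = []
    for addr in range(len(memory)):
        value = memory[addr]
        if value != pattern:
            errors.append((addr, pattern, value))
    return errors


mem = FaultyMemory(size=16, bad_addr=5, stuck_bit=3)
print(pattern_test(mem, 0xFF))  # all-ones pattern exposes the stuck bit
print(pattern_test(mem, 0x00))  # all-zeros pattern cannot see a bit stuck at 0
```

In this sketch the all-zeros pattern reports no errors even though the cell is defective, which is one reason a single pattern is not enough.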
Since memory isn't just a notepad that holds information but has advanced functions like caching, several different tests are done. This is what the Test # indicates. MemTest runs a number of different tests to see if errors occur. Some (simplified) test examples:
More detailed description of all tests from: https://www.memtest86.com/technical.htm#detailed
Because a bad memory cell may work one time and fail the next, I recommend letting MemTest run a few passes. A full pass is one completed run of the whole test series (tests 1-11 above). The more passes you get without errors, the more confidence you can have in the result. I usually run around 5 passes to be sure.
The error count for healthy memory should be 0. Any number above 0 may indicate damaged or faulty memory cells.
ECC error count should only be taken into account when ECC is set to off. ECC stands for Error-correcting code memory and it's a mechanism to detect and correct wrong bits in a memory state. It can be compared slightly to the parity checks done on RAID or optical media. This technology is quite expensive and will likely only be encountered in server set-ups. The ECC count counts how many errors have been corrected by the memory's ECC mechanism. ECC shouldn't have to be invoked for healthy RAM, so an ECC error count above 0 may also indicate bad memory.
Error explanation
Example of a MemTest run that has encountered errors. It shows which address has failed.
The first column (Tst) shows which test has failed; the number corresponds to the test number from the list already mentioned above. The second column (Pass) shows the number of the pass during which the error occurred. In the example, test 7 failed during pass 0, that is, before a single full pass had completed.
The third column (Failing Address) shows exactly which part of the memory has errors. Each such part has an address, much like an IP address, which is unique for that piece of data storage. The column shows the failing address and its position in memory expressed in megabytes (0.8MB in the example).
The fourth (Good) and fifth (Bad) columns show the data that was written and what was retrieved respectively. Both columns should be equal in non-faulty memory (obviously).
The sixth column (Err-Bits) shows the position of the exact bits that are failing.
The seventh column (Count) shows the number of consecutive errors with the same address and failing bits.
Finally, the eighth and last column (Chan) shows the channel (if multiple channels are used on the system) which the memory module is in.
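To make the relationship between the Good, Bad and Err-Bits columns concrete: as far as I understand the Memtest86 documentation, Err-Bits is the exclusive or (XOR) of the expected and the retrieved value, so every 1 bit marks a position that came back wrong. The small Python sketch below reproduces that calculation; the function name `err_bits` and the sample values are made up for illustration and are not taken from the screenshot.

```python
def err_bits(good, bad):
    """Return (mask, positions) of the bits that differ between good and bad.

    mask is good XOR bad; positions lists the differing bit indexes,
    counted from 0 at the least significant bit.
    """
    diff = good ^ bad
    positions = [bit for bit in range(diff.bit_length()) if (diff >> bit) & 1]
    return diff, positions


# Hypothetical values: the pattern written and what was read back.
good = 0xFFFFFFFF
bad = 0xFFFFFFF7

mask, positions = err_bits(good, bad)
print(f"Err-Bits: {mask:08x}, failing bit positions: {positions}")
```

If the same bit positions keep failing at many different addresses, that points towards one particular chip or data line rather than random noise, though this is a rule of thumb and not a diagnosis.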
If it finds errors
If MemTest discovers any errors, the best method of determining which module is faulty is covered in this Super User question and its accepted answer: