Memtest86+ – How to Interpret Memtest Run Statistics


I have a notebook here that I suspect has a faulty memory module. I therefore downloaded Memtest86+ and let it run.

Note that the screenshot is not my own; it is the one provided by Memtest86+.

[Screenshot: Memtest86+ test screen]

How do I interpret the numbers on the screen? I've let it run for about four hours and now I'm in pass 7.

In particular, what do

  • the test number
  • the count of Errors
  • the count of ECC errors

indicate? What are sane values for memory errors? At which point should I consider replacing memory?

Best Answer

TL;DR

The most important number first: The error count for healthy memory should be 0. Any number above 0 may indicate damaged/faulty sectors.


Screen explanation

     Memtest86+ v1.00      | Progress of the entire pass (test series)
CPU MODEL and clock speed  | Progress of individual, current test
Level 1 cache size & speed | Test type that is currently running
Level 2 cache size & speed | Part of the RAM (sector) that is being tested
RAM size and testing speed | Pattern that is being written to the sector
Information about the chipset that your mainboard uses
Information about your RAM set-up, clock speed, channel settings, etc.

WallTime   Cached  RsvdMem   MemMap   Cache  ECC  Test  Pass  Errors  ECC Errs
---------  ------  -------  --------  -----  ---  ----  ----  ------  --------
Elapsed    Amount  Amount    Mapping  on     on   Test  # of  # of    # of ECC
time       of RAM  of        used     or     or   type  pass  errors  errors
           cached  reserved           off    off        done  found   found
                   RAM, not
                   tested

Data/Test explanation

MemTest runs a number of tests: it writes specific patterns to every sector of the memory and then reads them back. If the retrieved data differs from the data that was originally stored, MemTest registers an error and increases the error count by one. Errors are usually signs of bad RAM modules.
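As a rough illustration of this write-then-read-back idea, here is a minimal Python sketch. It is not Memtest's code: `FlakyMemory`, its stuck-bit fault model and `run_pattern_test` are made up for this example, and a real tester works on physical RAM rather than a Python list.

```python
class FlakyMemory:
    """Simulated RAM in which one cell has a bit stuck at 0 (hypothetical fault)."""

    def __init__(self, size, bad_address, stuck_bit):
        self.cells = [0] * size
        self.bad_address = bad_address
        self.keep_mask = 0xFF & ~(1 << stuck_bit)

    def write(self, address, value):
        if address == self.bad_address:
            value &= self.keep_mask  # the stuck bit can never be set
        self.cells[address] = value

    def read(self, address):
        return self.cells[address]


def run_pattern_test(memory, size, pattern):
    """Write `pattern` to every cell, read it back, and count mismatches."""
    for address in range(size):
        memory.write(address, pattern)
    errors = 0
    for address in range(size):
        if memory.read(address) != pattern:
            errors += 1
    return errors


memory = FlakyMemory(size=16, bad_address=5, stuck_bit=3)
print(run_pattern_test(memory, 16, 0xFF))  # all ones exposes the stuck bit
print(run_pattern_test(memory, 16, 0x00))  # all zeros cannot expose it
```

The second call finds nothing even though the cell is faulty, which is one reason a single pattern is not enough.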

Memory is not just a notepad that holds information; features such as caching affect how it behaves, so MemTest runs several different tests to provoke errors. The Test # field shows which of these tests is currently running.

Some (simplified) test examples:

  • Test sectors in this order: A, B, C, D, E, F. (Serial)
  • Test sectors in this order: A, C, E, B, D, F. (Moving)
  • Fill all sectors with pattern: aaaaaaaa
  • Fill all sectors with a random pattern.

More detailed description of all tests from: https://www.memtest86.com/technical.htm#detailed

Test 0 [Address test, walking ones, no cache]

Tests all address bits in all memory banks by using a walking ones address pattern.
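To give a feel for what a "walking ones" address pattern is, here is a simplified Python sketch. It assumes a simulated memory (`StuckLineMemory`, a made-up class) in which one address line is stuck low; the detection logic is only an illustration and is not taken from Memtest's source.

```python
class StuckLineMemory:
    """Simulated RAM in which one address line may be stuck low (hypothetical)."""

    def __init__(self, address_bits, stuck_bit=None):
        self.cells = [0] * (1 << address_bits)
        self.stuck_bit = stuck_bit

    def _decode(self, address):
        if self.stuck_bit is not None:
            address &= ~(1 << self.stuck_bit)  # the stuck line always reads as 0
        return address

    def write(self, address, value):
        self.cells[self._decode(address)] = value

    def read(self, address):
        return self.cells[self._decode(address)]


def walking_ones(address_bits):
    """Addresses with exactly one bit set: 0001, 0010, 0100, ..."""
    return [1 << bit for bit in range(address_bits)]


def find_dead_address_bits(memory, address_bits):
    """Return the address bits whose value does not select a different cell."""
    dead = []
    for bit, address in enumerate(walking_ones(address_bits)):
        memory.write(0, 0x00)
        memory.write(address, 0xFF)
        if memory.read(0) != 0x00:  # the write landed on address 0 instead
            dead.append(bit)
    return dead


print([f"{address:04b}" for address in walking_ones(4)])
print(find_dead_address_bits(StuckLineMemory(4, stuck_bit=2), 4))
```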

Test 1 [Address test, own address, Sequential]

Each address is written with its own address and then is checked for consistency. In theory previous tests should have caught any memory addressing problems. This test should catch any addressing errors that somehow were not previously detected. This test is done sequentially with each available CPU.
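The "own address" idea can be sketched in a few lines of Python. The `AliasedMemory` class is an assumption for the sake of the example: it simulates a module where one address line is ignored, so two addresses share a cell.

```python
class AliasedMemory:
    """Simulated RAM where one address bit is ignored (hypothetical fault)."""

    def __init__(self, size, ignored_bit=None):
        self.cells = [0] * size
        self.ignored_bit = ignored_bit

    def _decode(self, address):
        if self.ignored_bit is not None:
            address &= ~(1 << self.ignored_bit)
        return address

    def __len__(self):
        return len(self.cells)

    def __getitem__(self, address):
        return self.cells[self._decode(address)]

    def __setitem__(self, address, value):
        self.cells[self._decode(address)] = value


def own_address_test(memory):
    """Write each address into its own cell, then list addresses that read back wrong."""
    for address in range(len(memory)):
        memory[address] = address
    return [address for address in range(len(memory)) if memory[address] != address]


print(own_address_test(AliasedMemory(8)))                 # healthy memory
print(own_address_test(AliasedMemory(8, ignored_bit=2)))  # addresses 4-7 overwrite 0-3
```

Because every cell should hold a unique value, any two addresses that map to the same cell show up immediately.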

Test 2 [Address test, own address, Parallel]

Same as test 1 but the testing is done in parallel using all CPUs and using overlapping addresses.

Test 3 [Moving inversions, ones&zeros, Sequential]

This test uses the moving inversions algorithm with patterns of all ones and zeros. Cache is enabled even though it interferes to some degree with the test algorithm. With cache enabled this test does not take long and should quickly find all "hard" errors and some more subtle errors. This test is only a quick check. This test is done sequentially with each available CPU.
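The quoted description does not spell the algorithm out. As I understand it from the Memtest86 documentation, moving inversions fills memory with a pattern, walks upwards checking each cell and writing the complement, then walks downwards checking the complement and writing the pattern again. A minimal Python sketch of that, using a made-up `StuckBitMemory` class to simulate a fault:

```python
class StuckBitMemory:
    """Simulated 8-bit-wide RAM; one cell may have a bit stuck at 0 (hypothetical)."""

    def __init__(self, size, bad_address=None, stuck_bit=0):
        self.cells = [0] * size
        self.bad_address = bad_address
        self.keep_mask = 0xFF & ~(1 << stuck_bit)

    def __len__(self):
        return len(self.cells)

    def __getitem__(self, address):
        return self.cells[address]

    def __setitem__(self, address, value):
        if address == self.bad_address:
            value &= self.keep_mask
        self.cells[address] = value


def moving_inversions(memory, pattern):
    """Return the number of mismatches found by one moving-inversions run."""
    inverse = ~pattern & 0xFF
    errors = 0
    size = len(memory)
    for address in range(size):            # 1. fill with the pattern
        memory[address] = pattern
    for address in range(size):            # 2. upwards: check, write complement
        if memory[address] != pattern:
            errors += 1
        memory[address] = inverse
    for address in reversed(range(size)):  # 3. downwards: check, write pattern
        if memory[address] != inverse:
            errors += 1
        memory[address] = pattern
    return errors


print(moving_inversions(StuckBitMemory(16), 0xFF))
print(moving_inversions(StuckBitMemory(16, bad_address=3, stuck_bit=0), 0xFF))
print(moving_inversions(StuckBitMemory(16, bad_address=3, stuck_bit=0), 0x00))
```

Since each cell is checked holding both the pattern and its complement, the stuck bit is caught whichever of the two patterns the run starts with.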

Test 4 [Moving inversions, ones&zeros, Parallel]

Same as test 3 but the testing is done in parallel using all CPUs.

Test 5 [Moving inversions, 8 bit pat]

This is the same as test 4 but uses an 8-bit wide pattern of "walking" ones and zeros. This test will better detect subtle errors in "wide" memory chips.

Test 6 [Moving inversions, random pattern]

Test 6 uses the same algorithm as test 4 but the data pattern is a random number and its complement. This test is particularly effective in finding difficult-to-detect, data-sensitive errors. The random number sequence is different with each pass, so multiple passes increase effectiveness.

Test 7 [Block move, 64 moves]

This test stresses memory by using block move (movsl) instructions and is based on Robert Redelmeier's burnBX test. Memory is initialized with shifting patterns that are inverted every 8 bytes. Then 4 MB blocks of memory are moved around using the movsl instruction. After the moves are completed the data patterns are checked. Because the data is checked only after the memory moves are completed, it is not possible to know where the error occurred. The addresses reported are only for where the bad pattern was found. Since the moves are constrained to an 8 MB segment of memory, the failing address will always be less than 8 MB away from the reported address. Errors from this test are not used to calculate BadRAM patterns.

Test 8 [Moving inversions, 32 bit pat]

This is a variation of the moving inversions algorithm that shifts the data pattern left one bit for each successive address. The starting bit position is shifted left for each pass. To use all possible data patterns 32 passes are required. This test is quite effective at detecting data sensitive errors but the execution time is long.
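Here is a small Python sketch of how such a shifting pattern could be generated. It assumes a single "walking" bit that wraps around after 32 positions; the exact pattern Memtest uses may differ, and `rotl32` and `pattern_for` are names invented for this example.

```python
def rotl32(value, shift):
    """Rotate a 32-bit value left by `shift` bits."""
    shift %= 32
    return ((value << shift) | (value >> (32 - shift))) & 0xFFFFFFFF


def pattern_for(address_index, pass_number):
    """Pattern written to the Nth word: shifted once per address and once per pass."""
    return rotl32(1, address_index + pass_number)


for index in range(4):
    print(f"pass 0, word {index}: {pattern_for(index, 0):032b}")

# Every word sees a different starting bit in each of 32 consecutive passes.
print(len({pattern_for(0, pass_number) for pass_number in range(32)}))
```

This is why the description says 32 passes are needed before every word has seen every bit position.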

Test 9 [Random number sequence]

This test writes a series of random numbers into memory. By resetting the seed for the random number the same sequence of numbers can be created for a reference. The initial pattern is checked and then complemented and checked again on the next pass. However, unlike the moving inversions test writing and checking can only be done in the forward direction.
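The seed-resetting trick is easy to show in Python. This sketch uses Python's `random` module, which is of course not the generator Memtest uses, and it checks the complement inside the same function rather than on the next pass. `StuckBitMemory` is a made-up simulated fault.

```python
import random

WORD_MASK = 0xFFFFFFFF


class StuckBitMemory:
    """Simulated 32-bit-wide RAM; one cell may have a bit stuck at 0 (hypothetical)."""

    def __init__(self, size, bad_address=None, stuck_bit=0):
        self.cells = [0] * size
        self.bad_address = bad_address
        self.keep_mask = WORD_MASK & ~(1 << stuck_bit)

    def __len__(self):
        return len(self.cells)

    def __getitem__(self, address):
        return self.cells[address]

    def __setitem__(self, address, value):
        if address == self.bad_address:
            value &= self.keep_mask
        self.cells[address] = value


def random_sequence_test(memory, seed):
    """Write a seeded random sequence, then verify it and its complement."""
    rng = random.Random(seed)
    for address in range(len(memory)):
        memory[address] = rng.getrandbits(32)

    errors = 0
    rng = random.Random(seed)  # same seed, so the same sequence is regenerated
    for address in range(len(memory)):
        expected = rng.getrandbits(32)
        if memory[address] != expected:
            errors += 1
        memory[address] = ~expected & WORD_MASK  # store the complement

    rng = random.Random(seed)
    for address in range(len(memory)):
        expected = ~rng.getrandbits(32) & WORD_MASK
        if memory[address] != expected:
            errors += 1
    return errors


print(random_sequence_test(StuckBitMemory(64), seed=1))
print(random_sequence_test(StuckBitMemory(64, bad_address=10, stuck_bit=7), seed=1))
```

No reference copy of the data has to be kept anywhere: regenerating the sequence from the seed is enough.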

Test 10 [Modulo 20, ones&zeros]

Using the Modulo-X algorithm should uncover errors that are not detected by moving inversions due to cache and buffering interference with the algorithm. As with test one only ones and zeros are used for data patterns.
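The quoted text names the Modulo-X algorithm without describing it. My understanding from the Memtest86 documentation is: for each starting offset, write the pattern to every 20th location, write the complement to all other locations, then check every 20th location. The Python sketch below follows that outline; `CoupledMemory` is an invented fault model in which writing one cell also overwrites its neighbour.

```python
class CoupledMemory:
    """Simulated RAM where writing `aggressor` also overwrites `victim` (hypothetical)."""

    def __init__(self, size, aggressor=None, victim=None):
        self.cells = [0] * size
        self.aggressor = aggressor
        self.victim = victim

    def __len__(self):
        return len(self.cells)

    def __getitem__(self, address):
        return self.cells[address]

    def __setitem__(self, address, value):
        self.cells[address] = value
        if address == self.aggressor:
            self.cells[self.victim] = value  # the write disturbs a neighbouring cell


def modulo_x_test(memory, pattern, modulus=20):
    """Return the number of mismatches found by a Modulo-X style run."""
    inverse = ~pattern & 0xFF
    errors = 0
    size = len(memory)
    for offset in range(modulus):
        for address in range(offset, size, modulus):  # every Xth cell gets the pattern
            memory[address] = pattern
        for address in range(size):                   # all other cells get the complement
            if address % modulus != offset:
                memory[address] = inverse
        for address in range(offset, size, modulus):  # the pattern must have survived
            if memory[address] != pattern:
                errors += 1
    return errors


print(modulo_x_test(CoupledMemory(40), 0xFF))
print(modulo_x_test(CoupledMemory(40, aggressor=7, victim=6), 0xFF))
```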

Test 11 [Bit fade test, 90 min, 2 patterns]

The bit fade test initializes all of memory with a pattern and then sleeps for 5 minutes. Then memory is examined to see if any memory bits have changed. All ones and all zero patterns are used.

Because a bad sector may work one time and fail the next, I recommend letting MemTest run for several passes. A full pass is one complete run of the test series (tests 0-11 above). The more passes you complete without errors, the more confident you can be in the result. I usually run around 5 passes to be sure.

The error count for healthy memory should be 0. Any number above 0 may indicate damaged/faulty sectors.

The ECC error count is only meaningful when ECC is on. ECC stands for error-correcting code; ECC memory uses such a code to detect and correct wrong bits in memory. It is loosely comparable to the parity checks done on RAID or optical media. This technology is quite expensive and will likely only be encountered in server set-ups. The ECC count shows how many errors have been corrected by the memory's ECC mechanism. ECC shouldn't have to be invoked for healthy RAM, so an ECC error count above 0 may also indicate bad memory.
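To make "detect and correct" a bit more concrete, here is a toy Hamming(7,4) code in Python. Real ECC modules use wider codes implemented in hardware, so treat this purely as an illustration; the function names and the `ecc_errors` counter are assumptions made for this example.

```python
DATA_POSITIONS = (3, 5, 6, 7)   # 1-based positions that hold data bits
PARITY_POSITIONS = (1, 2, 4)    # 1-based positions that hold parity bits


def encode(data_bits):
    """Encode 4 data bits into a 7-bit Hamming codeword (list of 0/1)."""
    code = [0] * 8  # index 0 is unused so that indices match bit positions
    for position, bit in zip(DATA_POSITIONS, data_bits):
        code[position] = bit
    for parity in PARITY_POSITIONS:
        for position in range(1, 8):
            if position != parity and position & parity:
                code[parity] ^= code[position]
    return code[1:]


def decode(codeword):
    """Return (data_bits, corrected); a single flipped bit is repaired."""
    code = [0] + list(codeword)
    syndrome = 0
    for position in range(1, 8):
        if code[position]:
            syndrome ^= position  # a non-zero result is the position of the bad bit
    if syndrome:
        code[syndrome] ^= 1
    return [code[position] for position in DATA_POSITIONS], syndrome != 0


ecc_errors = 0
stored = encode([1, 0, 1, 1])
stored[4] ^= 1  # simulate one bit flipping while the word sits in memory
data, corrected = decode(stored)
ecc_errors += int(corrected)
print(data, ecc_errors)
```

The data comes back intact, but the counter goes up. That counter is the kind of number the ECC Errs column reports.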


Error explanation

Here is an example of a Memtest run that has encountered errors. It shows which sector/address has failed.

[Screenshot: Memtest screen with errors]

The first column (Tst) shows which test has failed; the number corresponds to the test number from the list above. The second column (Pass) shows the pass in which the error was detected, counted from 0. In the example, test 7 failed before a single pass had completed.

The third column (Failing Address) shows exactly which part of the memory has errors. Such a part has an address, much like an IP address, which is unique for that piece of data storage. It shows which address failed and how big the data chunk is. (0.8MB in the example)

The fourth (Good) and fifth (Bad) columns show the data that was written and what was retrieved respectively. Both columns should be equal in non-faulty memory (obviously).

The sixth column (Err-Bits) shows the position of the exact bits that are failing.
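Assuming Err-Bits is simply the bitwise XOR of the Good and Bad columns (which is how I read the display), you can reproduce it yourself. The values below are invented, not taken from the screenshot.

```python
def failing_bits(good, bad, width=32):
    """Return the positions (0 = least significant) where good and bad differ."""
    difference = good ^ bad
    return [bit for bit in range(width) if (difference >> bit) & 1]


good = 0xFFFFFFFF  # value that was written (made-up example)
bad = 0xFFFFFFF7   # value that was read back (made-up example)
print(f"Err-Bits: {good ^ bad:08x}")
print(failing_bits(good, bad))
```

If the same bit position keeps showing up across many errors, that points at one data line or chip rather than random corruption.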

The seventh column (Count) shows the number of consecutive errors with the same address and failing bits.

Finally, the eighth and last column (Chan) shows the channel (if the system uses multiple channels) that the memory module is in.


If it finds errors

If MemTest discovers any errors, the best method of determining which module is faulty is covered in this Super User question and its accepted answer:

Use the process of elimination -- remove half of the modules and run the test again...

If there are no failures, then you know that these two modules are good, so put them aside and test again.

If there are failures, then cut down to half again (down to one of four memory modules now) then test again.

But, just because one failed a test, don't assume that the other doesn't fail (you could have two failing memory modules) -- where you've detected a failure with two memory modules, test each of those two separately afterwards.

Important note: With features like memory interleaving, and poor memory module socket numbering schemes by some motherboard vendors, it can be difficult to know which module is represented by a given address.
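The elimination procedure quoted above amounts to a recursive halving, which can be sketched in Python. `passes_memtest` is a hypothetical stand-in for physically installing only the given modules and running Memtest; nothing here talks to real hardware.

```python
def find_faulty_modules(modules, passes_memtest):
    """Halve the set of modules until every faulty one is isolated."""
    if passes_memtest(modules):
        return []                 # this whole group is good, set it aside
    if len(modules) == 1:
        return list(modules)      # a single module that fails is faulty
    middle = len(modules) // 2
    # Test both halves: more than one module may be bad.
    return (find_faulty_modules(modules[:middle], passes_memtest)
            + find_faulty_modules(modules[middle:], passes_memtest))


# Simulated machine with four modules, two of which are bad.
bad_modules = {"DIMM2", "DIMM4"}


def passes_memtest(installed):
    return not any(module in bad_modules for module in installed)


print(find_faulty_modules(["DIMM1", "DIMM2", "DIMM3", "DIMM4"], passes_memtest))
```

Note that both halves are always tested, which covers the case mentioned above where two modules fail at once.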
