Why did deadbeefdeadbeef become deadbeeedeadbeef?

kernel-panic memory

My computer had a sudden shutdown today, and part of the associated report is found below:

** Panic Report *** panic(cpu 4 caller 0xffffff800a4bade0): "a freed zone element has been modified in zone kalloc.256: expected
0xdeadbeefdeadbeef but found 0xdeadbeeedeadbeef, bits changed
0x100000000, at offset 120 of 256 in element 0xffffff805d4c7d00,
cookies 0x3f00113ca62d16fc

Originally I thought maybe this was a virus of some kind, since the number deadbeefdeadbeef looked like it wasn't a random memory address. I Googled the term deadbeef and found that it's a standard magic number.

My question is, what could have caused such a memory corruption as this?

Best Answer

The error message comes directly from the operating system kernel, which is warning of unexpected memory corruption. This is obviously not supposed to happen. 0xdeadbeef is indeed a common "magic number", but it appears here for a very specific reason.

Specifically, this error message comes from zalloc (a zone-based memory allocator) within the kernel. The kernel is the part of the operating system that runs in "privileged mode", meaning roughly that it has full access to the computer. Your own applications run in a limited mode that gives them less access to the computer. As part of its functioning the kernel uses RAM to store and process information. In order to keep track of which parts of RAM are actually in use, the kernel uses many different allocators. The zone-based memory allocator is special in that it is used only by the kernel itself (i.e. not for applications) and that it allocates only RAM blocks of specific, fixed sizes. For example it might allocate blocks of 256, 512 and 1024 bytes - but it won't allocate a block of 300 bytes. This is done to make the allocator simpler and thus faster.
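To illustrate the fixed-size idea, here is a minimal sketch. The size list simply reuses the example sizes above, and the name pick_zone is made up for illustration; it assumes an odd-sized request is rounded up to the next block size, and it is not the kernel's actual code.

```python
# Hypothetical size classes, taken from the example sizes in the text above.
ZONE_SIZES = [256, 512, 1024]

def pick_zone(request_size):
    """Return the smallest fixed block size that can hold the request."""
    for size in ZONE_SIZES:
        if request_size <= size:
            return size
    raise ValueError("request too large for any zone in this sketch")

print(pick_zone(300))  # 512: a 300-byte request gets a 512-byte block
```

This would also explain the zone name kalloc.256 in the report: the corrupted element lived in the pool of 256-byte blocks.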

The cookie part of the error message refers to a security-related feature, where pointers (memory addresses) in the kernel are "poisoned" by combining them (with XOR) with a cookie. The cookie is simply a random number picked when you boot your computer. This ensures that these memory addresses seem random, and are not predictable by viruses or malware that exploit bugs to get access to privileged resources.
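The XOR trick can be sketched in a few lines. The cookie below is the one printed in your report; the pointer is the element address from the report, used here only as sample data. Exactly which pointers the kernel protects this way is not shown here, so treat this as an illustration of the principle only.

```python
MASK64 = (1 << 64) - 1  # keep values within 64 bits

cookie = 0x3f00113ca62d16fc   # cookie value printed in the panic report
pointer = 0xffffff805d4c7d00  # element address from the report, sample data only

# XOR with the cookie makes the stored value look random...
obfuscated = (pointer ^ cookie) & MASK64
# ...and XOR with the same cookie again recovers the original pointer.
restored = (obfuscated ^ cookie) & MASK64

print(hex(obfuscated))
print(hex(restored))
```

Without knowing the cookie, an attacker who can overwrite the stored value cannot easily make it decode to an address of their choosing.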

One specific feature of the zone-based allocator is that when the kernel frees memory (i.e. states that it no longer has use for an allocated piece of memory), it doesn't just note that the block is free for other use - it also overwrites the freed memory with the magic number 0xdeadbeef. This serves two purposes: (1) if the kernel erroneously uses the memory after it has freed it, following a pointer read from that memory wouldn't "happen to work" or "happen to point to other data which will be corrupted" - but would rather trigger a system error interrupt, as the address 0xdeadbeef is mapped to ensure that happens, (2) if the allocator is later asked for a new allocation and wants to hand out the same block again, it can detect whether the block was changed, meaning that there's a bug or hardware error somewhere.
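Purpose (2) can be simulated with a small sketch. The function names and the word-by-word layout are assumptions made for illustration; the real allocator is written in C and differs in detail. The simulated stray write flips one bit at offset 120, mirroring the values in your report.

```python
POISON = 0xdeadbeefdeadbeef
WORD = 8  # bytes per 64-bit word

def poison(block):
    """Fill a freed block with the poison pattern, one 64-bit word at a time."""
    for offset in range(0, len(block), WORD):
        block[offset:offset + WORD] = POISON.to_bytes(WORD, "little")

def find_corruption(block):
    """Return (offset, found) for the first word that no longer matches, else None."""
    for offset in range(0, len(block), WORD):
        found = int.from_bytes(block[offset:offset + WORD], "little")
        if found != POISON:
            return offset, found
    return None

block = bytearray(256)   # stands in for one kalloc.256 element
poison(block)            # the "free" step
assert find_corruption(block) is None  # untouched block passes the check

# Simulate a stray write after the free: flip bit 32 of the word at offset 120.
block[120 + 4] ^= 0x01

offset, found = find_corruption(block)
print(offset, hex(found))  # 120 0xdeadbeeedeadbeef
```

A real kernel panics at this point instead of printing, because it can no longer trust its own memory.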

In your case the error message indicates that (2) happened - so something altered some of the RAM in your computer that the kernel had marked as not in use. The cause of such a memory corruption is most often a kernel bug, a bug in a third-party kernel extension, or a hardware error.
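The "bits changed" value in the report is simply the XOR of the expected and found values, which you can check yourself with this short snippet:

```python
expected = 0xdeadbeefdeadbeef
found = 0xdeadbeeedeadbeef

changed = expected ^ found       # bits that differ between the two values
print(hex(changed))              # 0x100000000, matching "bits changed" in the report
print(bin(changed).count("1"))   # 1, so exactly one bit differs
print(changed.bit_length() - 1)  # 32, the position of that bit
```

So exactly one bit out of 64 was flipped. On its own that does not tell you which of the causes above is responsible.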

If the problem is a kernel bug, you probably have no way to fix it yourself. If it happens once in a blue moon, just wait for the next update from Apple. If it happens frequently, you can try to figure out exactly what you did to trigger the bug and avoid doing that in the future - or downgrade to an earlier version of macOS that doesn't have the bug.

If the problem is a bug in a third-party kernel extension, I would check for updates to all of the kernel extensions you have installed. Kernel extensions are usually installed as drivers for third-party hardware, or for special applications that change the way the system works at a low level. This could be for example virtualization software (such as VMware) or software such as Little Snitch. It is always a good idea to disconnect any third-party hardware, avoid starting such low-level applications, and check whether the bug disappears - then reconnect one thing at a time until the problem returns.

If the problem is a hardware error, you really have no choice but to get the hardware replaced. Usually errors like this are caused by defective RAM modules. It could also be a thermal or logic board problem, but that is less common. Use the Apple Hardware Test to check whether your RAM modules have errors.

In the extreme case this error could be caused by a virus, but I would definitely look for other reasons first. A virus or malware usually tries to keep its existence hidden, so it wouldn't want to trigger error messages such as this one (or cause system lockups). However, it is possible that a bug in a virus triggers this error message, similar to bugs in non-malicious kernel extensions.