macOS: How to stop mDNSResponder from using 90-100% CPU continuously on Catalina

Tags: catalina, cpu, dns, macos

I just upgraded my 2018 15" MacBook Pro from Mojave to Catalina (10.15.4). It's been a few hours.

One of the first things I did after the upgrade was edit a video using the new free trial of Final Cut Pro X. My laptop's cooling fans ran at full speed the entire time, but there was always background rendering going on, so I figured that was normal.

When I finished and quit FCP, the fans didn't spin down, so I checked Activity Monitor and discovered that mDNSResponder is taking 90-100% CPU continuously. The Threads column in Activity Monitor indicates 3-4 threads most of the time; the 100% is spread across all of those, and they're not all on the same core. I'm not sure how it's managing to do that and still sit at or just under 100% most of the time, but that's what it's doing.

[Screenshot of Activity Monitor]

The laptop has six cores (12 logical), so having one core fully occupied does not make a noticeable difference in performance (unless I start measuring how long things take, but then I'm noticing that the numbers are different, not that the performance feels different!).

Note: in aggregate, the bar graphs show more than one full core in use. This is expected: I have a search applied in my Activity Monitor screenshot, and there's lots of other stuff going on (Slack is open, Chrome with eleventybillion tabs, IntelliJ IDEA is probably indexing something, and so on).

I tried restarting mDNSResponder using these commands:

sudo launchctl unload -w /System/Library/LaunchDaemons/com.apple.mDNSResponder.plist
sudo launchctl load -w /System/Library/LaunchDaemons/com.apple.mDNSResponder.plist

I watched the process disappear, so I know the command worked, but it immediately returned to 100% CPU usage when I started it back up. mDNSResponderHelper did not stop, so I tried again, inserting sudo killall mDNSResponderHelper as an intermediate step. This made both processes go away as I intended, but still didn't fix the problem.
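The restart sequence above, including the extra killall for the helper, can be wrapped in a small script so it's repeatable. This is a minimal sketch: the run helper, the DRY_RUN variable, and the restart_mdns name are hypothetical conveniences, not anything macOS provides. With DRY_RUN=1 the script only prints each command, so you can review the sequence before running it for real.

```shell
#!/bin/sh
# Sketch: restart mDNSResponder and mDNSResponderHelper in one go.
# DRY_RUN=1 (hypothetical convenience) prints the commands instead of running them.

PLIST=/System/Library/LaunchDaemons/com.apple.mDNSResponder.plist

run() {
    if [ "${DRY_RUN:-0}" = 1 ]; then
        echo "$*"
    else
        "$@"
    fi
}

restart_mdns() {
    run sudo launchctl unload -w "$PLIST"
    # The helper did not exit along with the daemon, so stop it explicitly.
    run sudo killall mDNSResponderHelper
    run sudo launchctl load -w "$PLIST"
}
```

Source the file, then call DRY_RUN=1 restart_mdns to preview the three commands, or plain restart_mdns to run them.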

I also tried sending a HUP signal to mDNSResponder as follows:

sudo killall -HUP mDNSResponder

This also had no effect.

I opened up Console, entered mdnsresponder in the search field, and watched the messages flow by for a minute or two. I saw some stuff about Bonjour, LOTS of <private>, and some pretty normal-looking DNS query logging. I tried disabling both Bluetooth and Wi-Fi in hopes of affecting Bonjour, but it did not seem to have any effect (I'm on a hardwired Ethernet connection, which I left connected).
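The same filtering can be done from Terminal with the log command, which can either stream mDNSResponder's messages live or replay the last few minutes (handy for gauging how chatty it is, e.g. by piping into wc -l). Private values are still shown as <private> unless redaction is turned off. A minimal sketch; the function names and the DRY_RUN convenience are hypothetical.

```shell
#!/bin/sh
# Sketch: follow or replay mDNSResponder's unified-log messages from Terminal,
# the command-line counterpart of filtering Console on "mdnsresponder".
# DRY_RUN=1 (hypothetical convenience) prints the command instead of running it.

PREDICATE='process == "mDNSResponder"'

run() {
    if [ "${DRY_RUN:-0}" = 1 ]; then
        echo "$*"
    else
        "$@"
    fi
}

# Stream messages as they arrive (Ctrl-C to stop).
follow_mdns_log() {
    run log stream --predicate "$PREDICATE"
}

# Show messages from the last N minutes (default 1).
recent_mdns_log() {
    run log show --last "${1:-1}m" --predicate "$PREDICATE"
}
```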

After typing this, I eventually noticed that cloudphotosd was also taking up a healthy chunk of CPU. I assumed this was the notorious reindexing that often happens after OS upgrades: going through my (quite large) photo library, updating metadata based on whatever new features came with Catalina, and uploading those changes to iCloud. That would account for constant network activity, so I thought it might also explain mDNSResponder's activity. I left this window open without submitting and waited a while to see if cloudphotosd would calm down. It did, but mDNSResponder did not. So much for that hunch!

Finally, I tried rebooting my Mac; mDNSResponder wasted no time getting back to business. With no apps running after a fresh boot, it was already consistently sitting at or just under 100%, just like before.

This is a Q&A site, and I haven't asked a question, so here goes: how do I figure out what it is doing, and how do I make it stop?

UPDATE: it's been nearly 48 hours and it's still churning away. My battery life sucks now. I've observed that closing the laptop lid does seem to make it stop, but it comes right back when I open it again. I've also noticed an additional symptom: first-time DNS lookups after a reboot take ~2 seconds (I'd expect <200ms). I'm not sure if that's simply a side effect of mDNSResponder being so busy doing whatever it's doing or if it is related to the cause.

UPDATE 2: it's been more than three weeks. I've added a 100-rep bounty. The DNS lookup delay has increased; it often takes 20-30 seconds, and while there does seem to be some caching in place, I think it has a time-based expiry, because the delay recurs later even without a restart. I'm happy to interact directly with someone knowledgeable enough to debug and diagnose this issue. I'm on Eastern Daylight Time in the United States (UTC-4) and generally available during business hours.

Best Answer

Here's my recommendation:

  1. Let's see what mDNSResponder is actually doing. Here's a utility to turn off (and back on) the redaction behind the <private> label. Make sure to turn it back on once you're done. You may find, for example, that the process is hung up on something and just loops continually. https://georgegarside.com/blog/macos/sierra-console-private/

  2. Get a packet capture of your network as you make a DNS request. Grab Wireshark and start a capture on the interface you're using (Ethernet or Wi-Fi, but make sure the one(s) you're not using are off or unplugged). I would do this first in the environment where it takes 20-30s, and then again after a reboot, when conditions are such that it only takes 2-3s. Keep network use to a minimum while you run these captures, as they'll get big fast. This should show us the DNS request as well as the requests to and from the websites themselves, so we can see where the delays are. If there aren't delays on the network, then we'd look at processes instead.

  3. Upload relevant parts of the logs and/or the packet capture(s) somewhere for us to look at. Make sure you censor or remove any private data.

  4. And finally, note that this may be solved faster by doing an in-place OS reinstall. That may run against your preference for being able to fix your equipment and knowing what goes on with it, but if the goal is to get mDNSResponder back to sanity as quickly as possible, an in-place OS reinstall may be the quickest way there.
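If you'd rather not install Wireshark just to record, tcpdump ships with macOS and can write a capture file that Wireshark opens later. A minimal sketch, assuming the wired interface is en0 (check ifconfig for yours) and defaulting to unicast DNS (port 53) plus multicast DNS (port 5353) to keep the file small; pass a different filter as the third argument to widen it, or pass 'ip' to capture all IPv4 traffic as step 2 suggests. The function name, default file name, and DRY_RUN convenience are hypothetical.

```shell
#!/bin/sh
# Sketch: capture DNS and mDNS traffic to a pcap file for later viewing in Wireshark.
# Assumptions: en0 and dns-capture.pcap are placeholders; adjust for your machine.
# DRY_RUN=1 (hypothetical convenience) prints the command instead of running it.

run() {
    if [ "${DRY_RUN:-0}" = 1 ]; then
        echo "$*"
    else
        "$@"
    fi
}

# Usage: capture_dns [interface] [output.pcap] [capture filter]
capture_dns() {
    iface="${1:-en0}"
    out="${2:-dns-capture.pcap}"
    filter="${3:-port 53 or port 5353}"
    # -i picks the interface; -w writes raw packets to a file instead of printing them.
    run sudo tcpdump -i "$iface" -w "$out" "$filter"
}
```

Run it once in the slow (20-30s) state and once after a reboot, stopping each capture with Ctrl-C, and compare the two files.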

EDIT: I was able to re-create the issue and capture the related network traffic. Many sections of RFC 6762 (Multicast DNS) seem relevant. I won't post excerpts here, but I found parts of sections 3, 5, 5.2, and 10.2 to be especially relevant.

Here's what I think is happening.

Upon creating these lo0 aliases, traffic with the "cache flush" flag set is generated constantly. The RFC doesn't go into enough detail for me to be sure, but it seems like each loopback address is advertising itself as the address that answers queries for the machine's hostname, which tells listening devices to flush their internal caches and update them with the new IP address.

Think about it: if the network thinks you're hostname.local at 192.168.44.111 and your IP address changes, mDNS will blast out a "flush your caches, hostname.local is now 192.168.44.123!" message to 224.0.0.251. This is one circumstance where mDNS proactively advertises a new IP, and it's a big part of why network browsing works so well (e.g., the networked printers the RFC uses as examples).
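For reference, RFC 6762 section 10.2 defines that cache-flush flag as the top bit of the 16-bit rrclass field in a Multicast DNS response record; the remaining 15 bits are the ordinary DNS class, usually 1 (IN). So a raw rrclass value of 0x8001 means "class IN, cache flush set". A tiny sketch of splitting the field (the function name is hypothetical):

```shell
#!/bin/sh
# Sketch: split an mDNS resource record's 16-bit rrclass field into the
# cache-flush bit (top bit, RFC 6762 section 10.2) and the DNS class.

decode_rrclass() {
    rrclass=$(( $1 ))                        # accepts decimal or 0x-prefixed hex
    flush=$(( (rrclass & 0x8000) != 0 ))     # 1 if the cache-flush bit is set
    class=$(( rrclass & 0x7FFF ))            # remaining 15 bits: the DNS class
    echo "cache_flush=$flush class=$class"
}
```

For example, decode_rrclass 0x8001 prints cache_flush=1 class=1. In Wireshark, the display filter udp.port == 5353 narrows a capture to the mDNS traffic where these records show up.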

What doesn't make a ton of sense to me is that parts of the RFC suggest multiple active loopback addresses on the same machine shouldn't be spamming the way they are, but I may not be understanding the RFC well. Either way, it seems clear to me that mDNSResponder is looping through each loopback alias and telling everyone on 224.0.0.251: "Disregard the last guy who took charge; this is the new IP address for my hostname!"

I'm not entirely clear on why this slows down regular DNS queries, except that on macOS mDNSResponder is also the process that sends and receives regular DNS queries, so if it's tied up in all this nonsense with the other addresses, those queries wait. And/or, perhaps the DNS queries go out and come back in on whichever address most recently took charge of the hostname. That could really cause delays: by the time a DNS response comes back over the WAN, the loopback address holding the hostname could be different from the one the request went out on. But I'm just spitballing wildly at this point.
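One way to test that theory is to compare a lookup that goes through the system resolver (mDNSResponder) with one that doesn't. As far as I know, dig sends its query straight to the configured DNS server without going through mDNSResponder, and it reports a ";; Query time: N msec" line. If dig is consistently fast while lookups through the system (for instance, time dscacheutil -q host -a name example.com) take 20-30 seconds, the delay is most likely inside mDNSResponder rather than upstream. A minimal sketch that extracts dig's figure; the function names and the example.com default are hypothetical.

```shell
#!/bin/sh
# Sketch: pull the "Query time" figure (in ms) out of dig's output.
# dig queries the DNS server directly, so this times the network/server side
# of a lookup without mDNSResponder in the path.

# Reads dig output on stdin and prints the query time in milliseconds.
# Exits non-zero if no "Query time" line was found.
dig_query_ms() {
    awk '/^;; Query time:/ { print $4; found = 1 } END { exit !found }'
}

# Runs dig for a name (default example.com) and prints its query time.
time_dig() {
    dig "${1:-example.com}" | dig_query_ms
}
```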

Further, instead of rebooting, you can change your script slightly: if ifconfig lo0 alias "$ADDR" up was used to bring up a new interface alias, then ifconfig lo0 -alias "$ADDR" can be used to bring it down.
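Along those lines, here is a sketch for tearing down every extra loopback alias without a reboot. It assumes the aliases are IPv4 addresses added with ifconfig lo0 alias, and that 127.0.0.1 is the only IPv4 address on lo0 you want to keep. The function names and the DRY_RUN convenience are hypothetical.

```shell
#!/bin/sh
# Sketch: list and remove extra IPv4 aliases on lo0, keeping 127.0.0.1.
# DRY_RUN=1 (hypothetical convenience) prints the commands instead of running them.

run() {
    if [ "${DRY_RUN:-0}" = 1 ]; then
        echo "$*"
    else
        "$@"
    fi
}

# Reads `ifconfig lo0` output on stdin; prints every IPv4 address except 127.0.0.1.
extra_lo0_aliases() {
    awk '$1 == "inet" && $2 != "127.0.0.1" { print $2 }'
}

# Removes each extra alias, the counterpart of `ifconfig lo0 alias ADDR up`.
remove_lo0_aliases() {
    for addr in $(ifconfig lo0 | extra_lo0_aliases); do
        run sudo ifconfig lo0 -alias "$addr"
    done
}
```

Run DRY_RUN=1 remove_lo0_aliases first to see which addresses would be removed.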