Excel is unusable for large files that link to each other

cpumemoryms officeparallels-desktop

I am collaborating with a data analyst using two linked files stored in a shared folder in my Dropbox. The first file contains the raw data (several worksheets; total file size ~110MB); the second file contains the outputs (tables; pivots; slicers; graphs; 10MB total file size) and is linked to the raw data file (the raw data file is the source file).

We are both Mac users. He is using a MacBook Pro which I think is running a dual core i5 and SSD. My primary computer is my iMac (late 2012 build; quad core i7 3.1ghz; 1TB Fusion Drive; 16GB RAM; Catalina). I have used stress test software and the Fusion Drive appears to be fine : ~230 MB/s write; ~450 MB/s read.

Although the raw data file is large, the data analyst can open the file and edit it without significant performance issues.

My iMac will open the raw data file using Excel for Mac (64 Bit version), however, Activity Monitor shows Excel running the CPU at ~200% even as 'rest', i.e. no calculations being performed.

As soon as I edit the file in any way, even something simple such as inserting a row in a worksheet, the CPU usage increases to ~775% and the data read rate drops to <1MB/s.

Excel then hangs and is unusable.

I've tried Excel for Mac, Excel for Windows in Parallels using Windows Virtual Machine, and Excel for Windows in Boot Camp. I've also tried working with the Excel file using my MacBook Pro (mid-2013; 2.5Ghz Core i5; 8GB RAM; 500GB SSD; Mojave) using Excel for Mac and Excel for Windows in Boot Camp.

The problem persists in each of these environments.

I've spent around 6 hours on tech support calls with Apple, Microsoft and Parallels but no progress has been made.

I've repaired Home Permissions (no improvement) completely uninstalled Office suite and installed 64 Bit versions of the Office suite (no improvement).

If the file was corrupt I'd expect the data analyst to be experiencing significant issues with the file, but he isn't. This seems to point towards this being a local problem, i.e. my system.

I'm stumped, especially regarding the very high CPU usage by Excel and the almost zero data transfer rate.

We've run Excel files approaching 300MB in the past, using client Windows computers (nothing particularly advanced in terms of specs) and even though the files took (say) a minute to load they were usable.

Any help would be hugely appreciated.

EDIT/ UPDATE 2020-08-12 :

Apologies to all that took the time to add comments/suggestions – several weeks have passed without an update from me.

In short, there are two issues : collaborating using linked Excel files in Dropbox, and slow performance in an Excel For Mac environment.

It became clear that Excel for Mac simply cannot handle linked files which are not saved on a local drive. The data analyst and I have had to revert to a single integrated file, which is working but of course increases the size of the file considerably.

The speed issues are a different matter and we still haven't resolved those. Despite the data analyst having a 'lesser' spec MacBook Pro (2019; 1.4Ghz quad core i5; 8GB RAM) than my iMac (2012; 3.1Ghz quad core i7; 16GB RAM) his machine is able to operate the data file far more quickly.

Best Answer

At that level of data, it’s often easy to kill Excel whether you’re on Windows 10 or macOS.

  • Plain data should be fine for hundreds of thousands of rows and less than 30 columns for most data that’s not UUID length.
  • What seems to be most finicky is calculations, graphs, conditional formatting and all the processing that happens to the data and not simply the file of 100 or even 500 MB on disk.

The code base for 2020 Excel is the same on Mac and Windows with just some features selectively enabled for each platform. This was unified in the past 16 months, so you would need to simplify your calculations by troubleshooting your spreadsheets or perhaps move to an engine that’s less likely to get caught up like python or R. The main difference on Mac vs Windows is major feature and UI code for each platform, not the core data handling.

Depending on your linking, you might also just need a database which can handle indexing and optimization better than Excel can.

These are not at all trivial suggestions, so I know you’d prefer to not fix Excel, but it has so many layers of features, that you can stray away from performance and end up with a very fragile workbook in practice. There’s only so much stripping of charts and data you can do in some cases when you overload Excel with code and graphs and conditional calculations or worse, looping conditions that it cannot self detect or alert you.