Can I fit a 12 GB TensorFlow model on a MacBook Air M1 with 16 GB unified RAM?

apple-silicon gpu

I've read that the M1 chip uses a 'unified memory' architecture, where both the CPU and the GPU share RAM.

Is this equivalent to the VRAM on a traditional NVIDIA GPU?

E.g. if I have a 12 GB TensorFlow model, can I load this model into the 16 GB RAM space on my M1 chip?

Edit: In addition to the answer below, I have tested out YOLOv4
from this repo: https://github.com/hunglc007/tensorflow-yolov4-tflite

    python detect.py --weights ./checkpoints/yolov4-416 --size 416 --model yolov4 --image ./data/kite.jpg

Init Plugin
Init Graph Optimizer
Init Kernel
Metal device set to: Apple M1

systemMemory: 16.00 GB
maxCacheSize: 5.33 GB

which shows that the Metal device has the full ~16 GB of system memory to work with (with a maximum cache size of 5.33 GB).
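
A quick way to confirm that TensorFlow sees the Metal GPU at all (assuming the tensorflow-macos and tensorflow-metal packages are installed):

    import tensorflow as tf

    # With the Metal plugin installed, the device list should
    # include a GPU entry for the M1.
    print(tf.config.list_physical_devices())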

Best Answer

Apple's "Unified Memory Architecture" (UMA) is not exactly the same as what you're used with "VRAM" on a traditional Intel system with for example an NVIDIA GPU.

The UMA on Apple's M1 chip means that the CPU and GPU access the same main memory (system RAM). They access all of it in the same manner, and there are no partitions or similar that prevent either the CPU or the GPU from accessing the other's memory. This means that sending information from the CPU to the GPU, or vice versa, can happen just by reading/writing memory - as opposed to having to transfer data via some secondary means or via special instructions.
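
To make that concrete, here is a small illustrative sketch in TensorFlow (assuming the Metal plugin is installed; TensorFlow hides the actual memory handling, so this only demonstrates device placement, not the underlying copy behaviour):

    import tensorflow as tf

    tf.debugging.set_log_device_placement(True)  # log where each op runs

    # Create a tensor on the CPU...
    with tf.device('/CPU:0'):
        a = tf.random.uniform((4096, 4096))

    # ...and use it in a matmul on the GPU. On a discrete NVIDIA card,
    # `a` would first have to be copied over PCIe into VRAM; on the M1,
    # both devices address the same physical RAM.
    with tf.device('/GPU:0'):
        b = tf.matmul(a, a)

    print(b.device)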

Intel systems feature something called Dynamic Video Memory Technology (DVMT), which is actually part of what Intel has named their "Unified Memory Architecture". Even though the name is the same, the two are not identical.

Usually on Intel systems that share ordinary system RAM between the CPU and the GPU, you'll see that a certain amount of RAM is pre-allocated to the GPU early during bootup. This amount of "pre-allocated memory" is either fixed by the hardware integrator, or it is user customisable via BIOS settings or a UEFI menu. The pre-allocated RAM is not visible to operating systems running on the CPU. This means that even though it is just ordinary system RAM, the operating system cannot access the pre-allocated memory set aside for the GPU, even if it attempted to do so.

A little later in the boot sequence, the operating system has the option of setting aside so-called "fixed memory" for the GPU. This memory is permanently allocated to the GPU and is then no longer accessible to the operating system. However, the address space is visible to the operating system in its page tables, and as such it is included in the total amount of system RAM known to the operating system - contrary to the pre-allocated memory.

Later on in the bootup, the GPU driver running in the operating system can use DVMT to dynamically allocate more system RAM to be used as graphics memory by the GPU. Uniquely, this type of allocation can be retracted, so that the memory region can be used by the operating system for applications again.

Note that it is possible to combine these three types of video memory allocation on an Intel system. Note also that this differs from the M1 in that memory is either allocated to the GPU or to the CPU - it is not the case that, for example, pre-allocated memory could be shared by the operating system and the GPU for their communication. As soon as memory is given to the GPU, the programs running on the CPU lose access to it.

As for your last question regarding the 12 GB TensorFlow model: yes, in theory this model can be loaded into the 16 GB of system RAM on the M1. In practice you might run into other things that block you or slow the process down. For example, a model loaded into RAM usually takes up more space than its on-disk representation.
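
If you want to measure that overhead yourself, here is a rough sketch (the checkpoint path is taken from the repo linked in the question; note that ru_maxrss is reported in bytes on macOS but in kilobytes on Linux):

    import os
    import resource

    import tensorflow as tf

    MODEL_DIR = './checkpoints/yolov4-416'  # SavedModel directory

    # Total on-disk size of the SavedModel directory.
    disk_bytes = sum(
        os.path.getsize(os.path.join(root, name))
        for root, _, files in os.walk(MODEL_DIR)
        for name in files
    )

    model = tf.saved_model.load(MODEL_DIR)  # pulls the weights into RAM

    # Peak resident set size of this process after loading.
    peak_bytes = resource.getrusage(resource.RUSAGE_SELF).ru_maxrss

    print(f'on disk:  {disk_bytes / 1e9:.2f} GB')
    print(f'peak RSS: {peak_bytes / 1e9:.2f} GB')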