How does Rosetta 2 work?

Tags: m1, rosetta, rosetta-2

I'd love to understand more about how Rosetta 2 works. The Apple Developer article is brief. Has anyone done a deep analysis on how Rosetta 2 works, how it is invoked and whether it's possible to use it via an API?

Some questions:

  • How are x86_64 applications launched under Rosetta?
  • Is it possible to dynamically invoke translation for a portion of x86 instructions?
  • Might it be possible to bridge Rosetta to QEMU or similar to allow fast virtualization of Intel Docker images?

Best Answer

Rosetta 2 works by doing an ahead-of-time (AOT) translation of the Intel code to the corresponding ARM code. It can do this efficiently mainly because the M1 CPU supports a special per-thread mode that switches the memory-ordering model observed by the CPU for that thread to one equivalent to the Intel x86 model (TSO - total store order). Memory ordering determines what consistency guarantees a program can expect for its memory accesses when it runs on multiple processors (i.e. cores in this case).
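
To make the memory-ordering point concrete, here is a small litmus-test sketch in plain C with pthreads - the classic "message passing" pattern. Under the x86 TSO model the outcome flag==1 && data==0 can never be observed, while ARM's weaker default model permits it; whether the weak outcome actually shows up in a quick run depends on timing, and the file name and loop count below are just illustrative assumptions.

/* mp_litmus.c (hypothetical name) - "message passing" litmus test.
 * Under x86 TSO the outcome flag==1 && data==0 is impossible; under
 * ARM's default, weaker memory model it is allowed. Rosetta 2 avoids
 * inserting memory barriers for cases like this by running translated
 * threads with the core switched into a TSO-equivalent mode.
 * Build: clang -O2 -pthread mp_litmus.c -o mp_litmus
 */
#include <pthread.h>
#include <stdatomic.h>
#include <stdio.h>

static atomic_int data_v, flag_v;   /* shared, reset to 0 each round */

static void *producer(void *arg) {
    atomic_store_explicit(&data_v, 1, memory_order_relaxed);    /* write payload */
    __asm__ volatile("" ::: "memory");  /* compiler barrier only, no CPU fence */
    atomic_store_explicit(&flag_v, 1, memory_order_relaxed);    /* then publish  */
    return NULL;
}

static void *consumer(void *arg) {
    int f = atomic_load_explicit(&flag_v, memory_order_relaxed); /* read flag    */
    __asm__ volatile("" ::: "memory");  /* compiler barrier only, no CPU fence */
    int d = atomic_load_explicit(&data_v, memory_order_relaxed); /* then payload */
    return (f == 1 && d == 0) ? (void *)1 : NULL;  /* non-NULL = weak ordering seen */
}

int main(void) {
    long weak = 0;
    for (int i = 0; i < 100000; i++) {
        atomic_store(&data_v, 0);
        atomic_store(&flag_v, 0);
        pthread_t p, c;
        void *seen = NULL;
        pthread_create(&p, NULL, producer, NULL);
        pthread_create(&c, NULL, consumer, NULL);
        pthread_join(p, NULL);
        pthread_join(c, &seen);
        if (seen != NULL)
            weak++;
    }
    printf("flag==1 && data==0 observed %ld times\n", weak);
    return 0;
}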

Users can observe this translation the first time they launch an Intel app on the M1, as the first launch is noticeably slow. The translated code is cached and reused, so subsequent launches are much faster.

If you have a binary that is valid for several different architectures (a universal binary), you can specifically invoke Rosetta 2 by specifying that you want to launch the Intel code. You can do that from the Terminal like this:

arch -x86_64 ./mycommand

Note that this choice also applies to any program that the "mycommand" process itself chooses to run.
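
Related to this, if you need to check at run time whether the current process is actually running under Rosetta (for example after launching it with arch -x86_64), Apple documents a sysctl named sysctl.proc_translated for exactly that purpose. A minimal C sketch:

/* is_translated.c (hypothetical name) - ask macOS whether this process
 * is being translated by Rosetta 2. The sysctl returns 1 under Rosetta,
 * 0 when running natively, and does not exist at all on Intel Macs or
 * older macOS versions (hence the ENOENT check).
 */
#include <errno.h>
#include <stdio.h>
#include <sys/sysctl.h>

static int process_is_translated(void) {
    int ret = 0;
    size_t size = sizeof(ret);
    if (sysctlbyname("sysctl.proc_translated", &ret, &size, NULL, 0) == -1) {
        if (errno == ENOENT)
            return 0;   /* sysctl not present: not translated */
        return -1;      /* unexpected error */
    }
    return ret;         /* 1 = translated, 0 = native */
}

int main(void) {
    printf("running under Rosetta 2: %d\n", process_is_translated());
    return 0;
}

Built as a universal binary (clang -arch arm64 -arch x86_64 is_translated.c -o is_translated), the same executable should print 0 when launched normally on an M1 and 1 when launched via arch -x86_64.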

Rosetta 2 as delivered by Apple in macOS Big Sur is not set up to dynamically invoke translation for a portion of x86 instructions. It is focused on doing an AOT translation of the whole binary in advance, and there is no user-facing interface for translating a small set of instructions on the fly. Rosetta 2 does include a JIT engine that can translate instructions on the fly (for example when you run an Intel-based browser with a JIT JavaScript engine), but it is not a general-purpose JIT engine that you can use for other purposes through an API or similar.

If you want to do that for research purposes or just out of pure interest, you could take the instructions you want translated, wrap them in a simple application shell (essentially a main()-only C program, for example) and run it once, as sketched below. The cached, translated version of the program then contains the translated instructions for inspection.
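
A minimal sketch of such a shell, assuming you are happy to express the instructions you care about as inline x86_64 assembly (the sequence below is an arbitrary example, just something for Rosetta to translate); compile it as an Intel binary and run it once, then look in the cache locations listed below:

/* shell.c (hypothetical name) - tiny wrapper around a few hand-picked
 * x86_64 instructions so that Rosetta 2 produces a cached AOT translation
 * of them. Note that this only compiles when targeting x86_64.
 * Build: clang -arch x86_64 shell.c -o shell
 * Run:   ./shell     (the first run triggers the translation)
 */
#include <stdio.h>

int main(void) {
    unsigned long result;

    /* The block of x86_64 instructions to be translated by Rosetta. */
    __asm__ volatile (
        "movq  $6, %%rax      \n\t"
        "movq  $7, %%rbx      \n\t"
        "imulq %%rbx, %%rax   \n\t"
        "movq  %%rax, %0      \n\t"
        : "=r"(result)
        :
        : "rax", "rbx");

    printf("6 * 7 = %lu\n", result);
    return 0;
}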

The cache is available in these folders:

/var/db/oah/
/System/Library/dyld/aot_shared_cache

There's no immediate way of "bridging" Rosetta 2 to QEMU to allow fast virtualization of Intel Docker images. QEMU contains its own Intel x86 emulation, so you can get it to run Intel Docker images on the M1 without involving Rosetta 2 at all - but full CPU emulation is much slower than Rosetta's AOT translation, so "fast" is a very subjective measure in that case.