MacOS – How to avoid data mangling in filenames when copying files between Linux and macOS

filesystem hfs+ macos unicode

When we transferred a CMS file system and a MySQL dump via rsync from one Linux server to another, we used a Mac in between to store the data temporarily. After the CMS was up on the new Linux server, all static image files were available. But the CMS could not find any file whose name contains German umlauts, although those files were visible and accessible. We quickly found out that exactly this happens:

When I create a file with German umlauts in its name on Linux like this:

linux$ mkdir umlauttest
linux$ touch umlauttest/äöü

And then go to a Mac and run rsync there to copy the directory to the Mac:

mac$ rsync -a user@linux:umlauttest .

And then copy it back from my Mac to Linux:

mac$ rsync -a umlauttest/. user@linux:umlauttest2

Then I really have a problem on Linux, because the filename no longer matches the original:

linux$ diff umlauttest umlauttest2
Only in umlauttest2: äöü
Only in umlauttest: äöü

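That is because the macOS file system (HFS+) stores filenames in decomposed Unicode form (for example, ä is stored as the letter a followed by U+0308 COMBINING DIAERESIS), while Linux keeps the precomposed characters exactly as it received them. The two names look identical but consist of different bytes, which is what I call mangling metadata. The same behaviour occurs when copying files with scp.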
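To see why diff reports two different files, here is a small shell sketch. It assumes a POSIX shell with printf, od and mktemp, run on a Linux file system such as ext4 or tmpfs that does not normalize names; the variable names are only illustrative. It builds the precomposed and the decomposed spelling of äöü, prints their UTF-8 bytes, and creates both names in a temporary directory:

```shell
#!/bin/sh
# Two spellings of the visible name "äöü" (octal escapes for the UTF-8 bytes).
nfc=$(printf '\303\244\303\266\303\274')      # precomposed: U+00E4 U+00F6 U+00FC
nfd=$(printf 'a\314\210o\314\210u\314\210')   # decomposed: base letter + U+0308 each

# Print the raw bytes of each spelling.
printf '%s' "$nfc" | od -An -tx1   # c3 a4 c3 b6 c3 bc
printf '%s' "$nfd" | od -An -tx1   # 61 cc 88 6f cc 88 75 cc 88

# Linux file systems compare names byte by byte, so both files can coexist.
dir=$(mktemp -d)
touch "$dir/$nfc" "$dir/$nfd"
ls "$dir" | wc -l                  # 2 on a non-normalizing file system
rm -rf "$dir"
```

On Linux the count is 2, mirroring the two "Only in" lines that diff printed for umlauttest and umlauttest2.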

Is there a way to prevent this from happening?

Best Answer

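Use rsync's --iconv option to specify how filenames are encoded on the local and remote hosts; this should help keep the filenames intact. The option takes the character set of the machine running rsync first and the remote one second (--iconv=LOCAL,REMOTE).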

When you run rsync on the Linux server to copy to the Mac, add the following to the rsync command:

--iconv=utf-8,utf-8-mac

When you run rsync on the Mac, as in the question, use the following in both directions (copying from and to the Linux server):

--iconv=utf-8-mac,utf-8

Note that you need rsync 3.0 or newer to get the --iconv option. Apple ships only an old version, so you need to install a newer one from elsewhere, e.g. via a package manager such as MacPorts or Homebrew.