Windows – iconv module (to use with rsync) to avoid windows-illegal filenames in local NTFS partition

fuse ntfs rdiff-backup rsync windows

I would like to locally attach an NTFS volume to my unix (Ubuntu) machine, and copy (replicate) some unix directories to it, using rsync, in a way that the result is readable under Windows.

I do not care about ownership and permissions. It would be nice if modification dates would be preserved. I only need directories and files (symbolic links would be nice, too; but not a problem if they cannot be copied).

Two obvious problems are case (in)sensitivity and characters that are illegal in Windows filenames. For example, in Linux I can have two files "a" and "A"; I can copy both to the NTFS volume, but in Windows I will be able to access (at most?) one of them. I am happy to ignore that problem. What I am interested in are the characters that are illegal in Windows filenames: <, >, :, ", /, \, |, ? and * (and also ASCII 0-31, but I do not care about those; there might also be problems with names ending in a "."?).

I would like rsync to automatically "rename", e.g., a file called "a:" to, say, a(COLON), to end up with a legal name (and, ideally, translate a(COLON) back to a:).

Is it possible to have rsync automatically rename files to avoid characters forbidden in Windows?

  • As far as I understand, rsync can use iconv to do such tasks; is there a standard iconv module for Windows filenames? (I briefly looked into programming my own gconv module, but lacking C knowledge this seems too complicated.)
  • I have been told that rdiff-backup can do some conversions like that, but the homepage just mentions something being done "automatically", and I am not sure whether a locally mounted NTFS volume would trigger the renaming in a reliable way.
  • I am aware that there is fuse-posixovl, but this seems overkill for my purpose, and it also doesn't seem to be well documented (which characters will be translated in which way? Will all filenames be truncated to 8.3 or whatever? Can I avoid the additional files carrying owner/permission information, which I will not need? etc.)
  • I am aware that I could avoid all these problems by using, e.g., a tar file; but this is not what I want. (In particular, I would like in Windows to further replicate from the NTFS volume to another backup partition, copying only the changed files)
  • I am aware of the "windows_names" option when mounting NTFS; but this will prevent creating offending files, not rename them.

Update: As it seems my question was not quite clear, let me give a more explicit example:
For example, WINDOWS-1251 is of no use for me. iconv -f utf-8 -t WINDOWS-1251//TRANSLIT
transforms

123 abc ABC äö &:<!|

into

123 abc ABC ao &:<!|

I would need a codepage, say "windows-filenames" (which does not exist), that transforms the string into something like

123 abc ABC äö &(COLON)(LT)!(PIPE)
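A minimal sketch of such a mapping, written as two plain shell functions rather than an iconv module. The function names encode_name and decode_name are made up for illustration, and only the tokens (COLON), (LT) and (PIPE) come from the example above; the remaining token names are assumptions. The "/" is left alone because it cannot occur inside a Unix file name anyway. The round trip is only unambiguous if no original name already contains one of the tokens literally, and names containing newlines are not handled.

```shell
#!/bin/sh
# Replace each Windows-forbidden character by a readable token.
# Token names other than (COLON), (LT), (PIPE) are assumptions of this sketch.
encode_name() {
    printf '%s\n' "$1" | sed \
        -e 's/</(LT)/g' \
        -e 's/>/(GT)/g' \
        -e 's/:/(COLON)/g' \
        -e 's/"/(QUOTE)/g' \
        -e 's/\\/(BACKSLASH)/g' \
        -e 's/|/(PIPE)/g' \
        -e 's/?/(QMARK)/g' \
        -e 's/\*/(STAR)/g'
}

# Reverse mapping: turn the tokens back into the original characters.
# Ambiguous if the original name itself contained a token such as "(COLON)".
decode_name() {
    printf '%s\n' "$1" | sed \
        -e 's/(LT)/</g' \
        -e 's/(GT)/>/g' \
        -e 's/(COLON)/:/g' \
        -e 's/(QUOTE)/"/g' \
        -e 's/(BACKSLASH)/\\/g' \
        -e 's/(PIPE)/|/g' \
        -e 's/(QMARK)/?/g' \
        -e 's/(STAR)/*/g'
}

encode_name '123 abc ABC äö &:<!|'
# prints: 123 abc ABC äö &(COLON)(LT)!(PIPE)
```

Such functions would have to be applied by a wrapper script to each path component before copying; rsync itself does not call them.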

Update 2: I have now given up and renamed the offending files "by hand" (i.e., by script). From now on, every time before running rsync, I run a script that checks whether offending filenames exist (but does not automatically rename anything); I just use

# find stuff containing forbidden chars
find "$MYDIR" -regex '.*/[^/]*[<>:*"\\|?][^/]*'
# find stuff containing dot as last character (supposedly bad for windows)
find "$MYDIR" -regex '.*\.'
# find stuff that is identical case insensitive
# (sort -f folds case so that clashing names end up adjacent for uniq)
find "$MYDIR" -print0 | sort -zf | uniq -diz | tr '\0' '\n'

(The last line is from "case-insensitive search of duplicate file-names".)

Best Answer

A pragmatic solution would be to reproduce the source directories locally with the desired converted filenames, using hard links to the original files, then rsync this copy as-is to the NTFS filesystem.

For example, this Perl demo script duplicates the hierarchy /tmp/a/ into /tmp/b/ and url-encodes (with % and 2 hex digits) the undesirable characters, so file:b becomes file%3ab (a hard link), directory %b<ha> becomes directory %25b%3cha%3e, and so on:

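#!/usr/bin/perl
# Demo: mirror $startdir into $copydir, percent-encoding awkward characters
# in the paths; files become hard links to the originals.
use strict;
use warnings;
use File::Find;
my $startdir = '/tmp/a';
my $copydir = '/tmp/b';   # must not exist yet; the first mkdir dies otherwise
sub handlefile{
    # $File::Find::name is './path'; drop the leading '.' to keep '/path'
    my $name = substr($File::Find::name,1);
    my $oldname = $startdir.$name;
    # replace each listed character by % and its two-digit hex code
    $name =~ s/([;, \t+%&<>:\"\\|?*])/sprintf('%%%02x',ord($1))/ge;
    $name = $copydir.$name;
    printf "from %s to %s\n",$oldname,$name;
    # real directories are recreated; everything else is hard-linked
    if(!-l and -d){ mkdir($name) or die $!; }
    else{ link($oldname,$name) or die $!; }
}
chdir($startdir) or die $!;
find(\&handlefile, '.');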

You can then rsync /tmp/b to your NTFS volume. This is just a demo, and needs work for Unicode and for other limitations of NTFS such as the maximum filename length. You could also check for lowercase/uppercase clashes, and use your preferred encoding (: to COLON and so on). You could do a second pass to fix the timestamps on the directories. Unless you have millions of files, the work needed to create this copy of the directory structure, with hard links to the files, should not be that onerous.
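The second pass for directory timestamps could look roughly like the following sketch. It assumes the copy was made with the same percent-encoding as the Perl demo; the helper names encode_path and sync_dir_mtimes are invented here, and directory names containing newlines are not handled. Hard-linked files already share their timestamps with the originals, so only directories need this step.

```shell
#!/bin/sh
TAB=$(printf '\t')

# Percent-encode the same characters as the Perl demo. "%" goes first so
# that the "%" signs produced by the later rules are not encoded again.
encode_path() {
    printf '%s\n' "$1" | sed \
        -e 's/%/%25/g' -e 's/;/%3b/g' -e 's/,/%2c/g' -e 's/ /%20/g' \
        -e "s/$TAB/%09/g" -e 's/+/%2b/g' -e 's/&/%26/g' -e 's/</%3c/g' \
        -e 's/>/%3e/g' -e 's/:/%3a/g' -e 's/"/%22/g' -e 's/\\/%5c/g' \
        -e 's/|/%7c/g' -e 's/?/%3f/g' -e 's/\*/%2a/g'
}

# Copy the timestamps of every directory under $1 onto the matching
# (encoded) directory under $2, using touch -r with the original as reference.
sync_dir_mtimes() {
    src=$1
    dst=$2
    (cd "$src" && find . -type d) | while IFS= read -r dir; do
        rel=${dir#.}                      # "./x/y" -> "/x/y", "." -> ""
        target=$dst$(encode_path "$rel")
        if [ -d "$target" ]; then
            touch -r "$src$rel" "$target"
        fi
    done
}
```

With the paths from the demo this would be called as sync_dir_mtimes /tmp/a /tmp/b, after the Perl script has finished and before running rsync.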
