What characters are safe in cross-platform file names for Linux, Windows and OS-X

filenames

Currently, I use a YYMMDD-NAME+PAGE name for most of my files. NAME has spaces converted to underscores.

I'd like to use the YYYY-MM-DD date format, but I am not sure how to separate it from the name. A - would look strange if the name started with a number. If I use a _, then it conflicts with the underscore representing a space.

What characters are reasonably safe in file names that would work here? I am on Linux, but I might share files with other people (Windows 7, Mac OS X).

Best Answer

Summary:

  • Windows: anything except ASCII's control characters and \/:*?"<>|
  • Linux, OS-X: anything except null or /

On all platforms it is best to avoid non-printable characters such as the ASCII control-characters.
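As a rough illustration of the summary above, here is a minimal Python sketch that checks a candidate name against those rules. The function name is_portable_filename and the sample names are made up for this example, and rejecting DEL (127) is a conservative assumption on my part. It only looks at individual characters, so it does not cover other Windows restrictions such as reserved device names.

```python
# Characters the summary lists as not allowed on Windows.
WINDOWS_FORBIDDEN = set('\\/:*?"<>|')


def is_portable_filename(name: str) -> bool:
    """Return True if `name` avoids every character ruled out in the summary."""
    if not name:
        return False
    for ch in name:
        # ASCII control characters (0-31), which include null. DEL (127) is
        # rejected too, as a conservative assumption for "non-printable".
        if ord(ch) < 32 or ord(ch) == 127:
            return False
        # '/' is in this set, so the Linux and OS-X rule is covered as well.
        if ch in WINDOWS_FORBIDDEN:
            return False
    return True


print(is_portable_filename("2012-05-14_my_report+03.txt"))  # True
print(is_portable_filename("notes: draft?.txt"))            # False
```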

Windows

In Windows, Windows Explorer does not allow control characters or \/:*?"<>| in filenames. Spaces are allowed, but you will often have to quote a filename containing spaces when you use it on the command line (GUI apps are unaffected, so far as I know). Windows filesystems such as NTFS store filenames as Unicode, with UTF-16 as the standard encoding.

Some parts of Windows are case-sensitive, other parts are case-insensitive. It is easy to create distinct filenames like "Ab" and "ab" on a Windows NTFS filesystem. These names refer to separate files with separate content. However, although the Windows command prompt will happily list both files using dir, you cannot easily access or manipulate one of them using commands such as type. See below.
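If you want to spot such pairs before sharing files with Windows users, something like the following Python sketch might help. The helper find_case_collisions is a hypothetical name, and str.casefold() is only an approximation: NTFS uses its own upper-case table, so edge cases may differ.

```python
from collections import defaultdict


def find_case_collisions(names):
    """Group names that differ only by letter case."""
    groups = defaultdict(list)
    for name in names:
        # casefold() is Python's caseless comparison key; it approximates,
        # but is not identical to, the comparison Windows performs.
        groups[name.casefold()].append(name)
    return [group for group in groups.values() if len(group) > 1]


# "\u0430B" starts with Cyrillic U+0430 (an assumed look-alike of Latin "a"),
# so it does not collide with the other two names.
print(find_case_collisions(["Ab", "aB", "\u0430B"]))  # [['Ab', 'aB']]
```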

Linux, OS-X

In Linux and OS-X, I believe / is the only character of the printable ASCII set that is prohibited. Some characters (shell metacharacters like *?!) will cause problems on the command line and will require the filename to be appropriately quoted or escaped.

Linux filesystems such as ext2 and ext3 are character-set agnostic (I think they just treat the filename more or less as a byte stream - only nulls and / are prohibited). This means you can store filenames in UTF-8 encoding. I believe it is up to the shell or other application to know what encoding to use to properly convert the filename for display or processing.
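To make the byte-stream point concrete, here is a small Python sketch. The sample name is made up. It shows that the same visible name becomes different bytes under different encodings, and that reading the bytes back with the wrong encoding garbles the name.

```python
name = "caf\u00e9.txt"  # the name "café.txt", with U+00E9 as the accented letter

# The same name as the byte sequences a filesystem would store.
utf8_bytes = name.encode("utf-8")
latin1_bytes = name.encode("latin-1")

print(utf8_bytes)    # b'caf\xc3\xa9.txt'
print(latin1_bytes)  # b'caf\xe9.txt'

# An application that assumes the wrong encoding sees a different name:
# the two UTF-8 bytes of the accented letter turn into two characters.
misread = utf8_bytes.decode("latin-1")
print(len(name), len(misread))  # 8 9
```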

Conclusion

So you could probably safely use something like (if it weren't so hard to type)


Case-(in)sensitivity in Windows

C> dir /B
Ab
aB
аB

C> type Ab
b
b

C> type aB
b
b

C> type аB
unicode homograph

Note that we cannot type the contents of the second file; the Windows type command just returns the contents of Ab instead. The third file's name starts with a look-alike character (a Unicode homograph) rather than a plain ASCII a, so it would be distinct from aB on Linux also.

(Windows 10 NTFS).
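
Since homographs like the third name are hard to spot by eye, a sketch along these lines could list any non-ASCII characters in a name. The helper describe_non_ascii is a hypothetical name, and the use of Cyrillic U+0430 as the look-alike is an assumption for illustration.

```python
import unicodedata


def describe_non_ascii(name: str):
    """List (position, code point, Unicode name) for each non-ASCII character."""
    found = []
    for index, ch in enumerate(name):
        if ord(ch) > 127:
            code_point = f"U+{ord(ch):04X}"
            found.append((index, code_point, unicodedata.name(ch, "UNKNOWN")))
    return found


print(describe_non_ascii("aB"))       # []
# Assumed homograph: Cyrillic small letter a followed by Latin B.
print(describe_non_ascii("\u0430B"))  # [(0, 'U+0430', 'CYRILLIC SMALL LETTER A')]
```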