Filesystems – Linux Support for File Names Longer Than 255 Bytes

filesystems

I asked about Linux's 255-byte file name limit yesterday, and the answer was that the limit cannot, or will not, be easily changed. But I remembered that most Linux distributions support NTFS, whose maximum file name length is 255 UTF-16 characters.

So I created an NTFS partition and tried to give a file a 160-character Japanese name, which is 480 bytes in UTF-8. I expected this to fail, but it worked. Why does it work when the file name is 480 bytes? Is the 255-byte limit specific to certain file systems, and can Linux itself handle file names longer than 255 bytes?

PS

The string is the opening of a famous old Japanese essay titled "方丈記". Here it is:

ゆく河の流れは絶えずして、しかももとの水にあらず。よどみに浮かぶうたかたは、かつ消えかつ結びて、久しくとどまりたるためしなし。世の中にある人とすみかと、またかくのごとし。たましきの都のうちに、棟を並べ、甍を争へる、高き、卑しき、人の住まひは、世々を経て尽きせぬものなれど、これをまことかと尋ぬれば、昔ありし家はまれなり。

I used a web application to count the UTF-8 bytes.

Best Answer

The answer, as often, is “it depends”.

Looking at the NTFS implementation in particular, it reports a maximum file name length of 255 to statvfs callers, so callers that interpret this as a 255-byte limit might pre-emptively reject file names that would be valid on NTFS. However, most programs don’t check this (or even NAME_MAX) ahead of time; they rely on ENAMETOOLONG errors to detect names that are too long. In most cases the important limit is PATH_MAX, not NAME_MAX: that is what programs typically use to size buffers when manipulating file names (at least programs that don’t allocate path buffers dynamically, as is expected on OSes like the Hurd, which doesn’t have arbitrary limits).
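As a rough illustration of the two approaches just described, the Python sketch below first asks the filesystem for its advertised limit and then simply attempts the creation, treating ENAMETOOLONG as the real answer. The helper names `advertised_name_max` and `try_create` are made up for this example; `os.statvfs` and `errno.ENAMETOOLONG` come from the standard library. Note that `f_namemax` is a bare number: whether it counts bytes or 2-byte units depends on the filesystem.

```python
import errno
import os
import tempfile


def advertised_name_max(directory):
    """Return the maximum file name length the filesystem reports via statvfs."""
    return os.statvfs(directory).f_namemax


def try_create(directory, name):
    """Try to create an empty file; return False if the name is rejected as too long."""
    path = os.path.join(directory, name)
    try:
        fd = os.open(path, os.O_CREAT | os.O_EXCL | os.O_WRONLY, 0o600)
    except OSError as exc:
        if exc.errno == errno.ENAMETOOLONG:
            return False
        raise  # any other error is unrelated to the name length
    os.close(fd)
    os.unlink(path)  # clean up the probe file
    return True


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        print("reported limit:", advertised_name_max(tmp))
        print("short name accepted:", try_create(tmp, "short.txt"))
        # 160 kana characters are 480 bytes in UTF-8; the result depends
        # on the filesystem that holds the temporary directory.
        print("160-character kana name accepted:", try_create(tmp, "あ" * 160))
```

To probe a mounted NTFS partition, pass a directory on that partition instead of the temporary directory.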

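The NTFS implementation itself doesn’t check file name lengths in bytes, but always in 2-byte (UTF-16) units; file names that can’t be represented in an array of 255 2-byte elements cause an ENAMETOOLONG error. That is why the 160-character name in the question is accepted: it takes 480 bytes in UTF-8 but only 160 2-byte units.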

Note that NTFS is generally handled by a FUSE driver on Linux. The kernel driver currently only supports UCS-2 characters, but the FUSE driver supports UTF-16 surrogate pairs (with a corresponding reduction in the maximum number of characters, since each pair uses two of the 255 units).
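The difference between the two ways of counting can be checked with a few lines of Python. This is only a sketch of the arithmetic, not the driver's actual code, and the function names are invented for the example. It assumes names made of ordinary Unicode characters.

```python
def utf8_bytes(name):
    """Length of the name in bytes when encoded as UTF-8."""
    return len(name.encode("utf-8"))


def utf16_units(name):
    """Length of the name in 2-byte UTF-16 code units (no byte-order mark)."""
    return len(name.encode("utf-16-le")) // 2


def fits_255_bytes(name):
    """Limit as applied by filesystems that count bytes."""
    return utf8_bytes(name) <= 255


def fits_255_utf16_units(name):
    """Limit as described above for NTFS: 255 2-byte elements."""
    return utf16_units(name) <= 255


if __name__ == "__main__":
    kana = "あ" * 160           # each kana: 3 bytes in UTF-8, 1 UTF-16 unit
    print(utf8_bytes(kana), utf16_units(kana))    # 480 160
    print(fits_255_bytes(kana), fits_255_utf16_units(kana))  # False True

    rare = "𠮷" * 128           # U+20BB7 needs a surrogate pair: 2 units each
    print(utf16_units(rare))                      # 256
    print(fits_255_utf16_units(rare))             # False
```

With surrogate pairs, a name of only 128 such characters already exceeds 255 units, which is the reduction mentioned above.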