When you run ls without arguments, it just opens the current directory, reads all the entries, sorts them and prints them out.
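If you want to confirm that, one rough way (a sketch only; the exact syscall names, such as openat and getdents64, depend on the architecture and the strace version) is to trace just the directory-reading calls:

strace -e trace=openat,getdents64 ls > /dev/null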
When you run ls *, the shell first expands *, which is effectively the same work the simple ls did: it builds an argument vector with all the files in the current directory and then calls ls. ls then has to process that argument vector and, for each argument, calls access(2)¹ on the file to check its existence. Then it prints the same output as the first (simple) ls. Both the shell's processing of the large argument vector and ls's are likely to involve a lot of memory allocation in small blocks, which can take some time. However, since there was little sys and user time but a lot of real time, most of the time was spent waiting for disk rather than using the CPU for memory allocation.
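To make the expansion step concrete, here is a minimal sketch showing the argument vector the shell would build for ls * (one expanded name per line, then the argument count):

printf '%s\n' *
set -- *; echo "$#"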
Each call to access(2) needs to read the file's inode to get the permission information. That means many more disk reads and seeks than simply reading a directory. I do not know how expensive these operations are on your GPFS, but given the comparison you've shown with ls -l, which has a similar run time to the wildcard case, the time needed to retrieve the inode information appears to dominate. If GPFS has a slightly higher latency than your local filesystem on each read operation, we would expect that to be more pronounced in these cases.
The 50% difference between the wildcard case and ls -l could be explained by the ordering of the inodes on disk. If the inodes were laid out successively in the same order as the filenames in the directory, and ls -l stat(2)ed the files in directory order before sorting, ls -l would possibly read most of the inodes in one sweep. With the wildcard, the shell sorts the filenames before passing them to ls, so ls will likely read the inodes in a different order, adding more disk head movement.
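A rough way to test that hypothesis, assuming you can drop the kernel's caches between runs (this needs root, and GPFS may keep caches of its own that it does not affect):

sync; echo 3 > /proc/sys/vm/drop_caches
time ls -l > /dev/null       # stats the files roughly in directory order
sync; echo 3 > /proc/sys/vm/drop_caches
time ls -l * > /dev/null     # stats the files in shell-sorted order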
It should be noted that your time output will not include the time taken by the shell to expand the wildcard.
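If you want the expansion included in the measurement, time a child shell that performs both the expansion and the ls run, for example:

time sh -c 'ls * > /dev/null'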
If you really want to see what's going on, use strace(1):
strace -o /tmp/ls-star.trace ls *
strace -o /tmp/ls-l-star.trace ls -l *
and have a look at which system calls are being performed in each case.
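For a quicker overview than reading the full traces, strace can also print a per-syscall summary (call counts and time spent) with -c:

strace -c -o /tmp/ls-star.summary ls * > /dev/null
strace -c -o /tmp/ls-l-star.summary ls -l * > /dev/null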
¹ I don't know whether access(2) is actually used, or something else such as stat(2). But both probably require an inode lookup. (I'm not sure whether access(file, 0) would bypass the inode lookup.)
I have reformulated your questions a bit, for reasons that should become evident when you read them in sequence.
1. Is it possible to configure a Linux filesystem to use a fixed character encoding for storing file names, regardless of the LANG/LC_ALL environment variables?
No, this is not possible: as you mention in your question, a UNIX file name is just a sequence of bytes; the kernel knows nothing about the encoding, which is entirely a user-space (i.e., application-level) concept.
In other words, the kernel knows nothing about LANG/LC_*, so it cannot translate.
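You can see this for yourself: the kernel stores whatever bytes it is given, whatever your locale says. For example (0xB0 0xA1 is the GB2312/GBK encoding of 啊; od is only used to show the raw bytes coming back unchanged):

touch "$(printf '\260\241')"
ls | od -c | head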
2. Is it possible to have different file names refer to the same file?
You can have multiple directory entries referring to the same file; you can create them with hard links or symbolic links.
Be aware, however, that file names that are not valid in the current encoding (e.g., your GBK character string when you're working in a UTF-8 locale) will display badly, if at all.
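As a small sketch of that idea (the file name 啊.txt is just an example; iconv converts only the name, not the file's contents):

touch '啊.txt'                                     # a file with a UTF-8-encoded name
gbkname=$(printf '啊.txt' | iconv -f UTF-8 -t GBK)
ln '啊.txt' "$gbkname"                             # hard link; or: ln -s '啊.txt' "$gbkname"

The second name will look garbled in a UTF-8 locale, exactly as described above, but both names refer to the same file.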
3. Is it possible to patch the kernel to translate the character encoding between the filesystem and the current environment?
You cannot patch the kernel to do this (see 1.), but you could, in theory, patch the C library (e.g., glibc) to perform this translation, always converting file names to UTF-8 when it calls the kernel, and converting them back to the current encoding when it reads a file name from the kernel.
A simpler approach could be to write an overlay filesystem with FUSE that just redirects any filesystem request to another location after converting the file name to or from UTF-8. Ideally you would mount this filesystem on ~/trans, and when an access is made to ~/trans/a/GBK/encoded/path the FUSE filesystem would really access /a/UTF-8/encoded/path.
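As a side note, FUSE filesystems along these lines already exist; convmvfs is one of them. The invocation below is only an illustration: the source directory is a placeholder and the option names are an assumption on my part, so check its documentation before relying on them.

mkdir -p ~/trans
convmvfs ~/trans -o srcdir=/data,icharset=UTF-8,ocharset=GBK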
However, the problem with these approaches is: what do you do with files that already exist on your filesystem and are not UTF-8 encoded? You cannot simply pass them through untranslated, because then you don't know how to convert them; and you cannot mangle them by translating invalid character sequences to ?, because that could create conflicts...
Best Answer
The answer, as is often the case, is “it depends”.
Looking at the NTFS implementation in particular, it reports a maximum file name length of 255 to statvfs callers, so callers which interpret that as a 255-byte limit might pre-emptively avoid file names which would be valid on NTFS. However, most programs don't check this (or even NAME_MAX) ahead of time, and rely on ENAMETOOLONG errors instead. In most cases the important limit is PATH_MAX, not NAME_MAX; that's what's typically used to allocate buffers when manipulating file names (at least in programs that don't allocate path buffers dynamically, as expected by OSes such as the Hurd, which doesn't impose arbitrary limits).
The NTFS implementation itself doesn't check file name lengths in bytes, but in 2-byte units; file names which can't be represented in an array of 255 2-byte elements will cause an ENAMETOOLONG error.
Note that NTFS is generally handled by a FUSE driver on Linux. The kernel driver currently only supports UCS-2 characters, but the FUSE driver supports UTF-16 surrogate pairs (with the corresponding reduction in character length).
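If you want to see what a given mount actually reports for these limits, getconf can query the corresponding pathconf values from the shell (the NTFS mount point below is an assumed path):

getconf NAME_MAX /mnt/ntfs
getconf PATH_MAX /mnt/ntfs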