This is a slightly exotic question, but there doesn't seem to be much information on the net about this. I just added an answer to a question about the zip format's external file attribute. As you can see from my answer, I conclude that only the second byte (of 4 bytes) is actually used for Unix. Apparently this contains enough information when unzipping to deduce whether the object is a file or a directory, and also has space for other permission and attribute information. My question is, how does this map to the usual Unix permissions? Do the usual Unix permissions that ls gives (e.g. below) fit into exactly one byte, and if so, can someone describe the layout or give a reference, please?
$ ls -la
total 36
drwxr-xr-x 3 faheem faheem 4096 Jun 10 01:11 .
drwxrwxrwt 136 root root 28672 Jun 10 01:07 ..
-rw-r--r-- 1 faheem faheem 0 Jun 10 01:07 a
drwxr-xr-x 2 faheem faheem 4096 Jun 10 01:07 b
lrwxrwxrwx 1 faheem faheem 1 Jun 10 01:11 c -> b
Let me make this more concrete by asking a specific question. Per the Trac patch quoted in my answer above, you can create a zip file with the snippet of Python below.
The 040755 << 16L value corresponds to the creation of an empty directory with the permissions drwxr-xr-x (I tested it). I recognize that 0755 corresponds to the rwxr-xr-x pattern, but what about the 04, and how does the whole value correspond to a byte? I also recognize that << 16L is a bitwise left shift of 16 places, which would make it end up as the second byte from the top.
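# Python 2 syntax (040755 octal literal, 16L long literal, print statement)
import zipfile

def makezip1():
    z = zipfile.ZipFile("foo.zip", mode='w')
    # A trailing slash in the name marks the entry as a directory.
    zfi = zipfile.ZipInfo("foo/empty/")
    zfi.external_attr = 040755 << 16L  # permissions drwxr-xr-x
    z.writestr(zfi, "")
    print z.namelist()
    z.close()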
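For readers on Python 3, where the 040755 and 16L spellings are syntax errors, here is a minimal sketch of the same idea. It is not the original patch: the function names make_zip_with_empty_dir and read_unix_mode are made up for illustration, and the archive is written to memory rather than to foo.zip so the stored value can be read straight back.

```python
import io
import stat
import zipfile


def make_zip_with_empty_dir():
    """Build an in-memory zip holding one empty directory entry (drwxr-xr-x)."""
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, mode="w") as z:
        zfi = zipfile.ZipInfo("foo/empty/")
        # The Unix mode goes into the high 16 bits of external_attr.
        zfi.external_attr = (stat.S_IFDIR | 0o755) << 16
        z.writestr(zfi, "")
    return buf.getvalue()


def read_unix_mode(data, name):
    """Return the high 16 bits of an entry's external_attr."""
    with zipfile.ZipFile(io.BytesIO(data)) as z:
        return z.getinfo(name).external_attr >> 16


data = make_zip_with_empty_dir()
mode = read_unix_mode(data, "foo/empty/")
print(oct(mode), stat.filemode(mode))  # 0o40755 drwxr-xr-x
```

Reading the value back with stat.filemode gives the same string that ls would show for the extracted directory.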
EDIT: On rereading this, I think that my conclusion that the Unix permissions only correspond to one byte may be incorrect, but I'll let the above stand for the present, since I'm not sure what the correct answer is.
EDIT2: I was indeed incorrect about the Unix values only corresponding to 1 byte. As @Random832 explained, it uses both of the top two bytes. Per @Random832's answer, we can construct the desired 040755 value from the tables he gives below. Namely:
__S_IFDIR + S_IRUSR + S_IWUSR + S_IXUSR + S_IRGRP + S_IXGRP + S_IROTH + S_IXOTH
  0040000 +    0400 +    0200 +    0100 +    0040 +    0010 +    0004 +    0001
= 0040755
The addition here is in base 8.
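The same sum can be checked with Python's standard stat module, which exposes these constants under their unprefixed names. This is a small Python 3 sketch rather than anything from the original patch. Because each flag occupies distinct bits, bitwise OR gives the same result as the addition.

```python
import stat

# Directory type flag plus rwxr-xr-x, built from the named constants.
mode = (stat.S_IFDIR
        | stat.S_IRUSR | stat.S_IWUSR | stat.S_IXUSR
        | stat.S_IRGRP | stat.S_IXGRP
        | stat.S_IROTH | stat.S_IXOTH)
print(oct(mode))  # 0o40755

# Shifted into place, the value occupies the top two bytes of the 4-byte field.
external_attr = mode << 16
print(hex(external_attr))  # 0x41ed0000
```

The hex form shows that both of the top two bytes (0x41 and 0xed) are non-zero, so the mode does not fit in a single byte.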
Best Answer
0040000 is the traditional value of S_IFDIR, the file type flag representing a directory. The type uses the top 4 bits of the 16-bit st_mode value; 0100000 is the value for regular files.
The high 16 bits of the external file attributes seem to be used for OS-specific permissions. The Unix values are the same as on traditional Unix implementations. Other OSes use other values. Information about the formats used in a variety of different OSes can be found in the Info-ZIP source code (download it, or e.g. on Debian run apt-get source [zip or unzip]); the relevant files are zipinfo.c in unzip, and the platform-specific files in zip.
These are conventionally defined in octal (base 8); this is represented in C and Python 2 by prefixing the number with a 0 (Python 3 spells the prefix 0o).
These values can all be found in <sys/stat.h> (see, for example, the 4.4BSD version). These are not in the POSIX standard (which defines test macros instead), but originate from AT&T Unix and BSD. (In GNU libc / Linux, the values themselves are defined as __S_IFDIR etc. in bits/stat.h, though the kernel header might be easier to read; the values are all the same pretty much everywhere.)
And of course, the other 12 bits are for the permissions and setuid/setgid/sticky bits, the same as for chmod.
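The split between the 4 type bits and the 12 permission bits can be illustrated with Python's stat module (Python 3 syntax). This is a sketch, with describe_mode as a made-up helper name; stat.S_IFMT, stat.S_IMODE and stat.filemode are standard library functions. The sample values are the directory, regular file and symlink shapes from the ls listing in the question.

```python
import stat


def describe_mode(mode):
    """Split a 16-bit st_mode value into (file type bits, permission bits)."""
    file_type = stat.S_IFMT(mode)   # keeps only the top 4 bits (mask 0o170000)
    perms = stat.S_IMODE(mode)      # keeps only the low 12 bits (mask 0o7777)
    return file_type, perms


for mode in (0o040755, 0o100644, 0o120777):
    file_type, perms = describe_mode(mode)
    print(oct(file_type), oct(perms), stat.filemode(mode))
# 0o40000 0o755 drwxr-xr-x
# 0o100000 0o644 -rw-r--r--
# 0o120000 0o777 lrwxrwxrwx
```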
As a historical note, the reason 0100000 is for regular files instead of 0 is that in very early versions of Unix, 0 was for 'small' files (these did not use indirect blocks in the filesystem) and the high bit of the mode flag was set for 'large' files which would use indirect blocks. The other two types using this bit were added in later Unix-derived OSes, after the filesystem had changed.
So, to wrap up, for Unix the external file attributes field carries the 16-bit st_mode value in its high 16 bits: the file type in the top 4 bits, followed by the 12 bits for setuid/setgid/sticky and the rwx permissions.
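As a rough illustration of that layout, the following Python 3 sketch pulls a 32-bit external attributes value apart. The helper name split_external_attr is hypothetical, and the low 16 bits are returned untouched because their meaning is not covered here. In the example, the 0x10 in the low byte is the MS-DOS directory flag that Python's own zipfile module sets on directory entries it creates by name.

```python
def split_external_attr(external_attr):
    """Return (type nibble, setuid/setgid/sticky bits, rwx bits, low 16 bits)."""
    unix_mode = (external_attr >> 16) & 0xFFFF
    type_bits = unix_mode >> 12            # top 4 bits: file type
    special = (unix_mode >> 9) & 0o7       # setuid, setgid, sticky
    perms = unix_mode & 0o777              # rwxrwxrwx
    low_bits = external_attr & 0xFFFF      # not interpreted in this sketch
    return type_bits, special, perms, low_bits


type_bits, special, perms, low_bits = split_external_attr((0o40755 << 16) | 0x10)
print(oct(type_bits), oct(special), oct(perms), hex(low_bits))
# 0o4 0o0 0o755 0x10
```

This also accounts for the 04 in 040755: it is the type nibble for a directory, sitting directly above the 12 permission bits.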