How to create a zip file v2.0

zip

How can I create a zip file v2.0?

It seems OpenDocument files are zip files v2.0:

$ file foo.odt
foo.odt: OpenDocument Text
$ hexdump -C -n 16 foo.odt
00000000  50 4b 03 04 14 00 00 08  00 00 03 0d 47 42 5e c6  |PK..........GB^.|
00000010

The fifth byte is 0x14.

But if I unzip foo.odt and zip it back into bar.odt, I get a v1.0 zip file:

$ unzip -d foo foo.odt
$ cd foo/
$ zip -0 -X ../bar.odt mimetype
$ zip -r ../bar.odt * -x mimetype
$ file ../bar.odt
bar.odt: Zip archive data, at least v1.0 to extract
$ hexdump -C -n 16 ../bar.odt
00000000  50 4b 03 04 0a 00 00 00  00 00 00 90 46 42 5e c6  |PK..........FB^.|
00000010

The fifth byte is 0x0a.

zip (2.32), Debian (6.0)

Best Answer

Edit: The question has since been updated, so my earlier remark ("You do not get a v0.1 but v1.0.") no longer applies.

The version does not say how "capable" the archive is; it states the minimum version required to extract that particular file from the archive.

This is not the overall version for the archive!

One difference here is that OO (OpenOffice), for example, tags every file with the same version requirement: that of the file in the document (the archive as a whole) with the highest requirement.

That is, each file has a zip header that specifies the minimum version required to extract it. Given the above, we typically have:

  archive-files    PackType  Zip-Required  OO-Header  `zip`-header
+------------------------------------------------------------------+
| mimetype         Store     1.0           2.0        1.0          |__ foo.odt
| content.xml      Deflate   2.0           2.0        2.0          |
+------------------------------------------------------------------+

So OO sets the required version of mimetype to 2.0 even though 1.0 would suffice. This does not, however, affect the ability to open the document. (It is fine to open a manually zipped file in OO even though mimetype is tagged with v1.0.)

Versions

foo.odt:

1400   Version needed to extract.
0008   General Purpose
0000   Compression method

Version needed to extract: the lower byte, here 0x14, is decoded by integer division and modulus by 10:

Major: 0x14 / 0x0a = 2
Minor: 0x14 % 0x0a = 0

Aka Version 2.0
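
The same arithmetic as a minimal Python sketch. The function name decode_version is my own and not part of any zip tool:

```python
def decode_version(byte_value):
    """Split a ZIP version byte into (major, minor), e.g. 0x14 -> (2, 0)."""
    # The version is encoded as major * 10 + minor.
    return divmod(byte_value, 10)


print(decode_version(0x14))  # (2, 0)
print(decode_version(0x0a))  # (1, 0)
```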

The higher byte, 0x00, indicates what the file is compatible with. If it is zero, the file is compatible with MS-DOS (FAT, FAT32, VFAT); otherwise the value is looked up in a mapping. E.g. if I use zip with no options on my system I get 0x03, which indicates Unix; 0x0a is NTFS, etc.

Version 2.0 indicates: (4.4.3.2 Current minimum feature versions)

* File is a folder (directory)
* File is compressed using Deflate compression
* File is encrypted using traditional PKWARE encryption

In your zip'ed file you have:

bar.odt:

0a00   Version needed to extract.
0000   General Purpose
0000   Compression method


Major: 0x0a / 0x0a = 1
Minor: 0x0a % 0x0a = 0

Aka version 1.0

Version 1.0 is simply the default value.

File Hierarchy and minimum version

The reason you see version 1.0 under "Version needed to extract" is that what you are actually looking at is the zip header for the file mimetype. This file is not deflated but stored without compression, so you only need version 1.0 to extract it. This, however, is not the overall version of the archive. If you look further down, you'll find version 2.0 as soon as you reach a file saved with deflate. You can check with e.g.:

hexdump -v -e '/1 "%02x "' bar.odt | grep -o '50 4b 03 04 .\{6\}'

This should give you something like:

50 4b 03 04 0a 00 
50 4b 03 04 0a 00 
...
50 4b 03 04 14 00 
50 4b 03 04 14 00 
50 4b 03 04 0a 00 
50 4b 03 04 14 00 
...
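
If hexdump and grep are not at hand, roughly the same scan can be sketched in Python. The function name local_header_versions is hypothetical. Like the grep, it is a plain signature search, so a byte sequence inside compressed data that happens to look like a signature would be reported too:

```python
import struct

LOCAL_SIG = b"PK\x03\x04"  # the bytes 50 4b 03 04


def local_header_versions(data):
    """Return the 'version needed to extract' byte of every local header found."""
    versions = []
    pos = data.find(LOCAL_SIG)
    while pos != -1 and pos + 6 <= len(data):
        # The two bytes after the signature hold the version, little-endian.
        (version,) = struct.unpack_from("<H", data, pos + 4)
        versions.append(version & 0xFF)  # low byte: 0x0a = 1.0, 0x14 = 2.0
        pos = data.find(LOCAL_SIG, pos + 4)
    return versions
```

Reading bar.odt in binary mode and passing its bytes to this function should list one value per archive entry, in archive order.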

Central directory file header

There are also file entries with an extended header, the central directory file headers. You can list them with:

hexdump -v -e '/1 "%02x "' foo.odt | grep -o '50 4b 01 02.\{16\}'

(Remember to reverse 50 4b ... to 02 01 4b 50 if hexdump -n 4 foo.odt says so.)

This typically gives:

                  ____________ Version required (2.0)
                  |   |
50 4b 01 02 14 00 14 00 00 
50 4b 01 02 14 00 14 00 00 
50 4b 01 02 14 00 14 00 08
            |___| 
              |      
              +-------------- Version supported by packing application. v2.0

On the zip-created file you could get e.g.:

                  ____________ Version required for this file (2.0)
                  |   |
50 4b 01 02 1e 03 14 00 00
            |___| 
              |      
              +-------------- Version supported by packing 
                              application. v3.0
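
The same central directory fields can also be read with Python's zipfile module, which exposes them as the ZipInfo attributes create_version, create_system and extract_version. A minimal sketch; the helper name central_directory_versions is mine, and the demo archive is a throwaway built in memory, not a real .odt file:

```python
import io
import zipfile


def central_directory_versions(source):
    """Yield (name, made_by, host_system, needed) for each archive entry."""
    with zipfile.ZipFile(source) as archive:
        for info in archive.infolist():
            yield (
                info.filename,
                divmod(info.create_version, 10),   # version of the packer
                info.create_system,                # 0 = MS-DOS/FAT, 3 = Unix
                divmod(info.extract_version, 10),  # version needed to extract
            )


# Demo on an in-memory archive with one deflated entry.
buffer = io.BytesIO()
with zipfile.ZipFile(buffer, "w", zipfile.ZIP_DEFLATED) as archive:
    archive.writestr("content.xml", "<doc/>" * 100)

for row in central_directory_versions(buffer):
    print(row)
```

Passing the path of foo.odt or bar.odt instead of the buffer should show the differences described above.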

General purpose (and other flags set in odt files)

This is a bit field. ZIP stores multi-byte fields in little-endian order, so the bytes 00 08 read as 0x0800:

0x0800 = 0000 1000 0000 0000
              |
              +---------------- 11 => File names and comments MUST be 
                                      stored as Utf-8.
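
As a small sketch, the bit test looks like this in Python. The name uses_utf8_names is hypothetical, and the two-byte inputs are the raw flag bytes as they appear in the hexdumps of foo.odt and bar.odt:

```python
import struct

UTF8_FLAG = 1 << 11  # general purpose bit 11


def uses_utf8_names(flag_bytes):
    """Report whether the two raw flag bytes have bit 11 set."""
    (flags,) = struct.unpack("<H", flag_bytes)  # read as a little-endian 16-bit value
    return bool(flags & UTF8_FLAG)


print(uses_utf8_names(bytes([0x00, 0x08])))  # True  (foo.odt)
print(uses_utf8_names(bytes([0x00, 0x00])))  # False (bar.odt)
```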

LibreOffice, at least, also writes the archive with a few further peculiarities.

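mimetype is always the first entry and should not be compressed. This helps software identify the file and its content. The local header is 30 bytes and the name mimetype is 8 bytes, so (with no extra field) the content starts at offset 38 and can be read directly, e.g.: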

$ hexdump -C -s 38 -n 39 foo.odt

00000026  61 70 70 6c 69 63 61 74  69 6f 6e 2f 76 6e 64 2e  |application/vnd.|
00000036  6f 61 73 69 73 2e 6f 70  65 6e 64 6f 63 75 6d 65  |oasis.opendocume|
00000046  6e 74 2e 74 65 78 74                              |nt.text|
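
The same lookup can be sketched in Python by parsing the first local header instead of hard-coding the offset. The helper read_leading_mimetype is hypothetical, and the demo archive is built in memory with Python's zipfile rather than taken from a real document:

```python
import io
import struct
import zipfile

ODT_MIMETYPE = "application/vnd.oasis.opendocument.text"


def read_leading_mimetype(data):
    """Return (offset, text) of the first entry's payload, if it is stored."""
    (sig, _ver, _flags, method, _time, _date, _crc,
     size, _usize, name_len, extra_len) = struct.unpack_from("<4sHHHHHIIIHH", data, 0)
    if sig != b"PK\x03\x04" or method != 0:
        raise ValueError("first entry is not an uncompressed local file")
    offset = 30 + name_len + extra_len  # 30-byte header, then name and extra field
    return offset, data[offset:offset + size].decode("ascii")


# Demo archive: mimetype first and stored, everything else deflated.
buffer = io.BytesIO()
with zipfile.ZipFile(buffer, "w", zipfile.ZIP_DEFLATED) as archive:
    archive.writestr("mimetype", ODT_MIMETYPE, compress_type=zipfile.ZIP_STORED)
    archive.writestr("content.xml", "<doc/>")

print(read_leading_mimetype(buffer.getvalue()))
```

The sketch relies on the sizes in the local header being filled in, which is not the case for archives written with a trailing data descriptor.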

While zip typically saves an entry for every directory, OO saves a directory entry only if the directory is empty. Thus:

zip:

Thumbnails/
Thumbnails/thumbnail.png
META-INF/
META-INF/manifest.xml

oo:

Thumbnails/thumbnail.png
META-INF/manifest.xml

And so on ...
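
To see which style an archive uses, one can compare the directories implied by the entry names with the directory entries actually present. A minimal sketch working on a plain list of names; the function name implied_directories is my own:

```python
def implied_directories(names):
    """Return directories used by entries that have no entry of their own."""
    explicit = {name for name in names if name.endswith("/")}
    implied = set()
    for name in names:
        parts = name.rstrip("/").split("/")[:-1]
        # Every proper prefix of the path is a parent directory.
        for depth in range(1, len(parts) + 1):
            implied.add("/".join(parts[:depth]) + "/")
    return sorted(implied - explicit)


zip_style = ["Thumbnails/", "Thumbnails/thumbnail.png",
             "META-INF/", "META-INF/manifest.xml"]
oo_style = ["Thumbnails/thumbnail.png", "META-INF/manifest.xml"]

print(implied_directories(zip_style))  # []
print(implied_directories(oo_style))   # ['META-INF/', 'Thumbnails/']
```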

Related Solutions

Zip – Create Directory If Zip Archive Contains Several Files

I would do something like this (zsh syntax):

unz() (
  tmp=$(TMPDIR=. mktemp -d -- ${${argv[-1]:t:r}%.tar}.XXXXXX) || exit
  print -r >&2 "Extracting in $tmp"
  cd -- $tmp || exit
  [[ $argv[-1] = /* ]] || argv[-1]=../$argv[-1]
  (set -x; "$@"); ret=$?
  files=(*(ND[1,2]))
  case $#files in
    (0) print -r >&2 "No file created"
        rmdir -v "../$tmp";;
    (1) mv -v -- $files .. && rmdir -v ../$tmp;;
    (*) mv -vT ../$tmp ../$tmp:r;;
  esac && exit $ret
)

That is:

* create a directory in any case
* run the command
* depending on how many files the command generated:
  - remove that directory (if it didn't create any file)
  - if it created only one file/dir, move it one level up and discard our directory
  - otherwise, attempt to strip the random string from the end of our temp directory.

This way, you can do:

unz unzip foo.zip
unz tar xf foo.tar.gz

It assumes that the last argument to the extracting command is the file to extract. It also assumes GNU tools for the -v options. On non-GNU systems, you can remove those and possibly do the logging by hand. mv -T is also GNU-specific; it forces mv to attempt a rename only.
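
For comparison, the same three-way logic can be sketched in Python. This is not a port of the function above: it handles zip archives only, the name extract_tidy is hypothetical, and unlike mv it keeps the temporary directory rather than overwrite an existing target:

```python
import os
import shutil
import tempfile
import zipfile


def extract_tidy(archive_path, parent="."):
    """Extract into a fresh directory, then flatten it if it holds one item."""
    stem = os.path.splitext(os.path.basename(archive_path))[0]
    tmp = tempfile.mkdtemp(prefix=stem + ".", dir=parent)
    with zipfile.ZipFile(archive_path) as archive:
        archive.extractall(tmp)
    items = os.listdir(tmp)
    if not items:
        os.rmdir(tmp)  # nothing was extracted
        return None
    if len(items) == 1:
        # Single top-level item: move it up and drop the temporary directory.
        target = os.path.join(parent, items[0])
        if os.path.exists(target):
            return tmp  # do not overwrite; keep the temporary directory
        shutil.move(os.path.join(tmp, items[0]), target)
        os.rmdir(tmp)
        return target
    # Several items: try to rename the directory to the bare archive name.
    target = os.path.join(parent, stem)
    if os.path.exists(target):
        return tmp  # name taken; keep the randomly named directory
    os.rename(tmp, target)
    return target
```

The return value is the path that ends up holding the extracted content, or None if the archive was empty.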

What method does unzip use to find a single file in an archive

When searching for a single file in a large archive, it uses method 1, which you can see using strace:

open("dataset.zip", O_RDONLY)           = 3
ioctl(1, TIOCGWINSZ, 0x7fff9a895920)    = -1 ENOTTY (Inappropriate ioctl for device)
write(1, "Archive:  dataset.zip\n", 22Archive:  dataset.zip
) = 22
lseek(3, 943718400, SEEK_SET)           = 943718400
read(3, "\340P\356(s\342\306\205\201\27\360U[\250/2\207\346<\252+u\234\225\1[<\2310E\342\274"..., 4522) = 4522
lseek(3, 943722880, SEEK_SET)           = 943722880
read(3, "\3\f\225P\\ux\v\0\1\4\350\3\0\0\4\350\3\0\0", 20) = 20
lseek(3, 943718400, SEEK_SET)           = 943718400
read(3, "\340P\356(s\342\306\205\201\27\360U[\250/2\207\346<\252+u\234\225\1[<\2310E\342\274"..., 8192) = 4522
lseek(3, 849346560, SEEK_SET)           = 849346560
read(3, "D\262nv\210\343\240C\24\227\344\367q\300\223\231\306\330\275\266\213\276M\7I'&35\2\234J"..., 8192) = 8192
stat("rand-28.txt", 0x559f43e0a550)     = -1 ENOENT (No such file or directory)
lstat("rand-28.txt", 0x559f43e0a550)    = -1 ENOENT (No such file or directory)
stat("rand-28.txt", 0x559f43e0a550)     = -1 ENOENT (No such file or directory)
lstat("rand-28.txt", 0x559f43e0a550)    = -1 ENOENT (No such file or directory)
open("rand-28.txt", O_RDWR|O_CREAT|O_TRUNC, 0666) = 4
ioctl(1, TIOCGWINSZ, 0x7fff9a895790)    = -1 ENOTTY (Inappropriate ioctl for device)
write(1, " extracting: rand-28.txt        "..., 37 extracting: rand-28.txt             ) = 37
read(3, "\275\3279Y\206\223\217}\355W%:\220YNT\0\257\260z^\361T\242\2\370\21\336\372+\306\310"..., 8192) = 8192

unzip opens dataset.zip, seeks to the end, then seeks to the start of the requested file in the archive (rand-28.txt, at offset 849346560) and reads from there.

The central directory is found by scanning the last 65557 bytes of the archive; see the code starting here:

/*---------------------------------------------------------------------------
    Find and process the end-of-central-directory header.  UnZip need only
    check last 65557 bytes of zipfile:  comment may be up to 65535, end-of-
    central-directory record is 18 bytes, and signature itself is 4 bytes;
    add some to allow for appended garbage.  Since ZipInfo is often used as
    a debugging tool, search the whole zipfile if zipinfo_mode is true.
  ---------------------------------------------------------------------------*/
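
The search described in that comment can be sketched in Python as follows. This is an illustration, not UnZip's code: it holds the whole archive in memory instead of seeking, ignores Zip64 and multi-disk archives, and the name find_central_directory is hypothetical:

```python
import io
import struct
import zipfile

EOCD_SIG = b"PK\x05\x06"
EOCD_FORMAT = "<4sHHHHIIH"  # 22 bytes including the signature
SEARCH_WINDOW = 65557       # figure taken from the UnZip comment above


def find_central_directory(data):
    """Return (entry_count, cd_offset, cd_size) from the end-of-central-directory record."""
    tail_start = max(0, len(data) - SEARCH_WINDOW)
    # Search backwards so the record closest to the end of the file wins.
    pos = data.rfind(EOCD_SIG, tail_start)
    if pos == -1 or pos + struct.calcsize(EOCD_FORMAT) > len(data):
        raise ValueError("end-of-central-directory record not found")
    (_sig, _disk, _cd_disk, _entries_here, entries,
     cd_size, cd_offset, _comment_len) = struct.unpack_from(EOCD_FORMAT, data, pos)
    return entries, cd_offset, cd_size


# Demo on a small in-memory archive with two entries.
buffer = io.BytesIO()
with zipfile.ZipFile(buffer, "w", zipfile.ZIP_DEFLATED) as archive:
    archive.writestr("rand-1.txt", "a" * 1000)
    archive.writestr("rand-2.txt", "b" * 1000)

print(find_central_directory(buffer.getvalue()))
```

With the offset of the central directory known, a reader can look up the requested entry there and seek straight to its local header, which matches the lseek calls in the strace output.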