Let's say I create 100 files of random text data, 30 MB each. I then create a zip archive with no compression, i.e. `zip dataset.zip -r -0 *.txt`. Now I want to extract just one file from this archive.
As described here, there are two ways of unzipping/extracting files from archives:
- Seek to the end of the file and look up the central directory, then use it for fast random access to the file to be extracted (amortized O(1) complexity).
- Look through each local header and extract the one that matches (O(n) complexity).
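To make method 2 concrete, here is a rough Python sketch of a sequential local-header scan. It is only an illustration, not what any real tool does: the function name `scan_local_headers` is made up, and it assumes ASCII/UTF-8 names, no ZIP64, and no data descriptors (so the sizes are present in each local header).

```python
import io
import struct
import zipfile

LOCAL_SIG = b"PK\x03\x04"
# Local file header: signature, version, flags, method, time, date,
# CRC-32, compressed size, uncompressed size, name length, extra length.
LOCAL_FMT = "<4sHHHHHIIIHH"
LOCAL_SIZE = struct.calcsize(LOCAL_FMT)  # 30 bytes


def scan_local_headers(f, name):
    """Hypothetical helper: walk local headers from offset 0 until `name` matches.

    Returns (offset of the matching local header, number of headers read).
    """
    wanted = name.encode("utf-8")  # assumption: names are ASCII/UTF-8
    pos = 0
    headers_read = 0
    while True:
        f.seek(pos)
        raw = f.read(LOCAL_SIZE)
        if len(raw) < LOCAL_SIZE or raw[:4] != LOCAL_SIG:
            # Ran past the last member (e.g. into the central directory).
            raise KeyError(name)
        (_, _, flags, _, _, _, _,
         comp_size, _, name_len, extra_len) = struct.unpack(LOCAL_FMT, raw)
        headers_read += 1
        if flags & 0x08:
            # Sizes live in a trailing data descriptor; this sketch gives up.
            raise ValueError("data descriptor in use; cannot skip by size")
        if f.read(name_len) == wanted:
            return pos, headers_read
        # Skip header, name, extra field and the member's data.
        pos += LOCAL_SIZE + name_len + extra_len + comp_size


if __name__ == "__main__":
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w", zipfile.ZIP_STORED) as zf:
        for i in range(5):
            zf.writestr(f"rand-{i}.txt", b"x" * 1000)
    print(scan_local_headers(buf, "rand-3.txt"))
```

The number of headers read grows with the position of the member in the archive, which is where the O(n) comes from.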
Which method does unzip use? From my experiments, it seems to use method 2.
Best Answer
When searching for a single file in a large archive, it uses method 1, which you can see using `strace`: `unzip` opens `dataset.zip`, seeks to the end, then seeks to the start of the requested file in the archive (`rand-28.txt`, at offset 849346560) and reads from there.
The central directory is found by scanning the last 65557 bytes of the archive; see the code starting here:
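The following is not unzip's code, just a minimal Python sketch of the same idea. The 65557 figure is the 22-byte end-of-central-directory (EOCD) record plus the largest possible archive comment (65535 bytes), so the record has to start somewhere in that tail. The helper names `read_eocd` and `find_local_header_offset` are made up for illustration, and the sketch assumes a single-disk archive, no ZIP64, ASCII/UTF-8 names, and a comment that does not itself contain the EOCD signature.

```python
import io
import struct
import zipfile

EOCD_SIG = b"PK\x05\x06"
EOCD_MIN_SIZE = 22                      # EOCD record without a comment
TAIL_SIZE = EOCD_MIN_SIZE + 0xFFFF      # 65557: EOCD plus the longest comment
CD_SIG = b"PK\x01\x02"
CD_FMT = "<4sHHHHHHIIIHHHHHII"          # fixed part of a central directory entry
CD_SIZE = struct.calcsize(CD_FMT)       # 46 bytes


def read_eocd(f):
    """Hypothetical helper: return (entry count, directory size, directory offset)."""
    f.seek(0, io.SEEK_END)
    size = f.tell()
    tail_len = min(size, TAIL_SIZE)
    f.seek(size - tail_len)
    tail = f.read(tail_len)
    pos = tail.rfind(EOCD_SIG)          # search backwards from the end
    if pos < 0 or pos + EOCD_MIN_SIZE > len(tail):
        raise ValueError("end-of-central-directory record not found")
    (_, _, _, _, total_entries,
     cd_size, cd_offset, _) = struct.unpack("<4sHHHHIIH",
                                            tail[pos:pos + EOCD_MIN_SIZE])
    return total_entries, cd_size, cd_offset


def find_local_header_offset(f, name):
    """Hypothetical helper: look `name` up in the central directory only."""
    wanted = name.encode("utf-8")       # assumption: names are ASCII/UTF-8
    total_entries, cd_size, cd_offset = read_eocd(f)
    f.seek(cd_offset)
    cd = f.read(cd_size)
    pos = 0
    for _ in range(total_entries):
        fields = struct.unpack_from(CD_FMT, cd, pos)
        if fields[0] != CD_SIG:
            raise ValueError("corrupt central directory")
        name_len, extra_len, comment_len = fields[10], fields[11], fields[12]
        local_offset = fields[16]
        if cd[pos + CD_SIZE:pos + CD_SIZE + name_len] == wanted:
            return local_offset
        pos += CD_SIZE + name_len + extra_len + comment_len
    raise KeyError(name)


if __name__ == "__main__":
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w", zipfile.ZIP_STORED) as zf:
        for i in range(5):
            zf.writestr(f"rand-{i}.txt", b"x" * 1000)
    print(find_local_header_offset(buf, "rand-3.txt"))
```

Only the tail of the archive and the central directory are read here; none of the member data is touched until the caller seeks to the returned offset, which matches the seek pattern described above.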