Unzip the archive with more than one entry

I'm trying to decompress a ~8GB .zip file piped from a curl command. Everything I have tried stops before reaching 1GB of output and prints this message:

… has more than one entry--rest ignored

I've tried funzip, gunzip, gzip -d, zcat, … each with different arguments as well; all of them end with the message above.

The data file is public, so the issue is easy to reproduce:

curl -L https://archive.org/download/nycTaxiTripData2013/faredata2013.zip | funzip > datafile

Best Answer

The commands you're using can only extract data from the first entry in a ZIP archive; this is mentioned explicitly in the funzip manpage:

funzip without a file argument acts as a filter; that is, it assumes that a ZIP archive (or a gzip'd(1) file) is being piped into standard input, and it extracts the first member from the archive to stdout.

faredata2013.zip contains multiple entries, so you need to use unzip to extract them. To extract them to stdout, use unzip with the -c option, which by default prints a header giving each file's name before its contents. Add -q if you just want the raw contents of all the files in the archive, without the file names. You can also use the -p option instead of both -c and -q.
