Modify file names in tar file without extracting files to temporary location

Tags: command, command line, tar

I have a tar file that contains some files in a folder named old_name. I'd like to create a new tar file in which that folder is renamed to new_name, without extracting the files to disk, since that would be significantly slower for large archives (more than double the disk reads and writes).
I know how to do it with extraction: tar -xf old.tar; tar -cf new.tar --transform 's/old_name/new_name/' old_name

I've tried a few things, but none of them worked:

tar -cOf old.tar | tar -xf new.tar --transform 's/old_name/new_name/'
cat old.tar | tar --delete --transform 's/old_name/new_name/' > new.tar
cat old.tar | tar -u --transform 's/old_name/new_name/' > new.tar

The closest questions I've found are these:

But those are about removing files in the tarball, not changing their paths.

Best Answer

tar can create or extract archives, but it can't operate on streams in this way.

You'll need something like tar-stream - it even has an example that does what you are asking:

https://github.com/mafintosh/tar-stream#modifying-existing-tarballs

Script

#!/bin/bash

# User configuration
#===================
files=(*.log)              # File pattern to archive, e.g. (*.txt) or (*)
num_files_per_tar=5        # Number of files per tar
num_procs=4                # Number of tar processes to run in parallel
tar_file_dir='/tmp'        # Directory for the tar files
tar_file_name_prefix='tar' # Prefix for tar file names
tar_file_name="$tar_file_dir/$tar_file_name_prefix"

# Main algorithm
#===============
# Number of tar files to create, rounded up so leftover files are not dropped
num_tars=$(( (${#files[@]} + num_files_per_tar - 1) / num_files_per_tar ))
tar_files=()  # Each element: a tarball name followed by the files that go into it

tar_start=0   # Index in files where the current tarball starts
# Build one string per tarball.
# Note: file names containing whitespace will break this, because xargs splits on it.
for ((i = 0; i < num_tars; i++))
do
  tar_files[i]="$tar_file_name$i.tar.bz2 ${files[@]:tar_start:num_files_per_tar}"
  tar_start=$((tar_start+num_files_per_tar))
done

# Start tar in parallel, one invocation per string constructed above
printf '%s\n' "${tar_files[@]}" | xargs -n$((num_files_per_tar+1)) -P"$num_procs" tar cjvf

Explanation

First, all the file names that match the selected pattern are stored in the array files. Next, the for loop slices this array and forms strings from the slices. The number of the slices is equal to the number of the desired tarballs. The resulting strings are stored in the array tar_files. The for loop also adds the name of the resulting tarball to the beginning of each string. The elements of tar_files take the following form (assuming 5 files/tarball):

tar_files[0]="tar0.tar.bz2  file1 file2 file3 file4 file5"
tar_files[1]="tar1.tar.bz2  file6 file7 file8 file9 file10"
...
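
The slicing step relies on bash's ${array[@]:offset:length} expansion. Here is a small stand-alone sketch of just that part; the seven single-letter names and the chunk size of three are made up for illustration and are not taken from the script.

```shell
#!/bin/bash
# Illustrative data: seven made-up names, split into chunks of three.
files=(a b c d e f g)
per_chunk=3
chunks=()
for ((start = 0; start < ${#files[@]}; start += per_chunk)); do
  # The slice expands to at most per_chunk elements beginning at index start;
  # "${files[*]...}" joins them into one space-separated string.
  chunks+=("${files[*]:start:per_chunk}")
done
printf '%s\n' "${chunks[@]}"
# prints:
# a b c
# d e f
# g
```

The last chunk is simply shorter when the number of files is not a multiple of the chunk size.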

In the last line of the script, xargs starts multiple tar processes (up to the specified maximum), each of which processes one element of the tar_files array in parallel.
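
To see how xargs groups the arguments, it can help to swap tar for echo. A minimal sketch, assuming an xargs that supports -P (GNU and BSD xargs do; it is not required by POSIX); the labels t0..t2 and the letters are illustrative only.

```shell
#!/bin/bash
# Each input line holds a label followed by two items (illustrative data).
# -n3 passes three tokens to each command; -P2 runs up to two commands at once.
# The output order of parallel jobs is not guaranteed, hence the sort.
out=$(printf '%s\n' "t0 a b" "t1 c d" "t2 e f" | xargs -n3 -P2 echo group: | sort)
printf '%s\n' "$out"
```

Note that xargs counts whitespace-separated tokens rather than lines, which is why each string in the script has to contain exactly the tarball name plus the file names.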

Test

List of files:

$ ls

a      c      e      g      i      k      m      n      p      r      t
b      d      f      h      j      l      o      q      s

Generated tarballs:

$ ls /tmp/tar*
tar0.tar.bz2 tar1.tar.bz2 tar2.tar.bz2 tar3.tar.bz2

Related Solutions

How to create a gnu tarball that can be extracted by solaris tar

How to Create Multi Tar Archives for a Huge Folder
