MySQL – Export two columns from large table without causing a lock

Tags: export, innodb, locking, mysql

I am a sysadmin who doesn't do a ton of DBA work. For a project, I have access to a production server (which I am not the sysadmin for) with an enormous MySQL InnoDB table: 40,000,000 rows and >10 GB. I want to export two small columns from that table, one an INT(11) and the other a VARCHAR(20), to a CSV or .SQL file (either is fine; I'll write a crosswalk for the next step, no problem).

We can call the columns ColumnA and ColumnB and the table SampleTable.

mysqldump is not the right tool for this because I can't specify the columns, and I don't need to export a massive table just to get two tiny columns.

I know I can do a SELECT INTO statement (either to create a new table with just the two columns, or to use SELECT ... INTO OUTFILE to skip the intermediate step), but I am concerned that this will cause a table-level lock on a production server. The table is InnoDB.
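For concreteness, these are roughly the two forms I have in mind, using SampleTable, ColumnA and ColumnB from above; the output path is just a placeholder, and INTO OUTFILE writes the file on the database host (subject to secure_file_priv, if it is set):

-- Option 1: copy the two columns into an intermediate table first
CREATE TABLE SampleTable_export AS
SELECT ColumnA, ColumnB FROM SampleTable;

-- Option 2: skip the intermediate table and write straight to a CSV file on the DB host
SELECT ColumnA, ColumnB
FROM SampleTable
INTO OUTFILE '/tmp/sample_export.csv'
FIELDS TERMINATED BY ',' OPTIONALLY ENCLOSED BY '"'
LINES TERMINATED BY '\n';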

What's my best bet to avoid inconveniencing any live traffic on the server or locking anything?

Thanks

Best Answer

Many experts have shared many ways to overcome this problem. These are my suggestions for playing it safe.

Try running the commands below in a separate session.

SET SESSION TRANSACTION ISOLATION LEVEL READ UNCOMMITTED;
USE db;
SELECT COLA, COLB INTO OUTFILE '/tmp/data.csv' FROM TABLE_NAME;
COMMIT;
exit;

Done this way, the SELECT is performed in a nonlocking fashion, but a possible earlier version of a row might be used, so the reads are not consistent. This is also called a "dirty read": the SELECT may see changes from DML transactions that are currently in progress and not yet committed. I assume that is acceptable in your case. Otherwise, this isolation level works like READ COMMITTED.
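If you want to double-check the session's isolation level before kicking off the export, you can query it; note the variable name depends on the MySQL version (transaction_isolation from 5.7.20 / 8.0 onward, tx_isolation on older servers):

-- verify the current session isolation level (variable name depends on MySQL version)
SELECT @@SESSION.transaction_isolation;
-- on older servers:
-- SELECT @@SESSION.tx_isolation;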

If I were you, this is the order I would follow.

  • Check for non-peak hours, both for the DB server and for the table you intend to SELECT from. That way the reads/writes and I/O of the OUTFILE export shouldn't cost the server much.

  • Manually run this in the background on the server, and keep monitoring the server's health. For example:

$ cat bg_loaddata.sh

#!/bin/bash
# Export the two columns in a non-locking (READ UNCOMMITTED) session.
mysql -uroot -p*** << EOF
USE db;
SET SESSION TRANSACTION ISOLATION LEVEL READ UNCOMMITTED;
SELECT COLA, COLB INTO OUTFILE '/tmp/data.csv' FROM TABLE_NAME;
COMMIT;
exit;
EOF


$ chmod +x bg_loaddata.sh
$ nohup ./bg_loaddata.sh > bg_loaddata.log &
  • If you see any inserts or updates taking longer than usual to complete in SHOW PROCESSLIST, you may wait for some time; but if the export is causing severe turbulence, then go ahead and kill the connection ID of your SELECT ... INTO OUTFILE in the processlist, as sketched after this list. [I'm 99% sure this will not happen, but always be prepared for any backfires.]
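For reference, the monitoring and (if it ever comes to that) the kill look roughly like this; 12345 is a placeholder for the Id column shown by the processlist:

-- watch for long-running statements (the export shows up as your SELECT ... INTO OUTFILE)
SHOW FULL PROCESSLIST;

-- only if the export is genuinely hurting live traffic: kill its connection by Id
KILL 12345;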

Done this way, you can be fairly sure the export simply takes time proportional to the number of records retrieved, and you shouldn't face any issues. And whenever you need to run this again, put the script in cron (a sample crontab line is below) and have a sound sleep. :) [Of course you have monitoring metrics to wake you up] -> Just so you're aware before someone surprises you with the NEWS.
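If you do put it in cron, a crontab line along these lines runs the script during a quiet window; the Sunday 02:00 slot and the paths are just placeholders:

# run the export every Sunday at 02:00, appending output to a log
0 2 * * 0 /path/to/bg_loaddata.sh >> /path/to/bg_loaddata.log 2>&1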

All the best !!