MySQL – How to repair and back up an ARCHIVE table, and stop the .ARN file from growing


I have a MySQL table built with the ARCHIVE storage engine. The table has over 100k records.

My disk filled up, and the cause turned out to be a large .ARN file belonging to the archive table. When I cleared some space, the file continued to grow until the space was consumed again.

Right now the .ARN file is over 16GB and it's growing on every insert. The corresponding .ARZ file is just over 8MB.

I cannot fetch records from the archive table, as the table data is corrupted.
When I try to repair the table, it fails with "Incorrect key file for table", which indicates the file system is out of space.

But I am left with no disk space, so I have to repair and back up this table on my local machine.

How can I repair the archive table, back up its data, and free some disk space? And why does the .ARN file keep growing on every insert?

Best Answer

This is a very tricky question because of the internals of the ARCHIVE storage engine.

People have asked this same question in the MySQL Forums.

What needs to be understood is the file layout of an ARCHIVE table:

  • .frm : Every table in MySQL has a .frm regardless of storage engine
  • .ARZ : Table data
  • .ARM : Table metadata
  • .ARN : Optimization File
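
As a quick sanity check, you can ask the server which ARCHIVE tables it has and how large it thinks they are. Keep in mind that information_schema only reflects the compressed .ARZ data; a runaway .ARN is only visible at the file-system level in the datadir:

    SELECT table_schema, table_name, data_length
    FROM information_schema.tables
    WHERE engine = 'ARCHIVE';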

Let's start with .ARZ. What does the Z stand for ??? zlib

Why zlib ??? The MySQL Documentation says:

Storage: Rows are compressed as they are inserted. The ARCHIVE engine uses zlib lossless data compression (see http://www.zlib.net/). You can use OPTIMIZE TABLE to analyze the table and pack it into a smaller format (for a reason to use OPTIMIZE TABLE, see later in this section). The engine also supports CHECK TABLE. There are several types of insertions that are used:

An INSERT statement just pushes rows into a compression buffer, and that buffer flushes as necessary. The insertion into the buffer is protected by a lock. A SELECT forces a flush to occur.

A bulk insert is visible only after it completes, unless other inserts occur at the same time, in which case it can be seen partially. A SELECT never causes a flush of a bulk insert unless a normal insert occurs while it is loading.
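
Here is a minimal sketch of that buffering behavior (the table and data are invented for illustration). Per the documentation above, the SELECT at the end is what forces the compression buffer to flush into the .ARZ:

    CREATE TABLE arch_demo (
        id  INT NOT NULL,
        msg VARCHAR(64) NOT NULL
    ) ENGINE=ARCHIVE;

    -- These rows land in the compression buffer first, protected by a lock
    INSERT INTO arch_demo VALUES (1, 'buffered');
    INSERT INTO arch_demo VALUES (2, 'still buffered');

    -- This SELECT forces a flush of the buffer to the .ARZ file
    SELECT COUNT(*) FROM arch_demo;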

Some compression is happening to the data as it is inserted. If your .ARN file is growing, the engine must be doing compression-related work on every INSERT.

Note what the zlib.net Technical Details page says under the subheading Maximum Compression Factor:

Empirically, the deflate method is capable of compression factors exceeding 1000:1. (The test case was a 50MB file filled with zeros; it compressed to roughly 49 KB.) Mark loves to calculate stuff like this and reports that the theoretical limit for the zlib format (as opposed to its implementation in the currently available sources) is 1032:1. To quote him,

The limit comes from the fact that one length/distance pair can represent at most 258 output bytes. A length requires at least one bit and a distance requires at least one bit, so two bits in can give 258 bytes out, or eight bits in give 1032 bytes out. A dynamic block has no length restriction, so you could get arbitrarily close to the limit of 1032:1.

He goes on to note that the current implementation limits its dynamic blocks to about 8 KB (corresponding to 8MB of input data); together with a few bits of overhead, this implies an actual compression limit of about 1030.3:1. Not only that, but the compressed data stream is itself likely to be rather compressible (in this special case only), so running it through deflate again should produce further gains.

By way of comparison, note that a version of run-length encoding optimized for this sort of unusual data file -- that is, by using 32-bit integers for the lengths rather than the more usual 8-bit bytes or 16-bit words -- could encode the test file in five bytes. That would be a compression factor of 10,000,000:1 (or 10.000.000:1 for you Europeans, or 10⁷:1 for all of you engineers and scientists whose browsers support superscripts).

Finally, please note that this level of compression is extremely rare and only occurs with really trivial files (e.g., a megabyte of zeros). More typical zlib compression ratios are on the order of 2:1 to 5:1.
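
You can get a feel for these ratios without leaving MySQL, because the built-in COMPRESS() function also uses zlib. This is just an illustration of zlib itself, not a measurement of the ARCHIVE file format, and exact byte counts will vary by version:

    -- A megabyte of one repeated byte, i.e. the trivial best case
    SELECT LENGTH(REPEAT('a', 1048576))           AS raw_bytes,
           LENGTH(COMPRESS(REPEAT('a', 1048576))) AS compressed_bytes;
    -- Expect compressed_bytes on the order of 1 KB, roughly the
    -- ~1000:1 factor described above; ordinary data is closer to 2:1 - 5:1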

Given the presence of a .ARN file that keeps growing, it must be a temporary file used while working out how row data is compressed, which is then flushed to disk. (ARCHIVE does not cache data in memory.)

At this point, your problem is at the compression layer of the storage engine.

WHAT TO DO NEXT

Run FLUSH TABLES; and copy the .frm, .ARM, and .ARZ files to another server running the same OS. Do not copy from Linux to Windows. Try running REPAIR TABLE there.
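
In SQL terms, the sequence looks like this (mydb.archtab is a placeholder name; the actual file copy happens at the OS level between the two statements):

    -- On the crippled server, close the table handles first
    FLUSH TABLES;

    -- ...now copy archtab.frm, archtab.ARM and archtab.ARZ out of the
    -- datadir into the rescue server's datadir (same OS on both sides!),
    -- then on the rescue server:
    REPAIR TABLE mydb.archtab;

    -- If the repair succeeds, take a logical backup right away (e.g. with
    -- mysqldump) so you have a copy that does not depend on the engine files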

If you still cannot read the data, you may have to do some deeper diving. You could try downloading archive_reader.c (from Twitter), compiling it, and using it to read your data.

Godspeed, Spiderman !!! (My Disclaimer).

ONCE YOU HAVE RECOVERED YOUR DATA

SUGGESTION #1

Do not use the REPLACE command against an ARCHIVE table. Why ???

ARCHIVE does not expose the DELETE operation; if REPLACE is allowed to perform one secretly, it could impact performance greatly.
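
If you need REPLACE-like semantics, one common workaround is to only ever INSERT into the ARCHIVE table and resolve "latest version wins" at read time instead. A sketch, with invented table and column names:

    -- Append every new version instead of replacing in place
    INSERT INTO audit_log (entity_id, state, logged_at)
    VALUES (42, 'updated', NOW());

    -- At read time, keep only the newest row per entity
    SELECT a.*
    FROM audit_log AS a
    JOIN (
        SELECT entity_id, MAX(logged_at) AS latest
        FROM audit_log
        GROUP BY entity_id
    ) AS m
      ON m.entity_id = a.entity_id
     AND m.latest    = a.logged_at;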

SUGGESTION #2

SELECTs and INSERTs can peacefully coexist within an ARCHIVE table. Of course, the one and only exception would be if you insert a new row and SELECT that same row concurrently.

SUGGESTION #3

Get a bigger data disk and a lot more RAM. This should give zlib a lot more headroom to compress data in RAM before resorting to the .ARN file.