MySQL – How to repair and back up an ARCHIVE table, and stop the .ARN file from growing


I have a MySQL table built with the ARCHIVE storage engine. The table has over 100k records.

My disk filled up, and the cause turned out to be a large .ARN file belonging to the archive table. When I cleared some space, the file continued to grow until the space was consumed again.

Right now the .ARN file is over 16GB and it's growing on every insert. The corresponding .ARZ file is just over 8MB.

I cannot fetch records from the archive table, as the table data is corrupted.
When I try to repair the table, it fails with "Incorrect key file for table", which indicates the file system is out of space.

But I am left with no disk space, so I have to repair and back up this table on my local machine.

How can I repair the archive table, back up its data, and free some disk space? And why does the .ARN file keep growing on every insert?

Best Answer

This is a very tricky question because of the internals of the ARCHIVE storage engine.

People have asked this same question in the MySQL Forums.

What needs to be understood is the file layout of an ARCHIVE table:

  • .frm : Every table in MySQL has a .frm regardless of storage engine
  • .ARZ : Table data
  • .ARM : Table metadata
  • .ARN : Optimization File
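
As a quick sanity check, you can ask the server which ARCHIVE tables it has and how large it thinks they are. Keep in mind that information_schema only reflects the compressed .ARZ data; a runaway .ARN is only visible at the file-system level in the datadir:

    SELECT table_schema, table_name, data_length
    FROM information_schema.tables
    WHERE engine = 'ARCHIVE';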

Let's start with .ARZ. What does the Z stand for ??? zlib

Why zlib ??? The MySQL Documentation says:

Storage: Rows are compressed as they are inserted. The ARCHIVE engine uses zlib lossless data compression (see http://www.zlib.net/). You can use OPTIMIZE TABLE to analyze the table and pack it into a smaller format (for a reason to use OPTIMIZE TABLE, see later in this section). The engine also supports CHECK TABLE. There are several types of insertions that are used:

An INSERT statement just pushes rows into a compression buffer, and that buffer flushes as necessary. The insertion into the buffer is protected by a lock. A SELECT forces a flush to occur.

A bulk insert is visible only after it completes, unless other inserts occur at the same time, in which case it can be seen partially. A SELECT never causes a flush of a bulk insert unless a normal insert occurs while it is loading.
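
Here is a minimal sketch of that buffering behavior (the table and data are invented for illustration). Per the documentation above, the SELECT at the end is what forces the compression buffer to flush into the .ARZ:

    CREATE TABLE arch_demo (
        id  INT NOT NULL,
        msg VARCHAR(64) NOT NULL
    ) ENGINE=ARCHIVE;

    -- These rows land in the compression buffer first, protected by a lock
    INSERT INTO arch_demo VALUES (1, 'buffered');
    INSERT INTO arch_demo VALUES (2, 'still buffered');

    -- This SELECT forces a flush of the buffer to the .ARZ file
    SELECT COUNT(*) FROM arch_demo;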

Some compression is happening to the data as it is inserted. If your .ARN file is growing, the engine must be doing compression-related work on every INSERT.

Note what the zlib.net Technical Details page says under the subheading Maximum Compression Factor:

Empirically, the deflate method is capable of compression factors exceeding 1000:1. (The test case was a 50MB file filled with zeros; it compressed to roughly 49 KB.) Mark loves to calculate stuff like this and reports that the theoretical limit for the zlib format (as opposed to its implementation in the currently available sources) is 1032:1. To quote him,

The limit comes from the fact that one length/distance pair can represent at most 258 output bytes. A length requires at least one bit and a distance requires at least one bit, so two bits in can give 258 bytes out, or eight bits in give 1032 bytes out. A dynamic block has no length restriction, so you could get arbitrarily close to the limit of 1032:1.

He goes on to note that the current implementation limits its dynamic blocks to about 8 KB (corresponding to 8MB of input data); together with a few bits of overhead, this implies an actual compression limit of about 1030.3:1. Not only that, but the compressed data stream is itself likely to be rather compressible (in this special case only), so running it through deflate again should produce further gains.

By way of comparison, note that a version of run-length encoding optimized for this sort of unusual data file -- that is, by using 32-bit integers for the lengths rather than the more usual 8-bit bytes or 16-bit words -- could encode the test file in five bytes. That would be a compression factor of 10,000,000:1 (or 10.000.000:1 for you Europeans, or 10⁷:1 for all of you engineers and scientists whose browsers support superscripts).

Finally, please note that this level of compression is extremely rare and only occurs with really trivial files (e.g., a megabyte of zeros). More typical zlib compression ratios are on the order of 2:1 to 5:1.
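
You can get a feel for these ratios without leaving MySQL, because the built-in COMPRESS() function also uses zlib. This is just an illustration of zlib itself, not a measurement of the ARCHIVE file format, and exact byte counts will vary by version:

    -- A megabyte of one repeated byte, i.e. the trivial best case
    SELECT LENGTH(REPEAT('a', 1048576))           AS raw_bytes,
           LENGTH(COMPRESS(REPEAT('a', 1048576))) AS compressed_bytes;
    -- Expect compressed_bytes on the order of 1 KB, roughly the
    -- ~1000:1 factor described above; ordinary data is closer to 2:1 - 5:1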

Given the presence of a .ARN file that keeps growing, it must be a temporary file used while working out how row data is compressed, which is then flushed to disk. (ARCHIVE does not cache data in memory.)

At this point, your problem is at the compression layer of the storage engine.

WHAT TO DO NEXT

Run FLUSH TABLES; and copy the .frm, .ARM, and .ARZ files to another server running the same OS. Do not copy from Linux to Windows. Try running REPAIR TABLE there.
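
In SQL terms, the sequence looks like this (mydb.archtab is a placeholder name; the actual file copy happens at the OS level between the two statements):

    -- On the crippled server, close the table handles first
    FLUSH TABLES;

    -- ...now copy archtab.frm, archtab.ARM and archtab.ARZ out of the
    -- datadir into the rescue server's datadir (same OS on both sides!),
    -- then on the rescue server:
    REPAIR TABLE mydb.archtab;

    -- If the repair succeeds, take a logical backup right away (e.g. with
    -- mysqldump) so you have a copy that does not depend on the engine files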

If you still cannot read the data, you may have to do some deeper diving. You could try downloading archive_reader.c (from Twitter), compiling it, and using it to read your data.

Godspeed, Spiderman !!! (My Disclaimer).

ONCE YOU HAVE RECOVERED YOUR DATA

SUGGESTION #1

Do not use the REPLACE command against an ARCHIVE table. Why ???

ARCHIVE does not expose the DELETE operation; if REPLACE is allowed to perform one secretly, it could impact performance greatly.
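
If you need REPLACE-like semantics, one common workaround is to only ever INSERT into the ARCHIVE table and resolve "latest version wins" at read time instead. A sketch, with invented table and column names:

    -- Append every new version instead of replacing in place
    INSERT INTO audit_log (entity_id, state, logged_at)
    VALUES (42, 'updated', NOW());

    -- At read time, keep only the newest row per entity
    SELECT a.*
    FROM audit_log AS a
    JOIN (
        SELECT entity_id, MAX(logged_at) AS latest
        FROM audit_log
        GROUP BY entity_id
    ) AS m
      ON m.entity_id = a.entity_id
     AND m.latest    = a.logged_at;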

SUGGESTION #2

SELECTs and INSERTs can peacefully coexist within an ARCHIVE table. Of course, the one and only exception would be if you insert a new row and SELECT that same row concurrently.

SUGGESTION #3

Get a bigger data disk and a lot more RAM. This should give zlib a lot more headroom to compress data in RAM before resorting to the .ARN file.