“Cannot allocate memory” when reading from SCSI tape

Tags: dd, scsi, tape, tar

I am experimenting with some old SCSI tape drives. I have successfully written some data to a tape, but I am struggling to read it back again.

# tar tvf /dev/st0
tar: /dev/st0: Cannot read: Cannot allocate memory
tar: At beginning of tape, quitting now
tar: Error is not recoverable: exiting now

# dd if=/dev/st0 of=test
dd: error reading '/dev/st0': Cannot allocate memory
0+0 records in
0+0 records out
0 bytes copied, 3.20155 s, 0.0 kB/s

After these commands, dmesg says:

st 10:0:3:0: [st0] Block limits 1 - 16777215 bytes.
st 10:0:3:0: [st0] Failed to read 65536 byte block with 512 byte transfer.
st 10:0:3:0: [st0] Failed to read 131072 byte block with 65536 byte transfer.
st 10:0:3:0: [st0] Failed to read 65536 byte block with 10240 byte transfer.
st 10:0:3:0: [st0] Failed to read 94208 byte block with 69632 byte transfer.
st 10:0:3:0: [st0] Failed to read 65536 byte block with 10240 byte transfer.
st 10:0:3:0: [st0] Failed to read 65536 byte block with 512 byte transfer.

Most of these appeared because I was testing different block sizes with tar's -b option, but none of them had any effect.

Occasionally I'm able to read a few kB of data off the first block on the tape (which tar can extract until the data cuts off), but usually it fails with no data at all read.

I have (apparently) successfully written data to tape, moved the tape to the other drive, sought to the end of the data and then written more, so there appears to be no difficulty in writing data to the drive, only in reading it back again.

I am using two LTO-3 drives. One is a half height HP Ultrium 920 and the other is a full height HP Ultrium 960. Both of them have this problem. I have tried with two different SCSI cards (an LSI Logic Ultra320 card and an Adaptec Ultra2/SE 40MB/sec card), both of which produce the same errors.

I have tried a cable with an attached terminator (which gave me 40MB/sec even on the Ultra320 card), and then a two-connector cable that only let me connect one drive, so I enabled the "term power" jumper on the drive, which got me to Ultra160 (even though the drive and controller are both Ultra320). None of this changed anything: throughout it all I still got the same errors when trying to read from the drive.

I downgraded from Linux kernel 4.10.13 to 4.4.3 (the previous version on this machine), which changed the error message from "Cannot allocate memory" to "Input/output error", but the problem remained the same.

Any ideas what could cause this error?

EDIT: The 40MB/sec problem was caused by my using an SE active terminator. Once I replaced it with an LVD terminator, the speeds went up to Ultra160. I think I need new cables to hit Ultra320, but Ultra160 is already double the tape bandwidth (max 80MB/sec), so it's fine for the time being. It made no difference to the error messages, though.

Best Answer

Ok, I think I've worked this out.

TL;DR

Use dd with a large block size to read from the tape instead:

dd if=/dev/nst0 bs=1M | tar tvf -

Background

When you write to tape, the data is written in units called blocks. These are like sectors on a hard disk: where hard disk blocks were fixed at 512 bytes for many years and only recently moved to 4096-byte blocks, tape blocks can be set to any size you like.

The block size you wish to use is set with the setblk subcommand in mt-st:

mt-st -f /dev/nst0 setblk 512    # Use 512-byte blocks
mt-st -f /dev/nst0 setblk 64k    # Use 65536-byte blocks
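
You can check what the drive is currently set to with the status subcommand; a block size of 0 in its output means variable-sized blocks (more on those below):

mt-st -f /dev/nst0 status    # look for the "Tape block size" line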

When you issue a read operation to the drive, it will return data in block-sized chunks. You can't read half a block - the smallest amount of data you can read from a tape is one block, which of course could be any number of actual bytes depending on what the block size is.

This means that if the program you are using supplies a 16kB memory buffer, it will be able to read up to 32 blocks at a time from a tape with 512-byte blocks, as these fit exactly into the 16kB buffer. However, it will not be able to read anything at all from a tape with 64kB blocks, because not even one of them fits into the 16kB buffer, and remember: you can't read less than one whole block at a time.

Should you attempt to do this, by using a buffer that's too small for one block, the driver (in this case the st SCSI tape driver) will return ENOMEM, which is reported as "Cannot allocate memory", to advise you that your read buffer is too small to hold even a single block.
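
A quick way to see this in action, assuming a tape written with 64kB blocks (like the ones in my dmesg output above), is to compare dd reads with a too-small and a large-enough buffer:

dd if=/dev/nst0 bs=512 count=1    # fails: a 512-byte buffer can't hold a 64kB block
dd if=/dev/nst0 bs=64k count=1    # succeeds: the buffer holds exactly one block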

To further complicate matters, some tape drives (apparently the LTO ones I am using) also support variable-sized blocks. This means the block size is determined by the size of each write operation and each block can be a different size to the last.

This mode is set with a block size of zero:

mt-st -f /dev/nst0 setblk 0    # Use variable-sized blocks

This is also the default option as, presumably (I am guessing here), it wastes less space when a program is incorrectly configured. If, for example, you had set 4k blocks but your program only wrote data in units of 512 bytes at a time, each 512-byte chunk of data could end up taking 4k on the tape.
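
As an illustration (hypothetical commands, assuming variable-block mode is active), each write() call becomes exactly one block on the tape, so these two writes would produce a 512-byte block followed by a 64kB block:

dd if=/dev/zero of=/dev/nst0 bs=512 count=1    # one write() -> one 512-byte block
dd if=/dev/zero of=/dev/nst0 bs=64k count=1    # one write() -> one 65536-byte block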

Cause

If you now put everything together, you will realise that a tape can hypothetically have a 512-byte block followed by a 64kB block. If the program is reading the tape with a 16kB buffer, it will successfully read the first block, but then when it tries to read more, it won't be able to fit the following 64kB block in its buffer so the driver will return an error.

This explains why I was getting Cannot allocate memory errors most of the time, and why I was occasionally able to get tar to extract the first few files before hitting the error again. I had not set the block size with mt-st, so the tape had been written with variable-sized blocks, and tar was now using a buffer too small to read some of those blocks back. (You can see this in the dmesg output above: the 10240-byte transfers are tar's default blocking factor of 20 × 512 bytes, and the 512-byte transfers are dd's default block size.)

tar has a few options for setting its own internal block sizes, namely --blocking-factor, --read-full-records and --record-size, however these only work if tar is used to read and write the tape directly.

Because I wrote to the tape through the mbuffer program to reduce tape shoe-shining, the block sizes in the tar archive no longer matched the block sizes on the tape. This meant --blocking-factor was of little use: it would allow the first block on the tape to be read, which includes a header telling tar what the blocking factor is supposed to be, whereupon tar switches to that factor and ignores the value given on the command line. The second and subsequent blocks can then no longer be read!
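
For reference, the write pipeline was something along these lines (a sketch only; the buffer size and file names are illustrative, not my exact invocation):

tar cvf - mydata | mbuffer -m 1G -o /dev/nst0
# mbuffer re-blocks the stream, so the write() sizes reaching the tape
# (and hence the variable block sizes) are mbuffer's, not tar's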

Solution

The solution is to use another program to read from the tape - one that can have the read buffer size set to a value large enough to hold the biggest block we are likely to see.

dd works for this, and in a pinch the following does the trick:

dd if=/dev/nst0 bs=256k | tar tvf -

You may need to increase 256k if your tape has larger blocks on it, but this worked for me. 1M also works fine so it doesn't appear to matter if the value is too large, within reason.
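
If you want to find out how big the blocks on an unknown tape actually are, one trick (assuming variable-sized blocks, where each read() returns exactly one block) is to read a single block with an oversized buffer and let dd report how many bytes came back:

dd if=/dev/nst0 bs=1M count=1 of=/dev/null    # "65536 bytes copied" => 64kB blocks
mt-st -f /dev/nst0 rewind                     # /dev/nst0 doesn't rewind by itself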
