PostgreSQL Amazon RDS – Out of Memory Error on Bytea Field > 64MB

amazon-rds postgresql

I am attempting to dump a modestly sized (less than 3GB) database from Postgres 9.5.4 running on Amazon RDS. The instance is a db.m4.xlarge, which has 4 virtual cores and 16GB RAM. It's using the default parameter group.

My dump fails like this:

$ pg_dump -h xxxxx.rds.amazonaws.com -U admin -d mydb -Fc -f dump-20170919.psql
Password:
pg_dump: Dumping the contents of table "plugindata" failed: PQgetCopyData() failed.
pg_dump: Error message from server: out of memory
pg_dump: The command was: COPY public.plugindata (plugindataid, pluginkey, filename, lastmoddate, data) TO stdout;

I was able to narrow the problem down to a single record, since attempting to SELECT its data field triggers the same error. That field (type bytea) is 66,459,495 bytes, an order of magnitude larger than in any other row in the table.
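For reference, a query along these lines will surface the oversized row without pulling the value across the wire (the table and column names are the ones from the failing COPY above; the LIMIT is arbitrary):

-- List the rows with the largest bytea payloads in plugindata.
-- octet_length() returns the uncompressed size of the value in bytes.
SELECT plugindataid, octet_length(data) AS data_bytes
FROM public.plugindata
ORDER BY octet_length(data) DESC
LIMIT 5;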

I have tried increasing both work_mem (to 1GB) and maintenance_work_mem (to 3GB) to no avail. I'm not all that familiar with memory tuning for Postgres, so I assume there's something else I need to tweak?

EDIT: Server logs contain the following:

...:admin@mydb:[14270]:LOG: could not send data to client: Connection reset by peer
...:admin@mydb [14270]:ERROR: canceling statement due to user request
...:admin@mydb:[14270]:LOG: could not send data to client: Broken pipe
...:admin@mydb:[14270]:LOG: SSL error: bad write retry
...:admin@mydb:[14270]:FATAL: connection to client lost

I have verified that ssl_renegotiation_limit is set to 0.

Best Answer

PostgreSQL's handling of large binary fields is very memory-inefficient. It's quite normal for it to keep three or more copies of a given field in memory while processing a row, albeit briefly: one TOASTed Datum, one deTOASTed Datum, and one stringified hex-formatted copy. I wouldn't be surprised if SSL added another copy.

That said, this doesn't really look like a memory issue to me. Let's say PostgreSQL somehow, insanely, kept TEN copies of the field in memory. The field is only about 63MB, so that's roughly 640MB of RAM, which is nothing for your server. So unless RDS's postgres is patched to do things normal PostgreSQL doesn't do, or runs with some weird control groups configuration or something, it's unlikely to be a true memory problem.

I suspect the real issue is something else.

work_mem isn't relevant here; it only applies to individual operations like sorts and hash tables, and it isn't a memory quota for a backend.

"Out of memory" can be reported in error cases where something fails that should not usually fail, under conditions that are usually caused by not being able to allocate memory. Given that, I'm very suspicious of the "bad write retry" report from OpenSSL.

You've excluded issues with renegotiations, but in an opaque system like RDS it's hard to tell what else it might be.

Can you show more of the logs? That may be informative, especially if you retry after setting log_error_verbosity = verbose and log_min_messages = debug1 in your configuration.
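On RDS you'd change those two settings through the instance's DB parameter group rather than postgresql.conf; once the change is applied, a quick check from psql will confirm the session actually sees the new values:

-- Sanity check: verify the parameter group changes are visible to a new session.
SHOW log_error_verbosity;
SHOW log_min_messages;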

Also, what's the output of SELECT pg_column_size(x), octet_length(x) FROM mytable WHERE ..., where x is the problem column and the WHERE clause matches the problem row?
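In your case that would look something like the following, with the id of the problem row filled in (the id here is just a placeholder):

-- pg_column_size() shows the size as actually stored (after TOAST compression);
-- octet_length() shows the full uncompressed size in bytes.
SELECT pg_column_size(data) AS stored_bytes,
       octet_length(data) AS raw_bytes
FROM public.plugindata
WHERE plugindataid = :problem_id;  -- placeholder for the problem row's id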

(Edit your question to add more info, then comment here to notify me)

If this were normal unpatched PostgreSQL on a real host I'd get you to attach gdb and capture the error so we could see exactly what's happening. But in this case your best bet will be to contact Amazon's support, since they're the only people who can actually see what's happening on the host.