I am trying to dump the entire contents of a large table from the command line using psql, but am running into a problem where memory usage grows until the process is killed, before any data is even dumped.
What I don't understand is why the query doesn't start returning results immediately and complete without running out of memory.
Here is an explanation of exactly what I'm attempting:
I have a table, say:
CREATE TABLE big
(
  id integer,
  rand double precision
);
A large number of rows is inserted (50 million):
insert into big
select generate_series(1, 50000000) AS id, random();
The query plan to select every row looks like (not surprisingly):
$ psql -d big -c "explain select * from big;"
QUERY PLAN
----------------------------------------------------------------
Seq Scan on big (cost=0.00..924326.24 rows=50000124 width=12)
(1 row)
I then attempt to dump the contents to file:
$ psql -d big -c "select * from big;" > big.dump
As I said above, this command fails before any data is written, consuming an ever-increasing amount of memory until it is killed by the OS (the "OOM killer").
Note: I understand I could use pg_dump to accomplish something similar, but in reality my query is more complex than this – specifically, I would like to encode each row as JSON when dumping.
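For example, something along the lines of the following, using PostgreSQL's built-in row_to_json function (available since 9.2), which runs into the same memory problem:

```shell
# Roughly the kind of dump I am after: one JSON object per row.
# This still exhausts memory, since the whole result set is
# buffered on the client before anything is written out.
psql -d big -c "select row_to_json(big) from big;" > big.json
```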
Some configuration details:
- postgresql version = 9.3.4
- work_mem = 1MB
- shared_buffers = 128MB
- effective_cache_size = 128MB
Best Answer
By default, the results are entirely buffered in memory for two reasons:
1) Unless using the -A option, output rows are aligned, so output cannot start until psql knows the maximum length of each column, which implies visiting every row (which also takes significant time, in addition to lots of memory).

2) Unless a FETCH_COUNT is specified, psql uses the synchronous PQexec function directly on the query, which buffers the entire result set. When FETCH_COUNT is set, psql instead uses a cursor-based method with successive FETCH calls, freeing or reusing the client-side buffer every FETCH_COUNT rows.

So a big result set should be fetched by a command like:

psql -A -t -v FETCH_COUNT=10000 -d big -c "select * from big;" > big.dump

with FETCH_COUNT reduced if the rows are very large and it still eats too much memory. The -t stands for --tuples-only, which suppresses the output of headers and footers.
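Applied to the JSON-encoding case from the question, the same approach works; only the select list changes (row_to_json here is the PostgreSQL built-in available since 9.2):

```shell
# Cursor-based fetch of JSON-encoded rows: -A and -t keep the output
# unadorned (one JSON object per line), and FETCH_COUNT bounds
# client-side memory use by fetching 10000 rows at a time.
psql -A -t -v FETCH_COUNT=10000 -d big \
     -c "select row_to_json(big) from big;" > big.json
```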