PostgreSQL – How to properly configure PostgreSQL RAM usage

database-design, memory, postgresql

Using PostgreSQL, we have several hundred concurrent processes doing regular database access. Our data is spread across several database servers, each of which runs a connection pool with PgBouncer. The problem is that RAM usage (as reported by top – not sure if that is the best way to measure it) tends to climb to nearly 100% on all servers. I am pretty sure this is bad.

I have tried several pgbouncer / postgres configurations, and each time, after the system has been running for a few minutes, RAM usage climbs to that point.
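
In case it is useful, here is roughly how I have been checking memory besides top – I am not sure this is the right way either, and it assumes a fairly recent Linux where free reports an "available" / MemAvailable figure:

    # How much of the "used" memory is really just reclaimable page cache?
    free -m
    grep -E 'MemTotal|MemFree|MemAvailable|^Cached' /proc/meminfo

    # Per-backend resident sizes; note that RSS includes the shared_buffers
    # pages a backend has touched, so summing these overstates real usage.
    ps -C postgres -o pid,rss,cmd --sort=-rss | head -20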

My questions are:

  1. Should I set up my connection pooler on a different server from the database? It seems like having so many open connections on the same server could be causing this.
  2. In general, what are good guidelines for RAM usage on Postgres? I really don't know how to tell whether the server is behaving as expected or misbehaving.

Thanks very much! Let me know if more information is needed and I'll try to post ASAP.

Edit: When running vmstat 1 for a while, I get this:

procs -----------memory---------- ---swap-- -----io---- --system-- -----cpu-----
 r  b   swpd   free   buff  cache   si   so    bi    bo   in   cs us sy id wa st
 0 34 241048  20124  10208 2636248  340   80  1820  2968 3846 2864 12 14  0 73  1
 0 56 240980  19256  10224 2636860  360  140  2568  6100 5773 4874 13 20  0 66  1
 0 56 240916  19876  10244 2635528  476  188  3760  3740 7347 5898 21 27  0 52  1
 1 32 240896  19512  10260 2635532  448  240  3192  4720 5426 4295 15 21  1 63  1
 1 72 240844  19884  10280 2633332  728  216  3620  4424 7867 6927 16 27  0 56  1
 1 62 240568  19884  10292 2633640  404    0  3780  3176 6546 5376 16 21  0 62  1
 2 42 240348  31044  10296 2623100  460    0  2844  3992 6215 5181 19 20  0 60  1
 0 55 240116  20628  10304 2633396  360    0  3288  4264 6749 5958 15 26  0 59  1
 0 71 239896  19884  10300 2632892  624   56  3804  4780 6682 6191 14 22  0 64  1
 2 59 239660  19016  10304 2633980  380    4  3324  3524 6429 5233 15 22  0 63  1
 3 49 239540  27572  10312 2625460  280   92  2820  2764 5424 4533 13 19  0 66  1
 1 58 239272  19264  10316 2632580  676    0  3844  3512 7558 6122 17 27  0 55  1
 1 51 239116  41816  10312 2609204  316   48  3156  3352 5730 4689 16 22  0 62  1
 2 60 238804  74420  10288 2577096  596    0  3296  3880 7080 5153 17 27  0 55  1
 3 64 237500  93516  10272 2562168  372    0  3296  2964 6878 5568 16 30  0 54  1
 0 39 237308 106660  10288 2548092  284    0  1140  3772 4477 3615 10 16  0 74  1
 2 60 237052 119680  10288 2534584  596    0  2956  4688 6976 5747 22 27  0 51  1
 1 70 236772 118440  10284 2530824  532    0  2956  3744 7991 6406 20 34  0 45  1
 0 76 236464 123136  10292 2525108  564    0  2940  4564 6940 6000 16 28  0 56  1
 1 67 236212 139520  10296 2508848  328    0  2896  2660 5471 4493 15 21  0 64  1
 0 36 235960 151796  10244 2495876  584   48  2392  3824 5744 4380 13 24  0 61  1
 1 65 235700 141264  10240 2506308  432    0  2564  5412 6870 6069 16 25  4 55  1
 0 51 235352 135560  10240 2512036  324    0  2836  3620 7448 6674 20 25  0 54  1
 2 71 235180 130352  10228 2517284  292    0  2236  5276 6477 5511 14 24  0 62  1
 3 74 234880 127624  10228 2519252  528    0  2704  4268 7159 6017 15 26  0 59  1
 1 49 234668 126632  10216 2519648  400    0  2440  3860 4797 4144 13 16  0 70  1
 2 51 234436 147588  10204 2499160  296    0  1896  3664 5565 4449 15 20  0 65  0
 1 56 234244 142132  10196 2504652  244    0  2320  4628 5162 4641 13 20  0 66  1
 2 64 234068 138908  10200 2507900  352    0  2276  3960 6161 5640 14 20  0 65  1
 2 69 233844 134336  10200 2512368  332    0  1872  3792 5404 4720 12 18  0 70  1
 0 51 233564 130492  10200 2516328  236    0  2068  3860 4780 4345 10 18  0 71  1
 2 66 233328 124912  10204 2521172  364    0  2480  4532 6694 6401 13 23  0 63  1
 1 53 233068 118340  10212 2526236  252    0  2652  2972 6548 5481 18 28  0 54  1
 1 66 232852 112388  10216 2531388  352    0  2592  4656 7537 6520 25 26  0 49  1
 1 27 232628 126772  10220 2517960  252    0  2332  3344 3670 2787  9 11  0 78  1
 3 74 232376 119208  10236 2524356  500    0  3024  5568 9162 7131 21 33  0 45  1
 0 29 232192 113876  10236 2529992  280    0  2180  5976 4703 3668 12 14  0 73  1
 2 81 231912 108296  10244 2535604  496    0  2864  4532 7717 6603 19 24  0 57  1
 1 55 230600 107304  10260 2540572  400    0  2780  3908 6542 5568 15 26  0 58  1
 2 54 230356 101972  10268 2545692  220    0  2224  6044 5942 5082 15 23  0 62  1
 2 47 230044 113504  10272 2534088  180    0  2620  5820 7214 6138 16 23  0 61  1
 0 24 229872 110280  10272 2537612  152    0  2000  5312 3610 2813  8 12  0 79  1
 3 30 229816 106684  10276 2541068   24    0  1116  5880 4148 2923 12 15  0 73  1
 0 67 229536 100484  10276 2546612  260    0  3084  3656 7847 7088 20 28  0 51  1
 0 50 229240  95152  10280 2552164  388    0  2648  4432 6670 5479 17 22  0 61  1
 2 52 229060  89944  10288 2557044  268    0  2388  3780 5916 5029 22 22  0 55  1
 2 55 228828  85604  10292 2561036  252    0  2032  3152 6003 5416 16 20  0 64  1
 1 67 228632  81016  10292 2565560  164    0  2472  3896 5842 5113 15 21  0 64  1
 0 29 228420  77792  10296 2568708  264    0  2588  3496 4775 3954 10 18  0 71  1
 0 13 228340  77048  10296 2569728   60    0  1000 15576 1620  676  3  4  0 93  1
 3 31 228276  75064  10300 2570736   44    0   928  7840 3764 2959 12 12  0 75  1
 4 56 227976  72212  10300 2573832  368    0  2976  3880 6327 5287 20 23  0 56  1
 2 52 227700  69236  10300 2576308  256    0  2336  4128 7586 5988 17 27  0 55  1
 0 38 227452  66508  10304 2579332  240    0  2916  3400 5230 4456 14 18  0 68  1

Best Answer

From the vmstat output, the cache column shows that about 2.5 GB is used for the disk cache, so the system is not memory-starved.
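
As a rough answer to your second question: a common back-of-the-envelope bound for how much memory PostgreSQL itself can legitimately claim is shared_buffers plus max_connections × work_mem (each sort or hash step in a query may use up to work_mem, so the real worst case can be somewhat higher). A hedged sketch of that check – it assumes PostgreSQL 9.6 or later for pg_size_bytes and that psql can connect via the usual PG* environment variables:

    # Rough worst-case footprint: shared_buffers + max_connections * work_mem.
    # This is only an estimate; a single complex query can use several work_mem
    # allocations, and effective_cache_size is a planner hint, not an allocation.
    psql -Atc "
      SELECT pg_size_pretty(
               pg_size_bytes(current_setting('shared_buffers'))
               + current_setting('max_connections')::bigint
                 * pg_size_bytes(current_setting('work_mem'))
             ) AS rough_worst_case;"

If that figure plus what PgBouncer and the OS need fits comfortably in RAM, a "full" memory reading in top is usually just the page cache doing its job.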

What stands out is the high number of processes in uninterruptible sleep (the b column) and the high percentages in the wa column. This means that a lot of processes are constantly waiting on I/O. The load average of the machine (shown by top or uptime) is probably always quite high. Note the 0% idle time for the CPUs (id column) and the fair percentage of time spent in the kernel (sy column): the CPUs spend essentially all of their time either servicing I/O or waiting for it.
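
To see where that wait time is going, a couple of quick checks – iostat and pidstat come from the sysstat package, which may need to be installed:

    # Which processes are currently stuck in uninterruptible sleep (state D)?
    ps -eo state,pid,cmd | awk '$1 == "D"'

    # Per-device utilisation and latency; %util close to 100% or a high await
    # column points at the disks themselves rather than the CPUs.
    iostat -x 5

    # Per-process disk I/O, to see whether it is the postgres backends doing it.
    pidstat -d 5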

On the other hand, the bi and bo columns show steady but low-volume disk activity. Reads are in bi and writes in bo, both measured in kernel blocks per second (1 block = 1024 bytes), so the figures above amount to only a few MB/s in each direction. Compare this to the raw throughput capacity of your disk subsystem, which should be at least 100 MB/s even for a low-end server – about 10 times more than what this vmstat output shows.
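
To put numbers on that, you can convert the bi/bo columns to MB/s on the fly; the awk below assumes the default vmstat column layout shown above (bi and bo are fields 9 and 10) and skips the first sample, which is an average since boot:

    # Convert vmstat's bi/bo (1024-byte blocks per second) to MB/s.
    vmstat 5 | awk 'NR > 3 { printf "read %.1f MB/s  write %.1f MB/s\n", $9/1024, $10/1024; fflush() }'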

It looks as if there is a lot of I/O going on, but most of it is served from the disk cache, so raw disk bandwidth appears to be good enough. Alternatively, the time spent waiting on I/O could be due to something else, such as a slow network, although that would be peculiar. You could still check whether your network interfaces are maxed out with a tool like iftop or similar.
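
A quick way to rule the network in or out – iftop needs root and is often not installed by default, sar again comes from sysstat, and the interface name below is only an example:

    # Live per-connection bandwidth on a given interface (interface name assumed):
    iftop -i eth0

    # Per-interface throughput sampled every 5 seconds, no root required:
    sar -n DEV 5

If rxkB/s or txkB/s sit near the interface's capacity while the waits occur, the network is worth a closer look; otherwise the disks remain the prime suspect.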