Using PostgreSQL, we have several hundred concurrent processes doing regular database access. Our data is spread across several database servers, each of which also runs a PgBouncer connection pool. The problem is that RAM usage (as reported by top; I'm not sure that's the best way to measure it) tends to climb to nearly 100% on all servers. I'm pretty sure this is bad.
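Since I gather Linux deliberately uses spare RAM for disk cache, is something like the following a more reliable check than top? (I'm not sure which figure actually matters here: the "available" column from free, or MemAvailable in /proc/meminfo on newer kernels.)

free -m
grep -E 'MemTotal|MemAvailable' /proc/meminfo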
I have tried several different pgbouncer / postgres configurations, and in every case RAM usage climbs to that level within a few minutes of the system running.
My questions are:
- Should I set up my connection pooler on a different server from the database? It seems like having so many open connections on the same server could be causing this. (See the illustrative pgbouncer.ini snippet after this list.)
- In general, what are good guidelines for RAM usage on Postgres? I really don't know how to tell whether a server is behaving as expected or badly.
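In case it helps, the general shape of the pgbouncer.ini settings I have been experimenting with is below. The values are only placeholders, not my actual configuration, and I am not sure these are even the right knobs:

[databases]
; one entry per database served by this pgbouncer instance
mydb = host=127.0.0.1 port=5432 dbname=mydb

[pgbouncer]
listen_addr = *
listen_port = 6432
auth_type = md5
auth_file = /etc/pgbouncer/userlist.txt
; transaction pooling lets many client connections share a few server connections
pool_mode = transaction
; max clients allowed to connect to pgbouncer itself
max_client_conn = 500
; actual postgres connections opened per database/user pair
default_pool_size = 20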
Thanks very much! Let me know if more information is needed and I'll try to post ASAP.
Edit: When running vmstat 1 for a while, I get this:
procs -----------memory---------- ---swap-- -----io---- --system-- -----cpu-----
r  b   swpd   free  buff   cache  si  so   bi    bo   in   cs us sy id wa st
0 34 241048  20124 10208 2636248 340  80 1820  2968 3846 2864 12 14  0 73  1
0 56 240980  19256 10224 2636860 360 140 2568  6100 5773 4874 13 20  0 66  1
0 56 240916  19876 10244 2635528 476 188 3760  3740 7347 5898 21 27  0 52  1
1 32 240896  19512 10260 2635532 448 240 3192  4720 5426 4295 15 21  1 63  1
1 72 240844  19884 10280 2633332 728 216 3620  4424 7867 6927 16 27  0 56  1
1 62 240568  19884 10292 2633640 404   0 3780  3176 6546 5376 16 21  0 62  1
2 42 240348  31044 10296 2623100 460   0 2844  3992 6215 5181 19 20  0 60  1
0 55 240116  20628 10304 2633396 360   0 3288  4264 6749 5958 15 26  0 59  1
0 71 239896  19884 10300 2632892 624  56 3804  4780 6682 6191 14 22  0 64  1
2 59 239660  19016 10304 2633980 380   4 3324  3524 6429 5233 15 22  0 63  1
3 49 239540  27572 10312 2625460 280  92 2820  2764 5424 4533 13 19  0 66  1
1 58 239272  19264 10316 2632580 676   0 3844  3512 7558 6122 17 27  0 55  1
1 51 239116  41816 10312 2609204 316  48 3156  3352 5730 4689 16 22  0 62  1
2 60 238804  74420 10288 2577096 596   0 3296  3880 7080 5153 17 27  0 55  1
3 64 237500  93516 10272 2562168 372   0 3296  2964 6878 5568 16 30  0 54  1
0 39 237308 106660 10288 2548092 284   0 1140  3772 4477 3615 10 16  0 74  1
2 60 237052 119680 10288 2534584 596   0 2956  4688 6976 5747 22 27  0 51  1
1 70 236772 118440 10284 2530824 532   0 2956  3744 7991 6406 20 34  0 45  1
0 76 236464 123136 10292 2525108 564   0 2940  4564 6940 6000 16 28  0 56  1
1 67 236212 139520 10296 2508848 328   0 2896  2660 5471 4493 15 21  0 64  1
0 36 235960 151796 10244 2495876 584  48 2392  3824 5744 4380 13 24  0 61  1
1 65 235700 141264 10240 2506308 432   0 2564  5412 6870 6069 16 25  4 55  1
0 51 235352 135560 10240 2512036 324   0 2836  3620 7448 6674 20 25  0 54  1
2 71 235180 130352 10228 2517284 292   0 2236  5276 6477 5511 14 24  0 62  1
3 74 234880 127624 10228 2519252 528   0 2704  4268 7159 6017 15 26  0 59  1
1 49 234668 126632 10216 2519648 400   0 2440  3860 4797 4144 13 16  0 70  1
2 51 234436 147588 10204 2499160 296   0 1896  3664 5565 4449 15 20  0 65  0
1 56 234244 142132 10196 2504652 244   0 2320  4628 5162 4641 13 20  0 66  1
2 64 234068 138908 10200 2507900 352   0 2276  3960 6161 5640 14 20  0 65  1
2 69 233844 134336 10200 2512368 332   0 1872  3792 5404 4720 12 18  0 70  1
0 51 233564 130492 10200 2516328 236   0 2068  3860 4780 4345 10 18  0 71  1
2 66 233328 124912 10204 2521172 364   0 2480  4532 6694 6401 13 23  0 63  1
1 53 233068 118340 10212 2526236 252   0 2652  2972 6548 5481 18 28  0 54  1
1 66 232852 112388 10216 2531388 352   0 2592  4656 7537 6520 25 26  0 49  1
1 27 232628 126772 10220 2517960 252   0 2332  3344 3670 2787  9 11  0 78  1
3 74 232376 119208 10236 2524356 500   0 3024  5568 9162 7131 21 33  0 45  1
0 29 232192 113876 10236 2529992 280   0 2180  5976 4703 3668 12 14  0 73  1
2 81 231912 108296 10244 2535604 496   0 2864  4532 7717 6603 19 24  0 57  1
1 55 230600 107304 10260 2540572 400   0 2780  3908 6542 5568 15 26  0 58  1
2 54 230356 101972 10268 2545692 220   0 2224  6044 5942 5082 15 23  0 62  1
2 47 230044 113504 10272 2534088 180   0 2620  5820 7214 6138 16 23  0 61  1
0 24 229872 110280 10272 2537612 152   0 2000  5312 3610 2813  8 12  0 79  1
3 30 229816 106684 10276 2541068  24   0 1116  5880 4148 2923 12 15  0 73  1
0 67 229536 100484 10276 2546612 260   0 3084  3656 7847 7088 20 28  0 51  1
0 50 229240  95152 10280 2552164 388   0 2648  4432 6670 5479 17 22  0 61  1
2 52 229060  89944 10288 2557044 268   0 2388  3780 5916 5029 22 22  0 55  1
2 55 228828  85604 10292 2561036 252   0 2032  3152 6003 5416 16 20  0 64  1
1 67 228632  81016 10292 2565560 164   0 2472  3896 5842 5113 15 21  0 64  1
0 29 228420  77792 10296 2568708 264   0 2588  3496 4775 3954 10 18  0 71  1
0 13 228340  77048 10296 2569728  60   0 1000 15576 1620  676  3  4  0 93  1
3 31 228276  75064 10300 2570736  44   0  928  7840 3764 2959 12 12  0 75  1
4 56 227976  72212 10300 2573832 368   0 2976  3880 6327 5287 20 23  0 56  1
2 52 227700  69236 10300 2576308 256   0 2336  4128 7586 5988 17 27  0 55  1
0 38 227452  66508 10304 2579332 240   0 2916  3400 5230 4456 14 18  0 68  1
Best Answer
From the vmstat output, the cache column shows that about 2.5 GB is used for the disk cache, so the system is not memory starved.
What stands out is the high number of processes in uninterruptible sleep (the b column) and the high percentages in the wa column. It means that a lot of processes are constantly doing I/O operations. The load of the machine (shown by top or uptime) is probably always quite high. Note the 0% of idle time for the CPU (the id column) and the fair percentage of time spent in the kernel (the sy column). The CPUs seem fully busy with the I/O.
On the other hand, the bi and bo columns show that there is a steady but low volume of disk activity. Reads are in bi and writes are in bo, both measured in kernel blocks/s (1 block = 1024 bytes). Compare this to the raw throughput capacity of your disk subsystem, which would be at least 100 MB/s even for a low-end server, about 10 times more than what is shown in this vmstat output.
It looks as if there is a lot of I/O going on, but it's mostly hitting the disk cache, so the disk bandwidth is good enough. Or maybe the time waiting on I/O is due to something else, like a slow network, although that would be peculiar. Still, you could check whether your network interfaces are maxed out with a tool like iftop or similar.
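To put a rough number on that comparison, you can average the bi and bo columns over a fresh run. This is only a quick sketch: it assumes the standard vmstat column order (bi in field 9, bo in field 10), treats a block as 1024 bytes, and the first sample (which is averages since boot) is included in the mean:

vmstat 1 60 | awk 'NR > 2 { bi += $9; bo += $10; n++ }
  END { printf "reads: %.1f MiB/s  writes: %.1f MiB/s\n", bi/n/1024, bo/n/1024 }'

Eyeballing your output, bi is mostly in the 2000-3800 range and bo in the 3000-6000 range, i.e. only a few MiB/s in each direction, which is where the factor-of-ten gap against a ~100 MB/s disk comes from.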